New SoundCloud seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, please visit Sites with automated scoping rules.
- At the collection level, ignore robots.txt blocks on the host soundcloud.com.
- Expand the scope of your crawl to include URLs that contain either of the following text strings: ec-media.soundcloud.com and cf-media.sndcdn.com.
- Additionally, please crawl all pages with SoundCloud embeds as seeds to ensure that they are captured.
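
As a rough illustration of how the "contains the following text" expansion rule in the list above behaves, the sketch below checks whether a URL includes either string. It is not the crawler's actual implementation: the function name `matches_expansion_rule` and the sample URLs are hypothetical and exist only for this example.

```python
# The two text strings come from the scope-expansion rule in the list above.
EXPAND_SCOPE_SUBSTRINGS = (
    "ec-media.soundcloud.com",
    "cf-media.sndcdn.com",
)


def matches_expansion_rule(url: str) -> bool:
    """Return True if the URL contains any of the expansion strings.

    Hypothetical helper: a plain substring test, assumed here to mirror
    a "URL contains the following text" style of scoping rule.
    """
    return any(text in url for text in EXPAND_SCOPE_SUBSTRINGS)


if __name__ == "__main__":
    # Made-up URLs for illustration only.
    for url in (
        "https://cf-media.sndcdn.com/example.mp3",
        "https://soundcloud.com/example-artist/example-track",
    ):
        print(url, matches_expansion_rule(url))
```

Because the test is a plain substring match, a URL on the main soundcloud.com host does not match this rule by itself; such pages are brought into scope by being added as seeds.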
What to expect from your archived SoundCloud seeds:
The above directions should best prepare our crawlers to find and archive SoundCloud-hosted material, but it is important to review the results of these crawls for completeness and accuracy. For the most complete capture, crawl these seeds with Brozzler.