Social media platforms update frequently. For current information on any known issues archiving SoundCloud content, please see our Status of monitored platforms page.
New SoundCloud seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, please visit Sites with automated scoping rules.
- Ignore robots.txt blocks on the host soundcloud.com at the collection level.
- Expand the scope of your crawl to include URLs that contain either of the following strings: ec-media.soundcloud.com or cf-media.sndcdn.com.
- Add every page that contains a SoundCloud embed as its own seed so that the embedded content is captured.
- Crawl using Brozzler.
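
To illustrate what the "URLs that contain the following text" rule means in practice, here is a minimal Python sketch of a substring check against the two media hosts listed above. It is only an illustration of the matching logic, not the crawler's actual implementation; the function name `matches_expansion_rule` and the sample URLs are hypothetical.

```python
# Strings taken from the scope-expansion rule in the list above.
EXPANSION_STRINGS = ("ec-media.soundcloud.com", "cf-media.sndcdn.com")


def matches_expansion_rule(url: str) -> bool:
    """Return True if the URL contains any of the expansion strings.

    Hypothetical helper: a plain substring test, assumed to mirror a
    "contains the following text" scoping rule.
    """
    return any(text in url for text in EXPANSION_STRINGS)


if __name__ == "__main__":
    # Sample URLs are made up for illustration only.
    samples = [
        "https://cf-media.sndcdn.com/example-track.mp3",
        "https://ec-media.soundcloud.com/example-stream",
        "https://example.com/unrelated-page",
    ]
    for url in samples:
        print(url, matches_expansion_rule(url))
```

A check like this can be useful when reviewing a crawl report: URLs that match neither string and are not on a seed host would not be brought into scope by this rule alone.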
What to expect from your archived SoundCloud seeds:
The directions above give our crawlers the best chance of finding and archiving SoundCloud-hosted material, but it is still important to review the results of each crawl for completeness and accuracy. Use Brozzler for the most complete capture.