New SoundCloud seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, please visit Sites with automated scoping rules.
- At the collection level, ignore robots.txt blocks on the host soundcloud.com.
- Expand the scope of your crawl to include URLs that contain the following text: ec-media.soundcloud.com and cf-media.sndcdn.com.
- Additionally, add every page that contains a SoundCloud embed as a seed so that the embedded content is captured.
The above directions should best prepare our crawlers to find and archive SoundCloud-hosted material, but it is important to review the results of these crawls for completeness and accuracy.
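When reviewing crawl results, it can help to check which of the rules above a given URL falls under. The following Python sketch is only an illustration of that logic, not the crawler's actual implementation: the function names are hypothetical, and treating subdomains of soundcloud.com as covered by the robots.txt rule is an assumption made for this example.

```python
from urllib.parse import urlsplit

# Substrings taken from the scope-expansion rule described above.
EXPAND_SUBSTRINGS = ("ec-media.soundcloud.com", "cf-media.sndcdn.com")

# Host whose robots.txt blocks the directions above say to ignore.
IGNORE_ROBOTS_HOST = "soundcloud.com"


def matches_expansion_rule(url: str) -> bool:
    """Return True if the URL contains one of the expansion substrings."""
    return any(text in url for text in EXPAND_SUBSTRINGS)


def robots_ignored_for(url: str) -> bool:
    """Return True if the URL's host is soundcloud.com or a subdomain of it.

    Including subdomains is an assumption of this sketch.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host == IGNORE_ROBOTS_HOST or host.endswith("." + IGNORE_ROBOTS_HOST)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the two checks.
    for url in (
        "https://soundcloud.com/some-artist/some-track",
        "https://cf-media.sndcdn.com/example.mp3",
        "https://example.org/page-with-embed",
    ):
        print(url, matches_expansion_rule(url), robots_ignored_for(url))
```

A URL that matches neither check, such as a third-party page carrying an embed, is in scope only if that page has itself been added as a seed.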