Overview
SoundCloud is an online audio streaming platform that enables users to upload, stream, and share music and podcasts. This guide explains how to format, scope, and crawl SoundCloud seeds. There are currently some known issues with archiving SoundCloud, which are described below.
Known issues
Social media platforms like SoundCloud can be difficult to archive. SoundCloud currently has the following issues, which we continue to monitor and work to resolve:
- ⚠️ Most SoundCloud creator pages do not replay in Wayback. Individual tracks and embedded SoundCloud audio tracks can sometimes be replayed through the Wayback banner.
You can find a full list of known issues for archiving various platforms on our Social media and other platforms status page.
Scoping SoundCloud seeds
Default scoping for SoundCloud seeds
New SoundCloud seeds added to collections will have the following default scoping rules applied automatically at the seed level; older SoundCloud seeds can be updated by adding the scoping rules below manually or by following these instructions.
- Ignore robots.txt blocks on the host: soundcloud.com at the collection level.
- Expand the scope of your crawl to include URLs that contain the following text: ec-media.soundcloud.com and cf-media.sndcdn.com
- Additionally, please crawl all pages with SoundCloud embeds as seeds to ensure that they are captured.
To learn more, please visit Sites with automated scoping rules.
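To make the "URLs that contain the following text" rule concrete, here is a minimal Python sketch of substring-based scope matching. It is an illustration only and not the crawler's actual implementation: the function names `matches_expansion_rule` and `split_by_scope` are hypothetical, and the sample URLs are made up. Only the two text fragments come from the rules above.

```python
# Text fragments taken from the scope-expansion rule in this guide.
EXPANSION_SUBSTRINGS = (
    "ec-media.soundcloud.com",
    "cf-media.sndcdn.com",
)


def matches_expansion_rule(url: str) -> bool:
    """Return True if the URL contains any of the expansion text fragments.

    Hypothetical helper: a plain substring test, assumed here to mirror
    how a "contains the following text" rule behaves.
    """
    return any(fragment in url for fragment in EXPANSION_SUBSTRINGS)


def split_by_scope(urls):
    """Split URLs into (matched, unmatched) lists, preserving input order."""
    matched = [u for u in urls if matches_expansion_rule(u)]
    unmatched = [u for u in urls if not matches_expansion_rule(u)]
    return matched, unmatched


if __name__ == "__main__":
    # Made-up sample URLs for illustration only.
    sample = [
        "https://cf-media.sndcdn.com/example-track.mp3",
        "https://ec-media.soundcloud.com/example-segment",
        "https://example.org/page-with-embed",
    ]
    matched, unmatched = split_by_scope(sample)
    print("Matched by expansion rule:", matched)
    print("Not matched:", unmatched)
```

A check like this can be useful when reviewing a list of URLs from a crawl report, to see which media URLs the expansion rule would be expected to pull into scope.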
Running your crawl
We recommend that you crawl your seeds using Brozzler.
What to expect from archived SoundCloud seeds
The directions above should best prepare our crawlers to find and archive SoundCloud-hosted material, but it is important to review the results of these crawls for completeness and accuracy. Use Brozzler to give your seeds the best chance of archiving fully.