General guidance for archiving social media
Archive-It's software enables partners to capture and display content hosted by popular social media services. These services frequently change how they serve this content, so use the general tips and service-specific guidance below to avoid common problems:
Before you crawl
Be specific with your seed URLs.
When archiving social media, make your seed URL as specific as possible: add only the page that you want to archive as the seed, and do NOT use the social media site as a whole as a seed. For example, do NOT use www.facebook.com or www.twitter.com as seeds; DO use a specific page such as http://twitter.com/internetarchive/.
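The rule above can be expressed as a simple pre-flight check on your seed list. The sketch below is illustrative only: the `SOCIAL_HOSTS` set and the `is_too_broad` helper are hypothetical names, not part of Archive-It, and the check only flags seeds that point at a service's root with no page path.

```python
from urllib.parse import urlsplit

# Hypothetical list of social media hosts to check; not an Archive-It setting.
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "instagram.com", "flickr.com",
                "soundcloud.com", "tumblr.com", "youtube.com"}


def is_too_broad(seed: str) -> bool:
    """Return True if the seed is a social media site's root rather than a specific page."""
    # Add a scheme if missing so urlsplit can find the host.
    parts = urlsplit(seed if "://" in seed else "http://" + seed)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # A root seed has no path beyond "/".
    return host in SOCIAL_HOSTS and parts.path.strip("/") == ""


for seed in ["www.facebook.com", "http://twitter.com/internetarchive/"]:
    print(seed, "too broad" if is_too_broad(seed) else "ok")
```

Run against the two examples from the text, this flags www.facebook.com as too broad and accepts http://twitter.com/internetarchive/.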
Double-check your seeds!
Do you need a trailing / (slash)? Read the service-specific guides below for how to format seeds for each site. An incorrectly formatted seed could result in unintentionally archiving millions of documents.
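To see why a single trailing slash can matter, consider a simplified model in which a URL is in scope if it begins with the seed URL. This is an assumption for illustration; Archive-It's actual scoping rules may differ, and the service-specific guides take precedence. The second candidate URL below is a hypothetical, unrelated account.

```python
def in_scope(url: str, seed: str) -> bool:
    """Simplified prefix scope (assumption): a URL is in scope if it starts with the seed."""
    return url.startswith(seed)


candidates = [
    "http://twitter.com/internetarchive/status/1",  # page under the intended account
    "http://twitter.com/internetarchivexyz",        # hypothetical unrelated account
]

for seed in ("http://twitter.com/internetarchive", "http://twitter.com/internetarchive/"):
    matched = [u for u in candidates if in_scope(u, seed)]
    print(seed, "->", len(matched), "matching")
```

Under this model, the seed without the slash also matches the unrelated account (2 matches), while the seed with the slash matches only pages under the intended account (1 match).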
Run a test crawl
We strongly recommend running a test crawl on all social media seeds before running a full production (non-test) crawl. A test crawl lets you confirm that your seeds are configured correctly and that you won't unintentionally capture far more content than intended, at the expense of your account's data budget.
Limit your crawls
You may want to set data and/or document limits for these sites if your test crawl shows an unusually large volume of content even after you have confirmed that your seed URL is correct.
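Conceptually, a data or document limit stops the crawl as soon as either threshold is reached. The sketch below models only that logic; Archive-It limits are set in the web application, and `CrawlLimits` and `should_stop` are hypothetical names with example values, not Archive-It defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlLimits:
    """Hypothetical limits; None means no limit of that kind."""
    max_documents: Optional[int] = None
    max_bytes: Optional[int] = None


def should_stop(docs_captured: int, bytes_captured: int, limits: CrawlLimits) -> bool:
    """Return True once either the document limit or the data limit is reached."""
    if limits.max_documents is not None and docs_captured >= limits.max_documents:
        return True
    if limits.max_bytes is not None and bytes_captured >= limits.max_bytes:
        return True
    return False


# Example values only: 10,000 documents or 5 GiB, whichever comes first.
limits = CrawlLimits(max_documents=10_000, max_bytes=5 * 1024**3)
print(should_stop(500, 1024, limits))
print(should_stop(10_000, 1024, limits))
```

Setting both kinds of limit means a crawl that balloons in either document count or total size is cut off.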
After you crawl
It is especially important to review your first social media captures before crawling them regularly as seeds. After your first crawls, look through your crawl reports and the archived content itself to make sure that the archived content looks accurate and that you did not crawl more than you meant to.
Specific guidance for social media services
Refer to the following guides for help archiving specific kinds of social media content:
- Archiving Facebook pages
- Archiving Flickr streams
- Archiving Instagram feeds
- Archiving SoundCloud pages
- Archiving Tumblr sites
- Archiving Twitter feeds
- Archiving YouTube videos