Instagram is a photo- and video-sharing application and social networking service. This guide explains how to format, scope, and crawl Instagram seeds. There are known issues with archiving Instagram, which are described below.
Social media platforms like Instagram can be difficult to archive. Instagram currently has the following issues, which we continue to monitor:
- ⚠️ Captures made since September 2020 may fail to replay completely in Wayback mode. A resolution to this issue is in progress.
- ❌ Instagram is blocking more recent captures of most organizational and personal profile pages. We are working actively towards a solution.
You can find a full list of known issues for archiving various platforms on our Status of monitored platforms page.
On this page:
- How to select and format your Instagram seeds
- Scoping Instagram seeds
- Running your crawl
- What to expect from your archived Instagram seeds
How to select and format your Instagram seeds
- Be specific. Always include a specific user, followed by a / at the end. For example: https://www.instagram.com/internetarchive/
- Use the Standard seed type for Instagram seeds
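The formatting rule above can be checked before seeds are added to a collection. The following is a minimal sketch, not part of Archive-It: the function name `format_instagram_seed` is hypothetical, and the username pattern (letters, digits, periods, and underscores, up to 30 characters) is an assumption about Instagram usernames rather than something this guide specifies. It also assumes that seeds should be normalized to the https://www.instagram.com/ form shown in the example.

```python
import re
from urllib.parse import urlsplit

# Assumed username rule: letters, digits, periods, underscores, max 30 chars.
_USERNAME = re.compile(r"^[A-Za-z0-9._]{1,30}$")


def format_instagram_seed(raw: str) -> str:
    """Return the seed as https://www.instagram.com/<user>/ or raise ValueError."""
    text = raw.strip()
    if "://" not in text:
        # Allow input such as "instagram.com/internetarchive".
        text = "https://" + text
    parts = urlsplit(text)
    host = (parts.hostname or "").lower()
    if host not in ("instagram.com", "www.instagram.com"):
        raise ValueError(f"not an Instagram URL: {raw!r}")
    # Keep only non-empty path segments; query strings and fragments are dropped.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 1 or not _USERNAME.match(segments[0]):
        raise ValueError(f"seed must name exactly one user: {raw!r}")
    # Always emit the trailing slash the guide asks for.
    return f"https://www.instagram.com/{segments[0]}/"


if __name__ == "__main__":
    print(format_instagram_seed("https://www.instagram.com/internetarchive"))
```

The check is purely syntactic: it rejects the bare homepage and individual post URLs, but it cannot tell whether an account exists or whether a single path segment is a real user rather than a site page.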
Scoping Instagram seeds
Default scoping for Instagram seeds
New Instagram seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, please visit Sites with automated scoping rules.
- Ignore robots.txt at the seed level, or add a collection-level scoping rule to ignore robots.txt for the hosts www.instagram.com and fbcdn.net. The Ignore robots.txt rule is added automatically to all new Instagram seeds.
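When reviewing crawl reports, it can help to see which URLs belong to the two hosts named in the rule above. The sketch below is illustrative only: the helper `covered_by_rule` is hypothetical, and it assumes that a rule written for a host also covers that host's subdomains. This guide does not confirm how Archive-It matches hosts, so treat that as an assumption.

```python
from urllib.parse import urlsplit

# Hosts named in the scoping rule described above.
RULE_HOSTS = ("www.instagram.com", "fbcdn.net")


def covered_by_rule(url: str, rule_hosts=RULE_HOSTS) -> bool:
    """Return True if the URL's host equals a rule host or is a subdomain of one.

    Subdomain matching is an assumption made for this sketch.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in rule_hosts)


if __name__ == "__main__":
    for url in (
        "https://www.instagram.com/internetarchive/",
        "https://scontent.fbcdn.net/v/example.jpg",  # made-up media URL
        "https://example.org/",
    ):
        print(url, covered_by_rule(url))
```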
Running your crawl
Once you have finished selecting your seeds and adding recommended scoping rules, we highly recommend that you crawl your seeds using Brozzler.
What to expect from your archived Instagram seeds
Captures made by Standard Archive-It crawling technology replay the default load (up to 12 images) of content on Instagram feeds. Use Brozzler to create captures that scroll beyond the default load in Wayback replay.
While we work on addressing the known issues with archiving Instagram, we have identified a possible workaround. We have seen initial success collecting and replaying Instagram via the following third-party viewer and analyzer platforms:
If you would like to try this option, you can:
- Search for your Instagram account on either platform
- Add the URL as a seed. We recommend the following:
  - Using the Standard seed type
  - Adding a seed-level scoping rule to ignore robots.txt files
  - Crawling with Brozzler