Overview
Instagram is a photo and video-sharing application and social networking service. This guide provides an overview of how to properly format, scope, and crawl Instagram seeds.
Known issues
Social media platforms like Instagram can be challenging to archive. Currently, Instagram has the following issues that we continue to actively monitor:
- ⚠️ Wayback replay of recent Instagram pages resolves to a blank logo page. For captures prior to ~May 2025:
- Instagram is blocking collection and Wayback replay beyond the 12-post default load page for most organizational and personal profile pages. Right-click to open individual posts in a new tab.
- If a page appears blank, wait 10-20 seconds for the page to load. Click outside blank prompts to dismiss them. To play back media, use the media link in the Wayback banner.
- ⚠️ If you are redirected to a 429 error or login page, content has not been collected and cannot be replayed. Try running a new crawl.
For a full list of known issues for archiving various platforms, see Status of monitored platforms.
On this page:
- How to select and format your Instagram seeds
- Scoping Instagram seeds
- Running your crawl
- What to expect from your archived Instagram seeds
How to select and format your Instagram seeds
- Be specific. Each seed should point to a single user's profile and end with a trailing /. For example: https://www.instagram.com/internetarchive/
- Use the Standard seed type for Instagram seeds.
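The formatting rule above can be checked before seeds are added to a collection. The following is a minimal Python sketch, not part of Archive-It itself: the function name format_instagram_seed and the RESERVED_PATHS set are hypothetical, and the list of non-profile paths is an assumption that may be incomplete.

```python
from urllib.parse import urlsplit

# Assumption: first path segments that are not user profiles (may be incomplete).
RESERVED_PATHS = {"p", "reel", "reels", "explore", "accounts", "stories", "tv"}


def format_instagram_seed(raw: str) -> str:
    """Return a profile seed shaped like https://www.instagram.com/<user>/."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # tolerate input pasted without a scheme

    parts = urlsplit(raw)
    host = (parts.hostname or "").lower()
    if host not in ("instagram.com", "www.instagram.com"):
        raise ValueError(f"not an Instagram URL: {raw!r}")

    # A profile seed has exactly one path segment: the user name.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 1 or segments[0].lower() in RESERVED_PATHS:
        raise ValueError(f"seed must name a single specific user: {raw!r}")

    # Drop any query string or fragment and add the trailing slash.
    return f"https://www.instagram.com/{segments[0]}/"
```

For example, format_instagram_seed("instagram.com/internetarchive") returns https://www.instagram.com/internetarchive/, while a bare https://www.instagram.com/ or a single-post URL raises ValueError because neither names a specific user.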
Scoping Instagram seeds
Default scoping for Instagram seeds
New Instagram seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, visit Sites with automated scoping rules.
- At the seed level, add an Ignore robots.txt scoping rule. Note: Ignore robots.txt is automatically applied to all new Instagram seeds.
-OR-
- At the collection level, add a scoping rule to ignore robots.txt for the hosts www.instagram.com and fbcdn.net.
Running your crawl
Once you have finished selecting your seeds and adding recommended scoping rules, we highly recommend that you crawl your seeds using Brozzler.
What to expect from your archived Instagram seeds
Wayback replay of recent Instagram pages resolves to a blank logo page. For captures prior to ~May 2025:
- When collected, Wayback captures replay the default load (up to 12 posts) for most organizational and personal profile pages. Right-click to open individual posts in a new tab.
- If a page appears blank, wait 10-20 seconds for the page to load. Click outside any blank prompts to dismiss them.
To play back media, use the Wayback banner's media link.
For posts, reels, and tagged feeds on organizational and personal profile pages, if you are redirected to a 429 error or login page, content has not been collected and cannot be replayed. Try running a new crawl.