Overview
Instagram is a photo and video-sharing application and social networking service. This guide provides an overview of how to properly format, scope, and crawl Instagram seeds.
Known issues
Social media platforms like Instagram can be difficult to archive. Currently, Instagram has the following issues that we continue to actively monitor:
- ⚠️ Recent Instagram organizational and personal profile pages may redirect to a login page and return an error. If you see a "this page isn't working" error, the content was not collected and cannot be replayed.
- ⚠️ Instagram is blocking collection and Wayback replay beyond the 12-post default load page for most organizational and personal profile pages. It may take 10-20 seconds for a page to load. Right-click to open individual posts in a new tab. Replay media through the Wayback banner.
For a full list of known issues for archiving various platforms, see Status of monitored platforms.
On this page:
- How to select and format your Instagram seeds
- Scoping Instagram seeds
- Running your crawl
- What to expect from your archived Instagram seeds
How to select and format your Instagram seeds
- Be specific. Always include a specific user, followed by a trailing slash (/). For example: https://www.instagram.com/internetarchive/
- Use the Standard seed type for Instagram seeds.
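The formatting rule above can be checked before seeds are added. The following is a minimal sketch in Python, not part of any Archive-It tooling: the function name `format_instagram_seed` and the `NON_PROFILE_PATHS` set are hypothetical, and the sketch assumes the first path segment of the URL is the username. It rewrites a pasted URL to the `https://www.instagram.com/<user>/` form and rejects URLs that do not name a specific user.

```python
from urllib.parse import urlsplit

# Illustrative (not exhaustive) set of first path segments that are
# Instagram features rather than usernames. This list is an assumption.
NON_PROFILE_PATHS = {"p", "reel", "reels", "explore", "accounts"}


def format_instagram_seed(raw: str) -> str:
    """Return a seed of the form https://www.instagram.com/<user>/."""
    raw = raw.strip()
    # Allow URLs pasted without a scheme, e.g. "instagram.com/user".
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    host = (parts.hostname or "").lower()
    if host not in ("instagram.com", "www.instagram.com"):
        raise ValueError(f"not an Instagram URL: {raw!r}")
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if not segments or segments[0].lower() in NON_PROFILE_PATHS:
        raise ValueError(f"seed must name a specific user: {raw!r}")
    # Keep only the username and end with a single trailing slash;
    # query strings and fragments are discarded.
    return f"https://www.instagram.com/{segments[0]}/"


if __name__ == "__main__":
    print(format_instagram_seed("instagram.com/internetarchive"))
```

Query strings and deeper paths are dropped on purpose so that each seed points at the profile page itself.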
Scoping Instagram seeds
Default scoping for Instagram seeds
New Instagram seeds will have the default scoping rules automatically applied at the seed level when they are added to a collection. To learn more, including how you can add default scoping rules to existing seeds, visit Sites with automated scoping rules.
- At the seed level, add an ignore robots.txt scoping rule. Note: Ignore Robots.txt is automatically applied to all new Instagram seeds.
-OR-
- At the collection level, add a scoping rule to ignore robots.txt for the hosts www.instagram.com and fbcdn.net.
Running your crawl
Once you have finished selecting your seeds and adding recommended scoping rules, we highly recommend that you crawl your seeds using Brozzler.
What to expect from your archived Instagram seeds
For posts, reels, and tagged feeds on organizational and personal profile pages, Instagram may redirect to a login page with a "this page isn't working" error. If this occurs, content has not been collected and cannot be replayed.
When collected, Wayback captures replay the default load of content (up to 12 images). If a page initially appears blank, wait 10-20 seconds for the page to render and dismiss any login prompts. To view an individual post in a feed, right-click to open it in a new tab.
To play back videos and other media, use the media link in the Wayback banner.