Instagram Web Crawling
We have a request from a student group to capture their Instagram account. As a test, I tried to capture my personal account with no luck: just a blank page.
I first did it without credentials; then I added my login credentials; then I set the crawl to ignore robots.txt. Each time I got nothing more than a blank page.
Does anyone have advice or tricks they use to capture Instagram? Thanks, Dan
-
Hi Dan,
Instagram's a pain. We haven't been able to crawl it directly for more than a year. It just isn't possible. For a while, we used manual tools (Webrecorder, Conifer) and then uploaded the WARCs to Archive-It. That worked until something changed at Instagram and our WARCs no longer replayed in Archive-It. Archive-It suggested picuki.com, which gave us a lot of success. Picuki does its own replay of Instagram feeds. We crawled it through Archive-It until recently, when Picuki started blocking the Archive-It crawler. Then I captured it manually for a short time, which was very tedious, but that has become problematic in the last month or so. Picuki now has security software with a hair trigger: if I move too quickly in a manual crawl, it identifies me as a machine. I managed to get an IP address banned for a while because of that.
So, at present, the only way I know to get Instagram feeds is to capture Picuki manually and go very slowly. It's definitely not scalable, but it does work.
Skip
-
Skip Kendall: Thanks, Skip! I tried Conifer too, and that was worse. I'm not willing to pay for Webrecorder just now (nor could I justify it as an additional subscription). I could give Picuki a shot, since the account in our request has fewer than 250 posts. Thanks!