Instagram Web Crawling

Comments

2 comments

  • Skip Kendall

    Hi Dan,

    Instagram's a pain. We haven't been able to crawl it directly for more than a year. It just isn't possible. For a while, we used manual tools (Webrecorder, Conifer) and then uploaded the WARCs to Archive-It. That worked until something changed at Instagram and our WARCs no longer replayed at Archive-It. Archive-It suggested picuki.com, with which we had a lot of success. Picuki does its own replay of Instagram feeds. We crawled that through Archive-It until recently, when Picuki started blocking the Archive-It crawler. Then I did it manually for a short time, which was very tedious, but that has become problematic in the last month or so. Picuki now has security software with a hair trigger, and if I move too quickly in a manual crawl, it identifies me as a machine. I managed to get an IP address banned for a while because of that.

    So, at the present time, the only way I know to get Instagram feeds is to capture Picuki manually and go very slowly. It's definitely not scalable, but it does work.

    Skip

  • Dan Nooonan

    Skip Kendall Thanks, Skip! I tried Conifer, too, and that was worse. I'm not willing to pay for Webrecorder just now (nor could I justify that as an additional subscription). I could give Picuki a shot, as the request we have has fewer than 250 posts. Thanks!
