Twitter Hashtag Crawls not producing results
Hello everyone,
I am trying to crawl some Twitter hashtags related to our university but I'm not able to get it to work regardless of what I try.
More often than not, the crawl produces a Wayback instance with "no results found," but if one searches the hashtag on Twitter there are multiple results.
I have experimented with both regular crawls and Brozzler. Brozzler gives somewhat better results: if there is a Twitter profile with the same name as the hashtag, it will grab that profile, but it still does not pull individual tweets.
I'm logged in to Twitter through the dummy account we use for web archiving, and I've got everything scoped the way all the advice says to.
What technique do you use to successfully crawl hashtags in a way that produces a reliable result in the wayback instance?
-
This probably doesn't help, but I usually use twarc for my Twitter harvests. Now that I think about it, Webrecorder would probably work well too. Dunno how to get it from there into Wayback though.
-
Hi all. Sorry for the delay on this one. We're untangling a few interrelated issues with recent Twitter captures on our end. I believe that the issue causing most of your trouble is on our Wayback/replay side to fix, so I do not advise changing your crawling strategies yet. One important exception to this: we are not yet capable of logging into Twitter. In fact, attempting to do so has been known to cause the error message that you see, Elizabeth. So I would advise removing any login credentials from the seeds that you archive regularly.
Apologies again for the lack of a fix so far, but we're on the case and eager to see this one resolved for everyone! Stay tuned and I'll post an update here when there is new information and, hopefully, a fix to review.