Test crawl URLs not showing up in Wayback
I'm running a bunch of test crawls and I'm at the stage where I'm looking at my crawl reports and going through the various seeds to see if I'm capturing what I expect.
I've now waited 24 hours and gone through the Seeds tab of my test crawl report, but I'm still getting an error message saying that the capture is not available.
Here is one example: https://wayback.archive-it.org/10614/*/https://twitter.com/slavresistance/status/1016697918970105857/ The tweet is still live on the web, and the report shows it was crawled twice (July 13 and July 20).
Another example is https://wayback.archive-it.org/10614/*/https://twitter.com/MosesSumney/status/1014235370521673728/
I'm also wondering if I should have scoped this crawl as One Page, since all I'm trying to capture is a single tweet with an embedded image.
-
Hi, Sarah! Apologies that this wasn't clearer as you reviewed your crawls, but the good news is that your test captures are indeed available to view in Wayback here:
Just note the -test text next to the collection number in the URLs above, which does not appear in the links that you are following in your post. Links in this -test style should be in your test crawl's Seeds report, so by all means let us know via the support channel if you see anything to the contrary and we'll check it out. Wayback URLs like the ones that you provided appear in your overarching collection's Seeds tab and will work only if and when you elect to permanently save the contents of your tests.
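For anyone who wants to convert a collection-style Wayback link into its test-crawl form, here is a minimal sketch in Python. It assumes that the -test text is appended directly to the collection number (for example, 10614-test) and that nothing else in the URL changes; the function name to_test_wayback_url is hypothetical and not part of any Archive-It tool. The links in your test crawl's Seeds report remain the authoritative source.

```python
import re

def to_test_wayback_url(url: str) -> str:
    """Add '-test' after the collection number in an Archive-It Wayback URL.

    Assumption: test-crawl links differ from collection links only by a
    '-test' suffix on the collection number. URLs that do not match the
    expected pattern (including ones already in -test form) are returned
    unchanged.
    """
    # Group 1: host prefix, group 2: numeric collection ID, group 3: the rest.
    pattern = r"^(https?://wayback\.archive-it\.org/)(\d+)(/.*)$"
    return re.sub(pattern, r"\g<1>\g<2>-test\g<3>", url)


collection_url = (
    "https://wayback.archive-it.org/10614/*/"
    "https://twitter.com/slavresistance/status/1016697918970105857/"
)
print(to_test_wayback_url(collection_url))
```

Running the function on a link that already contains -test leaves it as it is, so it is safe to apply to a mixed list of links.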
In this case, I think that you will find little difference between crawling your Twitter post seed as a "standard" or "one page" seed, because the way that the seed is formatted, with a trailing slash after the post ID number, effectively restricts the scope of the crawl to that one post alone. Let us know if that raises any new questions about formatting or scoping strategy though, of course!