Have you ever run a web crawl and collected plenty of good data, only to see a blank (or nearly blank) webpage in Wayback, like this?
You’re not alone. This can be one of the more perplexing results of a web crawl because it offers so little information. The good news is that you probably ran your crawl and collected all the data correctly. When you see an obstruction like this one, you can call on the Internet Archive’s Web Archivists to help with a fix on our side, in the Wayback software that reads your WARC files. A tweak there can unlock the contents of your capture to view in the browser:
Why does this happen? In a nutshell: pages can fail to load completely when Wayback hits a snag while rewriting the URLs on that page into their archival forms. Wayback “rewrites” countless links this way so that you can browse among the myriad pages in a web archive collection. It runs into more trouble with the new and different kinds of dynamic scripts that underpin many modern websites. Blank or incomplete replay often indicates that Wayback missed one of these elements while automatically rewriting URLs in bulk.
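To make the idea concrete, here is a minimal sketch of URL rewriting in Python. It is a toy illustration, not Wayback's actual rewriting code: the `rewrite_html` function, the collection ID `1234`, and the timestamp are all made up for the example, and real replay software handles far more cases than `href` and `src` attributes.

```python
import re
from urllib.parse import urljoin

# Hypothetical replay prefix: collection ID and timestamp are invented for this sketch.
REPLAY_PREFIX = "https://wayback.archive-it.org/1234/20200101000000/"

# Matches only URLs written literally in href="..." or src="..." attributes.
ATTR_URL = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]+)"')


def rewrite_html(html: str, page_url: str, prefix: str = REPLAY_PREFIX) -> str:
    """Point every literal href/src URL at its archival form."""

    def to_archival(match: re.Match) -> str:
        # Resolve relative links against the page's own URL first.
        absolute = urljoin(page_url, match.group("url"))
        return f'{match.group("attr")}="{prefix}{absolute}"'

    return ATTR_URL.sub(to_archival, html)


page = (
    '<a href="/about">About</a>\n'
    '<img src="https://example.org/logo.png">\n'
    # This URL is assembled at run time, so the pattern above never sees it.
    '<script>var api = "https://" + "example.org" + "/feed.json";</script>'
)

rewritten = rewrite_html(page, "https://example.org/")
print(rewritten)
```

In this sketch the link and the image are rewritten to point into the archive, while the address that the script assembles at run time passes through untouched. During replay, a script like that would request its data from the live web rather than from the archive, which is the kind of miss that can leave a page blank or incomplete.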
If you think that you see this effect on one of your captures, there are a few easy steps that you can take on your own to troubleshoot and confirm that a Wayback software fix is possible:
- Check your crawl report - Confirm that you’ve collected data from the intended seed in order to rule out the possibility that anything prevented you from collecting at crawl time.
- Use Brozzler - Make sure to collect the page in question with the tool most likely to archive its dynamic contents in full.
- Get a second opinion - Have you checked the replay of your archived webpage in the Internet Archive’s Wayback Machine collection? The Wayback Machine already includes upgrades that fix many of these kinds of errors automatically, and they are coming to Archive-It soon!
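For the "second opinion" step, one option is to look up the page through the Wayback Machine's public availability endpoint at `https://archive.org/wayback/available`. The sketch below only builds the query URL and reads a response; it makes no network request. The helper names and the sample response are hand-written for illustration and are not real capture data.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDDhhmmss timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the replay URL of the closest capture, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample in the shape the endpoint returns; not real data.
sample_response = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200101000000/https://example.org/",
            "timestamp": "20200101000000",
            "status": "200",
        }
    }
})

query = availability_query("https://example.org/")
snapshot = closest_snapshot(sample_response)
print(query)
print(snapshot)
```

Opening the returned replay URL in a browser lets you compare how the Wayback Machine renders the page against what you see in your own collection.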
The Archive-It team has seen enough variations on this theme that we’ve already developed some handy fixes. If you run into blank spaces like the above, submit a support ticket and we’ll see whether your capture can be coaxed into replaying the way you expected.