Hi AIT community,
Our web archiving team was recently alerted to an issue that one of our contributing web admins noticed. She indicated that she prefers to access our captures of her sites through the archive.org wayback machine public user interface (rather than through Archive-It's). I told her that this should be ok, since everything we capture gets passed to the archive.org wayback machine, where it is also supplemented by additional captures made by others.
However, I had to eat my words last week when she noticed that the crawls are rendering differently across the two platforms. Specifically, the archive.org wayback machine is rendering at least one site without an essential css file.
Not-so-great capture in archive.org: https://web.archive.org/web/20170606201912/https://innovation.gwu.edu/
Lovely capture in Archive-It: https://wayback.archive-it.org/5184/20170606201912/https://innovation.gwu.edu/
Here are some of the theories we've entertained:
1. There is a lag in processing, where some files (perhaps those grabbed in a patch crawl) take a while to make it over to archive.org's wayback machine. If so, it would be a lag of at least a month, as the css files are also missing for a May 22 capture of the page above.
2. Patch crawls don't get passed back to archive.org's wayback machine, and the missing css file was captured in a patch crawl.
3. These are not in fact the same capture, and I am incorrect in thinking that our AIT captures flow back into the archive.org wayback machine public portal.
4. The two platforms render captures differently, so it is expected that captures may appear differently across the two platforms.
Or perhaps there is another more elegant, obvious explanation staring us in the face?
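In case it helps anyone test these theories, here is a rough Python sketch (not an official tool of either platform) that lists the stylesheets one replay page references and the other does not. The function names are hypothetical, and the pattern for replay links (a 14-digit timestamp, optionally followed by a two-letter modifier such as `cs_`) is an assumption based on how the links in these two captures appear to be rewritten. It only looks at `<link rel="stylesheet">` tags, so stylesheets pulled in through `@import` or scripts would be missed.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# The two replay URLs quoted in this post.
ARCHIVE_ORG = "https://web.archive.org/web/20170606201912/https://innovation.gwu.edu/"
ARCHIVE_IT = "https://wayback.archive-it.org/5184/20170606201912/https://innovation.gwu.edu/"

# Assumed shape of a rewritten link:
#   .../20170606201912cs_/https://example.org/style.css
REPLAY_RE = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/(.+)$")


class StylesheetCollector(HTMLParser):
    """Collect href values from <link rel="stylesheet"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rel and attr.get("href"):
            self.hrefs.append(attr["href"])


def stylesheet_hrefs(html):
    """Return every stylesheet href found in the HTML, in document order."""
    parser = StylesheetCollector()
    parser.feed(html)
    return parser.hrefs


def split_replay_href(href):
    """Split a rewritten link into (timestamp, original URL).

    Links that do not look like replay links (for example a platform's own
    toolbar stylesheet) come back as (None, href).
    """
    match = REPLAY_RE.search(href)
    if match is None:
        return None, href
    return match.group(1), match.group(2)


def archived_stylesheets(html):
    """Original URLs of the stylesheets that point at archived content."""
    originals = set()
    for href in stylesheet_hrefs(html):
        timestamp, original = split_replay_href(href)
        if timestamp is not None:
            originals.add(original)
    return originals


def missing_stylesheets(html_a, html_b):
    """Stylesheets referenced by page B but not by page A."""
    return archived_stylesheets(html_b) - archived_stylesheets(html_a)


def fetch(url):
    """Download a replay page as text (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `missing_stylesheets(fetch(ARCHIVE_ORG), fetch(ARCHIVE_IT))` should return the original URLs of any stylesheets that only the Archive-It replay links to. If the result is empty, both pages reference the same CSS files, and the difference is more likely in whether each platform actually holds a capture of those files, which could be checked by requesting each stylesheet URL on both platforms.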