One date for the site capture but different capture dates as you drill through "pages"



  • Karl-Rainer Blumenthal

    Hi, Rachel! In this case, the archived landing page for this seed links to pages that were not captured when the seed was originally crawled in May 2010, but that were captured during subsequent crawls, for instance starting in December 2011. When you follow a link in Wayback from one archived page to another, it defaults to showing you the capture of the linked page that is closest in time to the capture you came from. This is especially important for replay because even a single crawl can run for multiple days and can therefore link pages that were archived on different dates. In either case, this is preferable to the “Not in Archive” message when a capture of the linked page exists somewhere in the archive, because it would be misleading to say that the page wasn’t archived. It was, just not necessarily on the same date as the source page.
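
To make the "closest in time" behavior concrete, here is a minimal sketch of how such a selection could work. It is not Wayback's actual code: the function names, the optional `not_after` cutoff, and the specific days in the example are assumptions for illustration. The only convention relied on is the 14-digit `YYYYMMDDhhmmss` timestamp format used in Wayback replay URLs.

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

def nearest_capture(captures, source_ts, not_after=None):
    """Return the capture timestamp closest in time to source_ts.

    captures  -- timestamps of the linked page's captures (hypothetical input)
    source_ts -- timestamp of the page the reader is navigating from
    not_after -- optional cutoff; captures later than this are ignored
                 (an assumed option, not a documented Wayback setting)
    """
    source = parse_ts(source_ts)
    candidates = list(captures)
    if not_after is not None:
        limit = parse_ts(not_after)
        candidates = [c for c in candidates if parse_ts(c) <= limit]
    if not candidates:
        return None  # corresponds to a "Not in Archive" outcome
    # Smallest absolute time difference wins, whether earlier or later.
    return min(candidates, key=lambda c: abs(parse_ts(c) - source))

# Hypothetical captures of a linked page, none from the May 2010 crawl.
linked_page_captures = ["20111205120000", "20120610080000"]
print(nearest_capture(linked_page_captures, "20100515000000"))
print(nearest_capture(linked_page_captures, "20100515000000",
                      not_after="20101231235959"))
```

With no cutoff, the December 2011 capture is chosen even though it is more than a year away from the landing page. With a cutoff at the end of 2010, nothing qualifies and the function returns `None`, which is the stricter behavior a researcher citing a specific date might prefer.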

    Depending upon your researcher, their level of familiarity with web archives, the kinds of analysis that they intend to perform, etc., there should still be a way for them to look at only the captures that were made up to a specified date/time in your archive. Either here or directly, is there anything more you can tell me specifically about their intent?

  • Rachel Taketa

    Thanks, Karl! I appreciate you clearing up how the capture dates in the preserved website affect replay. I agree that filling in the page with a capture from a close date is preferable to nothing. I think the only time that poses a problem for our researchers is when they are coding websites for a specific study and need to be able to say with certainty that, for example, an ad campaign had a particular slant within a given period of time, or something like "on a specific date, this particular e-cig advocacy site claimed that the side stream smoke was only water vapor." When they link to that particular seed in their paper, they don't want jumps in dates within the preserved site, since that may make their statement look less credible. I don't think that's much of a problem with dates that are only off by a few days or a month, but they did look at one site where part of the site was from a different year (my example above), and that was disconcerting. I have noticed that this filling in of pages happens less frequently in the more current captures, and I think that might be due to better crawling? Anyway, mostly I just wanted a good, clear explanation of why this occurs so that I could relay it back to the researchers, and they now know to look for date changes in the banner when they click through.

    Thanks again, Karl!
