numCapture/capture dates discrepancy in HTML v. JSON



  • Karl-Rainer Blumenthal

    Thanks for pointing this out, Jack! Here's where and why you can find that missing Wayback capture information:

    The information that you see on collection pages is actually drawn from two different sources: seeds and their descriptions are queried from our Solr index while the Wayback data (the capture numbers and dates) are pulled from a CDX file--the index file for all records in the archival replay interface. So here's how that breaks down in practice:

    Each collection, like the one at, has its own CDX file and API endpoint as well. This "CDX/C" API provides quick access to plain-text data about all captures, which we in turn use, for instance, to fill in the capture counts and the dates of the first and most recent captures. The API is documented in Archive-It's Help Center, and you can see it used in practice to retrieve and present the same kind of information that you seek in examples like the web archives by Princeton Theological Seminary (PTSEM). PTSEM queries the CDX/C API to produce the "Capture Dates" browsing module for each of its sites cataloged there.
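
As a rough illustration of the approach described above, here is a minimal Python sketch that derives a capture count and the first and most recent capture dates from plain-text CDX output. The endpoint pattern in `build_cdx_url`, the function names, and the assumption that the second space-separated field of each line is a 14-digit timestamp are illustrative and not taken from this thread, so please check them against the Help Center documentation before relying on them.

```python
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen


def build_cdx_url(collection_id, seed_url):
    """Build a per-collection CDX query URL (assumed endpoint pattern)."""
    # Assumption: the collection's CDX/C endpoint follows this pattern.
    query = urlencode({"url": seed_url})
    return f"https://wayback.archive-it.org/{collection_id}/timemap/cdx?{query}"


def summarize_captures(cdx_text):
    """Return (count, first_capture, last_capture) from plain-text CDX lines."""
    timestamps = []
    for line in cdx_text.splitlines():
        fields = line.split()
        # Assumption: field 2 is a YYYYMMDDhhmmss capture timestamp.
        if len(fields) >= 2 and len(fields[1]) == 14 and fields[1].isdigit():
            timestamps.append(datetime.strptime(fields[1], "%Y%m%d%H%M%S"))
    if not timestamps:
        return 0, None, None
    return len(timestamps), min(timestamps), max(timestamps)


def fetch_capture_summary(collection_id, seed_url):
    """Fetch CDX lines for one seed over the network and summarize them."""
    url = build_cdx_url(collection_id, seed_url)
    with urlopen(url, timeout=30) as response:
        return summarize_captures(response.read().decode("utf-8"))
```

Keeping the parsing in `summarize_captures` separate from the network call means the summary logic can be checked against a saved CDX response before it is wired into an access layer.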

    I hope this helps you and everyone watching from home to retrieve the necessary Wayback info for presentation in your own access layers. Please let us know how it works for you.

