Overview
The Archive-It web application and public-facing website provide integrated access to partners’ web archive collections and their metadata. Partners may also access information about collections and their contents through the following APIs and custom integrations. These tools can be used to develop custom access interfaces, enrich existing archival records with descriptive or technical metadata, organize and move WARC files, and more. Anyone may retrieve publicly accessible material, and credentialed Archive-It partners may use further APIs to retrieve and share administrative information.
Find information, examples, and documentation for each of these tools below:
APIs and integrations
Wayback captures
All documents collected by Archive-It crawls are indexed for archival replay. Access points like Wayback calendar pages use this index to find and link end users to pages in the web archive collection. Anyone can use Archive-It’s CDX/C API to access this index file, find information about captures, and link to them in the collection.
For instance, partners at Princeton Theological Seminary use this API to add access links to archival records like this one and style them in a simplified twirl-down form automatically:
The calendar information shown above can be retrieved in plain text format like this:
For complete documentation of the API, types of capture information available, and how to retrieve it, see: Access Archive-It's Wayback index with the CDX/C API.
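As a rough illustration of turning index records into access links, here is a minimal Python sketch. It rests on several assumptions that are not confirmed by this page: that each plain-text index line is whitespace-delimited with a 14-digit timestamp in the second field and the original URL in the third, and that replay links follow the pattern https://wayback.archive-it.org/[collection]/[timestamp]/[URL]. The sample line and collection number are made up; consult the CDX/C documentation for the actual fields.

```python
from typing import NamedTuple

# Assumed replay host; not stated on this page.
WAYBACK_BASE = "https://wayback.archive-it.org"


class Capture(NamedTuple):
    timestamp: str      # assumed 14-digit capture time, e.g. 20200101120000
    original_url: str   # URL as it was captured


def parse_cdx_line(line: str) -> Capture:
    """Read one index line, assuming fields 2 and 3 hold timestamp and URL."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    return Capture(timestamp=fields[1], original_url=fields[2])


def replay_link(collection_id: int, capture: Capture) -> str:
    """Build a replay link under the assumed URL pattern."""
    return f"{WAYBACK_BASE}/{collection_id}/{capture.timestamp}/{capture.original_url}"


# Hypothetical index line, for illustration only.
sample_line = (
    "org,example)/ 20200101120000 http://example.org/ "
    "text/html 200 EXAMPLEDIGEST - - 1234 5678 example.warc.gz"
)
capture = parse_cdx_line(sample_line)
print(replay_link(1234, capture))
```

A list of such links, one per capture, is the kind of data a catalog record or finding aid could render as a date-ordered menu.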
Full-text search
Custom search box
Anyone may connect a search box on their own website to full-text and metadata search results from Archive-It collections or accounts by adding five lines of standard HTML markup to their chosen page.
For instance, partners at the DC Public Library use this integration in order to provide site visitors with this search interface, which takes users directly to search results for their keywords or phrase entries on Archive-It’s public-facing website:
For full documentation, and to create your own custom search box to Archive-It web archives like this one, see: Provide access to your web archives from your own domain.
OpenSearch API
Anyone may use Archive-It’s OpenSearch API server in order to develop advanced full-text search access points. This tool retrieves the same full-text search results as the Archive-It web application or public-facing website and delivers them to your desired user-facing access point.
For instance, partners at the New York Art Resources Consortium (NYARC) use the OpenSearch API to add web archive results to the search results that this discovery layer returns for holdings in all other formats:
To create an OpenSearch API call like the above and retrieve the raw XML response, format your query as:
http://archive-it.org/search-master/opensearch?q=[search query]&i=[collection number/s]
For full documentation and instructions to use and customize these search results, see: Access your web archives with OpenSearch.
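The query format above can be assembled programmatically so that search terms are escaped correctly. The following Python sketch uses only the endpoint and the q and i parameters shown above; the function name, the sample search terms, and the collection number are hypothetical, and only a single collection number is handled here.

```python
from urllib.parse import urlencode

# Endpoint as given in the query format above.
OPENSEARCH_ENDPOINT = "http://archive-it.org/search-master/opensearch"


def build_opensearch_url(query: str, collection_id: int) -> str:
    """Return an OpenSearch request URL for one collection.

    urlencode escapes spaces and special characters in the query.
    """
    params = [("q", query), ("i", str(collection_id))]
    return f"{OPENSEARCH_ENDPOINT}?{urlencode(params)}"


# Hypothetical search terms and collection number, for illustration only.
print(build_opensearch_url("climate change", 1234))
```

The resulting URL can be opened in a browser or passed to any HTTP client to retrieve the raw XML response.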
WARC files
All Archive-It web archives are stored in the internationally standard WARC file format (ISO 28500:2009). All credentialed Archive-It partners may download their files and associated technical metadata for their own local preservation, management, and analysis.
For instance, a typical request for all files from a specific collection can be made in a web browser like this:
https://warcs.archive-it.org/wasapi/v1/webdata?collection=[Collection #]
Requests return download links for WARC files along with their associated technical metadata in JSON, CSV, or XML format:
For full documentation and a demonstration of the options for requesting specific files from a web browser or command line interface, see: Find and download your WARC files with WASAPI.
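To show how the download links might be pulled out of a JSON response, here is a minimal Python sketch. The request URL comes from the example above. The response shape is an assumption based on the general WASAPI specification (a top-level "files" list whose entries carry a "filename" and a list of "locations"); the sample payload is invented, and authentication with partner credentials is not shown.

```python
from urllib.parse import urlencode

# Endpoint as given in the request example above.
WASAPI_ENDPOINT = "https://warcs.archive-it.org/wasapi/v1/webdata"


def build_webdata_url(collection_id: int) -> str:
    """Return the request URL for all files in one collection."""
    return f"{WASAPI_ENDPOINT}?{urlencode({'collection': collection_id})}"


def extract_download_links(payload: dict) -> list:
    """Collect (filename, first download location) pairs.

    Assumes the parsed JSON has a "files" list whose entries contain
    "filename" and "locations" keys; entries without locations are skipped.
    """
    links = []
    for entry in payload.get("files", []):
        locations = entry.get("locations", [])
        if locations:
            links.append((entry.get("filename"), locations[0]))
    return links


# Hypothetical response body, for illustration only.
sample_payload = {
    "files": [
        {"filename": "EXAMPLE-1.warc.gz",
         "locations": ["https://example.org/EXAMPLE-1.warc.gz"]},
        {"filename": "EXAMPLE-2.warc.gz", "locations": []},
    ]
}
print(build_webdata_url(1234))
print(extract_download_links(sample_payload))
```

Keeping the parsing separate from the request makes it easy to check the logic against a saved response before downloading any large files.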
Descriptive metadata
Archive-It partners may share the descriptive metadata about their collections in XML format, for harvesting by repositories, catalogs, and other tools that use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
For example, the descriptive metadata for the Internet Archive’s “Jasmine Revolution - Tunisia 2011” collection is retrieved directly by OCLC’s Worldcat union catalog from:
For full instructions on exposing collections to OAI-PMH harvesting and on custom record harvesting parameters, see: Access web archives with the OAI-PMH metadata feed.
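For readers unfamiliar with OAI-PMH, a harvesting request is an ordinary URL with a verb parameter and any arguments that verb requires. The Python sketch below builds such a request using the standard ListRecords verb and the oai_dc metadata prefix from the OAI-PMH specification. The base URL is a placeholder and an assumption: substitute the feed address from the documentation linked above.

```python
from urllib.parse import urlencode


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return an OAI-PMH request URL: base URL, verb, then verb arguments."""
    params = [("verb", verb)] + list(arguments.items())
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL, for illustration only; not the Archive-It feed address.
PLACEHOLDER_BASE_URL = "https://example.org/oai"

print(build_oai_request(PLACEHOLDER_BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Harvesters such as catalogs and repositories issue requests of this form on a schedule and parse the XML records returned.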
Partner data
The Archive-It Partner API provides information about partners’ accounts, collections, crawls, metadata, and more to the Archive-It web application. Credentialed Archive-It partners may also access this data directly in order to create their own custom front ends, share descriptive or administrative metadata or provenance information, or to develop further integrations.
For instance, partners at the University at Albany SUNY retrieve technical metadata about crawls like their unique ID numbers, durations, and scoping rules to include in web archive finding aids:
Any Archive-It partner may find the same and much more information through the Partner API at:
https://partner.archive-it.org/api/
For a summary of the types of data available through this API and how to retrieve them, see: Access your account with the Archive-It Partner API.
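Because this API is limited to credentialed partners, requests need to carry account credentials. The Python sketch below prepares, but does not send, such a request. Only the base URL comes from this page. The use of HTTP Basic authentication, the "collection" resource name, and the "limit" parameter are assumptions made for illustration; check the Partner API documentation for the supported resources and authentication method.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Base URL as given above.
PARTNER_API_BASE = "https://partner.archive-it.org/api/"


def build_partner_request(resource: str, username: str, password: str,
                          **params: str) -> urllib.request.Request:
    """Prepare a GET request with a Basic authentication header.

    `resource` is a caller-supplied path under the API base (hypothetical here).
    """
    url = f"{PARTNER_API_BASE}{resource.strip('/')}"
    if params:
        url = f"{url}?{urlencode(params)}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical resource name and credentials, for illustration only.
request = build_partner_request("collection", "user", "pass", limit="5")
print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is left out here so the sketch can be run without credentials.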
Browser icons created by Linseed Studio from the Noun Project