Overview
Archive-It accounts and archive-it.org provide integrated access to partners' web archive collections and metadata. You may also access collections and their metadata by using the following APIs and custom integrations.
These tools support building custom access interfaces, enriching existing archival records with descriptive or technical metadata, organizing and moving WARC files, and more. Public collections can be retrieved by the general public. Credentialed Archive-It partners can also retrieve and share private collections or administrative information.
APIs and integrations
On this page:
Find information, examples, and documentation for each of the data types below:
Wayback captures
What it is
All documents collected by Archive-It crawls are indexed for archival replay. Access points like Wayback calendar pages use this index to find and link end users to pages in the web archive collection. Anyone can use Archive-It’s CDX/C API to access this index file, find information about captures, and link to them in the collection.
For instance, partners at Princeton Theological Seminary use this API to add access links to archival records like this one and style them in a simplified twirl-down form automatically:
How it works
The calendar information shown above can be retrieved in plain text format like this:
For complete documentation of the API, types of capture information available, and how to retrieve it, see: Access Archive-It's Wayback index with the CDX/C API.
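As a rough illustration of how an access link might be derived from one line of the index, here is a minimal Python sketch. It rests on assumptions not stated on this page: that each plain-text line is space-separated and begins with the URL key, timestamp, original URL, MIME type, and status code, and that replay links follow the pattern https://wayback.archive-it.org/[collection]/[timestamp]/[URL]. The sample line and collection number are invented for illustration; confirm the actual field order against the CDX/C API documentation.

```python
from typing import NamedTuple


class Capture(NamedTuple):
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    status: str


def parse_cdx_line(line: str) -> Capture:
    """Read the first five space-separated fields of one index line.

    The field order is an assumption; check it against the API documentation.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least five space-separated fields")
    return Capture(*fields[:5])


def replay_url(collection_id: int, capture: Capture) -> str:
    """Build a replay link; the URL pattern is assumed, not taken from this page."""
    return (
        f"https://wayback.archive-it.org/{collection_id}/"
        f"{capture.timestamp}/{capture.original}"
    )


# Invented sample line and collection number, for illustration only.
sample = "org,example)/ 20200101000000 http://example.org/ text/html 200"
capture = parse_cdx_line(sample)
print(replay_url(1234, capture))
```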
Full-text search
Custom search box
What it is
Anyone may connect a search box on their own website to full-text and metadata search results from Archive-It collections or accounts by adding five lines of standard HTML markup to their chosen page.
For instance, partners at the University of Texas at San Antonio Special Collections use this integration to provide site visitors with a search interface, which takes users directly to search results for their keywords or phrase entries on Archive-It’s public-facing website:
How it works
For full documentation, and to create your own custom search box to Archive-It web archives like this one, see: How to add full-text search for web archive collections to your site.
OpenSearch API
What it is
Anyone may use Archive-It's OpenSearch API server in order to develop advanced full-text search access points. This tool retrieves the same full-text search results as the Archive-It web application or public-facing website and delivers them to your desired user-facing access point.
For instance, partners at the New York Art Resources Consortium (NYARC) use the OpenSearch API to add web archives to the search results in this discovery layer, alongside holdings in all other formats:
How it works
To create an OpenSearch API call like the above and retrieve the raw XML response, format your query as:
http://archive-it.org/search-master/opensearch?q=[search query]&i=[collection number/s]
For full documentation and instructions to use and customize these search results, see: Access your web archives with OpenSearch.
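The query format above can be assembled programmatically so that search terms are URL-encoded correctly. This is a minimal Python sketch using only the standard library; the function name, search phrase, and collection number 1234 are placeholders, and it handles a single collection only, since this page does not specify how multiple collection numbers are passed.

```python
from urllib.parse import urlencode

# Endpoint as given in the query format above.
OPENSEARCH_ENDPOINT = "http://archive-it.org/search-master/opensearch"


def opensearch_url(query: str, collection_id: int) -> str:
    """Return an OpenSearch request URL for one collection."""
    params = {"q": query, "i": collection_id}
    # urlencode escapes spaces and reserved characters in the query.
    return f"{OPENSEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder search phrase and collection number.
print(opensearch_url("climate change", 1234))
# prints http://archive-it.org/search-master/opensearch?q=climate+change&i=1234
```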
Web ARChive (WARC) files
What it is
All Archive-It web archives are stored in the WARC file format, an international standard (ISO 28500:2009). All credentialed Archive-It partners may download their files and associated technical metadata for their own local preservation, management, and analysis.
How it works
A typical request for all files from a specific collection can be made in a web browser like this:
https://warcs.archive-it.org/wasapi/v1/webdata?collection=[Collection #]
Requests return download links for WARC files along with their associated technical metadata in JSON, CSV, or XML format:
For full documentation and a demonstration of the options to request specific files from a web browser or command line interface, see: Find and download your WARC files with WASAPI.
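For scripted use, the request URL above can be built and the download links pulled from a JSON response. This is a minimal Python sketch. It assumes that the JSON response contains a "files" list whose entries carry a "filename" and a list of "locations"; those field names and the sample response are illustrative assumptions, so check them against the WASAPI documentation. The sketch makes no network request, and authentication is left out.

```python
from urllib.parse import urlencode

# Endpoint as given in the request example above.
WASAPI_ENDPOINT = "https://warcs.archive-it.org/wasapi/v1/webdata"


def webdata_url(collection_id: int) -> str:
    """Return the request URL for all files in one collection."""
    return f"{WASAPI_ENDPOINT}?{urlencode({'collection': collection_id})}"


def download_links(response: dict) -> list:
    """Collect the first download location listed for each file.

    Assumes a 'files' list with 'locations' entries (unverified field names).
    """
    links = []
    for entry in response.get("files", []):
        locations = entry.get("locations", [])
        if locations:
            links.append(locations[0])
    return links


# Invented response for illustration; real responses may differ.
sample_response = {
    "files": [
        {"filename": "example.warc.gz",
         "locations": ["https://example.org/example.warc.gz"]},
    ]
}
print(webdata_url(1234))
print(download_links(sample_response))
```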
Descriptive metadata
What it is
Archive-It partners may share the descriptive metadata about their collections in XML format, for harvesting by repositories, catalogs, and other tools that use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
For example, the descriptive metadata for the Internet Archive’s “Jasmine Revolution - Tunisia 2011” collection is retrieved directly by OCLC’s Worldcat union catalog from:
How it works
For full instructions on exposing collections to OAI-PMH harvesting and custom record harvesting parameters, see: Access web archives with the OAI-PMH metadata feed.
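A harvester's request can be sketched with the standard OAI-PMH verb ListRecords and the oai_dc metadata prefix, both defined by the protocol itself. In this minimal Python sketch the base URL is a placeholder, since this page does not give the Archive-It feed address, and the set name is hypothetical; take the real values from the linked documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Return an OAI-PMH ListRecords request URL for the given base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set, as defined by OAI-PMH.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL and hypothetical set name, not Archive-It's actual feed.
print(list_records_url("https://example.org/oai", set_spec="example-set"))
```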
Partner data
What it is
The Archive-It Partner API provides information about partners’ accounts, collections, crawls, metadata, and more to the Archive-It web application. Credentialed Archive-It partners may also access this data directly in order to create their own custom front ends, share descriptive or administrative metadata or provenance information, or to develop further integrations.
For instance, partners at the University at Albany SUNY retrieve technical metadata about crawls, such as their unique ID numbers, durations, and scoping rules, to include in web archive finding aids:
How it works
The Archive-It Partner API endpoint is accessible at: https://partner.archive-it.org/api/
For a summary of the types of data available through this API and instructions to retrieve them, see: Access your account with the Archive-It Partner API.
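Request URLs under the endpoint above can be composed as in this minimal Python sketch. The resource name "collection" and the "id" filter are hypothetical examples and are not confirmed by this page; consult the linked documentation for the resources and parameters that actually exist. The sketch only builds the URL and does not authenticate or send a request.

```python
from urllib.parse import urlencode, urljoin

# Endpoint as given above.
PARTNER_API_BASE = "https://partner.archive-it.org/api/"


def partner_api_url(resource: str, **filters) -> str:
    """Join a resource name onto the API endpoint and append query filters."""
    # Strip a leading slash so urljoin keeps the /api/ path prefix.
    url = urljoin(PARTNER_API_BASE, resource.lstrip("/"))
    if filters:
        url += "?" + urlencode(filters)
    return url


# "collection" and "id" are hypothetical names used for illustration.
print(partner_api_url("collection", id=1234))
```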
Browser icons created by Linseed Studio from the Noun Project