Overview

Archive-It accounts and archive-it.org provide integrated access to partners' web archive collections and metadata. You may also access collections and their metadata by using the following APIs and custom integrations.

These tools support building custom access interfaces, enriching existing archival records with descriptive or technical metadata, organizing and moving WARC files, and more. Public collections can be retrieved by the general public. Credentialed Archive-It partners can also retrieve and share private collections or administrative information.

APIs and integrations

Wayback captures

What it is

All documents collected by Archive-It crawls are indexed for archival replay. Access points like Wayback calendar pages use this index to find and link end users to pages in the web archive collection. Anyone can use Archive-It’s CDX/C API to access this index file, find information about captures, and link to them in the collection.

For instance, partners at Princeton Theological Seminary use this API to add access links to archival records like this one and style them in a simplified twirl-down form automatically:

How it works

The calendar information shown above can be retrieved in plain text format like this:

http://wayback.archive-it.org/3813/timemap/cdx?url=http://www.allsaintsprinceton.org/&fl=timestamp,original
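As a minimal sketch, the request above could be built and its response parsed in Python roughly as follows. The function names are hypothetical, and two things are assumptions rather than documented behavior: that each line of the plain-text response holds the requested fields separated by whitespace, and that a capture replays at a URL of the form base/collection/timestamp/original.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_BASE = "http://wayback.archive-it.org"


def build_cdx_url(collection_id, url, fields=("timestamp", "original")):
    """Build a CDX/C query URL for one collection and one target URL."""
    # Keep ':', '/' and ',' unescaped so the URL matches the documented form.
    query = urlencode({"url": url, "fl": ",".join(fields)}, safe=":/,")
    return f"{WAYBACK_BASE}/{collection_id}/timemap/cdx?{query}"


def parse_cdx_text(text, fields=("timestamp", "original")):
    """Turn the plain-text response into one dict per capture.

    Assumes whitespace-separated fields, in the order requested with fl=.
    """
    captures = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(fields):
            captures.append(dict(zip(fields, parts)))
    return captures


def replay_url(collection_id, capture):
    """Build a link to a capture (assumed replay URL pattern)."""
    return f"{WAYBACK_BASE}/{collection_id}/{capture['timestamp']}/{capture['original']}"


def fetch_captures(collection_id, url):
    """Query the index over the network and return the parsed captures."""
    with urlopen(build_cdx_url(collection_id, url)) as response:
        return parse_cdx_text(response.read().decode("utf-8"))
```

Calling `fetch_captures(3813, "http://www.allsaintsprinceton.org/")` would issue the request shown above; each returned capture can then be passed to `replay_url` to produce an access link.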

For complete documentation of the API, types of capture information available, and how to retrieve it, see: Access Archive-It's Wayback index with the CDX/C API.

Full-text search

Custom search box

What it is

Anyone may connect a search box on their own website to full-text and metadata search results from Archive-It collections or accounts by adding five lines of standard HTML markup to their chosen page.

For instance, partners at the University of Texas at San Antonio Special Collections use this integration to provide site visitors with a search interface, which takes users directly to search results for their keywords or phrase entries on Archive-It’s public-facing website:

[Screenshot: University of Texas at San Antonio archived websites search box]

How it works

For full documentation, and to create your own custom search box to Archive-It web archives like this one, see: How to add full-text search for web archive collections to your site.

OpenSearch API

What it is

Anyone may use Archive-It's OpenSearch API server in order to develop advanced full-text search access points. This tool retrieves the same full-text search results as the Archive-It web application or public-facing website and delivers them to your desired user-facing access point.

For instance, partners at the New York Art Resources Consortium (NYARC) use the OpenSearch API to add web archive results to the search results in this discovery layer, alongside holdings in all other formats:

How it works

To create an OpenSearch API call like the above and retrieve the raw XML response, format your query as:

http://archive-it.org/search-master/opensearch?q=[search query]&i=[collection number/s]
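The query format above might be used from Python along these lines. This is only a sketch: the function names are hypothetical, it handles a single collection number, and it assumes the XML response is RSS-style, with one `item` element per result carrying `title` and `link` children. Check the full documentation for the actual response schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OPENSEARCH_BASE = "http://archive-it.org/search-master/opensearch"


def build_search_url(query, collection_id):
    """Build an OpenSearch URL from a search query and one collection number."""
    return f"{OPENSEARCH_BASE}?{urlencode({'q': query, 'i': collection_id})}"


def parse_items(xml_text):
    """Extract title and link from each result.

    Assumes RSS-style <item> elements with <title> and <link> children.
    """
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def search(query, collection_id):
    """Run the search over the network and return the parsed results."""
    with urlopen(build_search_url(query, collection_id)) as response:
        return parse_items(response.read())
```

The parsed list can then be rendered in whatever user-facing access point you are building.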

For full documentation and instructions to use and customize these search results, see: Access your web archives with OpenSearch.

Web ARChive (WARC) files

What it is

All Archive-It web archives are stored in the international standard WARC file format (ISO 28500:2009). All credentialed Archive-It partners may download their files and associated technical metadata for their own local preservation, management, and analysis.

How it works

A typical request for all files from a specific collection can be made in a web browser like this:

https://warcs.archive-it.org/wasapi/v1/webdata?collection=[Collection #]

Requests return download links for WARC files along with their associated technical metadata in JSON, CSV, or XML format:
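As a rough illustration of working with a JSON response outside the browser, the sketch below builds the request URL and pulls download links out of an already-parsed response. The function names are hypothetical, and the field names (`files`, `filename`, `locations`, `next`) are assumptions based on the general WASAPI response layout, so verify them against a real response. Fetching the URL requires your Archive-It credentials, which are not shown here.

```python
from urllib.parse import urlencode

WASAPI_BASE = "https://warcs.archive-it.org/wasapi/v1/webdata"


def build_webdata_url(collection_id):
    """Build the request URL for all files in one collection."""
    return f"{WASAPI_BASE}?{urlencode({'collection': collection_id})}"


def list_downloads(page):
    """Return (filename, download link) pairs from one parsed JSON page.

    Assumes each entry under "files" has a "filename" and a list of
    "locations"; the first location is used as the download link.
    """
    downloads = []
    for entry in page.get("files", []):
        locations = entry.get("locations") or []
        if locations:
            downloads.append((entry.get("filename"), locations[0]))
    return downloads


def next_page_url(page):
    """Return the URL of the next page of results, or None on the last page.

    Assumes paginated responses carry a "next" field.
    """
    return page.get("next")
```

A download script would typically loop: request a page, save the files from `list_downloads`, then follow `next_page_url` until it returns `None`.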

For full documentation and a demonstration of the options for requesting specific files from a web browser or command line interface, see: Find and download your WARC files with WASAPI.

Descriptive metadata

What it is

Archive-It partners may share the descriptive metadata about their collections in XML format, for harvesting by repositories, catalogs, and other tools that use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

For example, the descriptive metadata for the Internet Archive’s “Jasmine Revolution - Tunisia 2011” collection is retrieved directly by OCLC’s WorldCat union catalog from:

https://archive-it.org/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=http://archive-it.org/collections/2323
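A harvester's side of this exchange could look roughly like the following sketch, which builds the GetRecord URL above and collects the Dublin Core fields from the response. The function names are hypothetical. The code relies only on the standard Dublin Core element namespace used by `oai_dc` records; which fields a given collection exposes depends on its metadata.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_BASE = "https://archive-it.org/oai"
DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def build_get_record_url(collection_id):
    """Build an OAI-PMH GetRecord URL for one collection."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",
        "identifier": f"http://archive-it.org/collections/{collection_id}",
    }
    # Keep ':' and '/' unescaped so the URL matches the documented form.
    return f"{OAI_BASE}?{urlencode(params, safe=':/')}"


def dublin_core_fields(xml_text):
    """Collect Dublin Core values as {field name: [values]}."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


def get_record(collection_id):
    """Fetch one record over the network and return its Dublin Core fields."""
    with urlopen(build_get_record_url(collection_id)) as response:
        return dublin_core_fields(response.read())
```

Values are kept as lists because Dublin Core elements such as subject may repeat within one record.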


How it works

For full instruction on exposing collections to OAI-PMH harvesting and custom record harvesting parameters, see: Access web archives with the OAI-PMH metadata feed.

Partner data

What it is

The Archive-It Partner API provides information about partners’ accounts, collections, crawls, metadata, and more to the Archive-It web application. Credentialed Archive-It partners may also access this data directly in order to create their own custom front ends, share descriptive or administrative metadata or provenance information, or to develop further integrations.

For instance, partners at the University at Albany, SUNY retrieve technical metadata about crawls, such as their unique ID numbers, durations, and scoping rules, to include in web archive finding aids.

How it works

The Archive-It Partner API endpoint is accessible at: https://partner.archive-it.org/api/
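A request to this endpoint might be prepared in Python as sketched below. Several details here are assumptions, not documented facts: the resource name `collection`, the `id` parameter, and the use of HTTP Basic authentication with your Archive-It login are all placeholders for illustration. Consult the linked documentation for the resources and authentication the API actually supports.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PARTNER_API_BASE = "https://partner.archive-it.org/api/"


def build_partner_request(resource, params, username, password):
    """Prepare an authenticated GET request for one API resource.

    `resource` and `params` are hypothetical examples; HTTP Basic
    authentication is an assumption.
    """
    url = f"{PARTNER_API_BASE}{resource}?{urlencode(params)}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_json(request):
    """Send the request over the network and decode a JSON response body."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(build_partner_request("collection", {"id": 2323}, username, password))` would send one such request, given valid credentials.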

For a summary of the types of data available through this API and instructions to retrieve them, see: Access your account with the Archive-It Partner API.

Browser icons created by Linseed Studio from the Noun Project

Comments

1 comment

Henry Zhang May 23, 2025 16:34 (Edited May 24, 2025 21:36)

I am wondering what is the API endpoint for uploading external warc files into a designated collection. The application scenario is below:

There is a folder containing a list of WARC files which were captured outside Archive-It. We would like to upload these with automation instead of through partner.archive-it.org's web browser interface. This also requires automatically creating a seed for each individual WARC as the entry point for the Wayback Machine.
