Your web archives' collection- and seed-level metadata may be made available through an OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) metadata feed upon your request. Enabling this feed can make your archives more accessible through external catalogs that use OAI-PMH to harvest collection metadata from around the web.
On this page:
- How to "opt-in" a collection to the OAI-PMH metadata feed
- How to access the OAI-PMH metadata feed
- Further information
OAI-PMH, the "Open Archives Initiative – Protocol for Metadata Harvesting," supports interoperability between data providers, who maintain one or more repositories (web servers) of metadata, and service providers, who issue requests to those data providers and use their metadata as the basis for building value-added services. One example of this relationship in practice is the syncing of metadata with OCLC's WorldCat catalog, home to billions of records from library repositories around the world. Researchers and the general public turn to this catalog to find both print and online materials, including archived web collections.
The feed is publicly available and can therefore be used by individuals or institutions alike to search, discover, aggregate, and provide links to collections publicly available through Archive-It.
How it works
The OAI-PMH server operates roughly as follows:
- The server receives HTTP requests that use OAI-PMH protocol verbs
- It converts these requests to local SOLR queries
- SOLR returns the relevant metadata for each request
- The servlet / wrapper converts this metadata into the requested XML format via XSLT stylesheets
- The results are returned to the user as XML in a typical OAI-PMH response
A suite of XSLT stylesheets has been created to respond to the six OAI-PMH verbs: Identify, ListMetadataFormats, ListIdentifiers, ListRecords, GetRecord, and ListSets.
How to "opt-in" a collection to the OAI-PMH metadata feed
You may add collections to the Archive-It OAI-PMH metadata feed individually and on an "opt-in" basis. To protect privacy, collections' metadata are not included in the feed by default. To opt a collection in:
- Navigate to the given collection's management interface
- Click into the collection's "Metadata" tab
- In the field titled OAI-PMH Export, check the box marked "Export metadata to OAI-PMH (including WorldCat)"
As soon as Archive-It's regularly refreshed SOLR search index is updated, you will be able to access your collection's metadata in the OAI-PMH feed (see instructions below). Access through external systems using the protocol will depend upon those service providers' independent harvesting schedules.
How to access the OAI-PMH metadata feed
Endpoints and verbs
The base URL, or "endpoint," for Archive-It's collection-level OAI-PMH server is:
The endpoint for seed-level OAI-PMH server is:
(Replace "Organization Number" above with your account's unique ID number in order to access your seeds' metadata).
The endpoints above are not generally designed for human consumption, but you (or an OAI-PMH harvester) can issue the following six types of requests. All responses are in XML, so users of the Safari web browser will need to "View Source" in order to see them (Chrome and Firefox display plain XML in outline format):
- Identify: Returns basic information about the OAI-PMH repository.
- For example: http://archive-it.org/oai?verb=Identify
- ListMetadataFormats: Returns a list of all metadata formats available (with the Metadata Prefix used in other queries).
- For example: http://archive-it.org/oai?verb=ListMetadataFormats
- ListIdentifiers: Returns a list of all record identifiers with their dates of last modification.
- For example: http://archive-it.org/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
- ListRecords: Returns all of the metadata records for the given prefix.
- For example: http://archive-it.org/oai?verb=ListRecords&metadataPrefix=oai_dc
- GetRecord: Returns the metadata record that the [Identifier] points to, with the XML returned in the designated [Metadata_Prefix].
- For example: http://archive-it.org/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=http://archive-it.org/collections/2323
- ListSets: Returns a list of available sets. Currently, sets are automatically constructed at the partner level.
- For example: http://archive-it.org/oai?verb=ListSets
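The request URLs above all follow one pattern: the endpoint, a "verb" argument, and any further arguments that the verb needs. The following is a minimal Python sketch of that pattern, not an official client. The function name build_oai_url and the constant names are hypothetical, and the base URL is simply the one used in the examples above.

```python
from urllib.parse import urlencode

# Collection-level endpoint as it appears in the example URLs above.
BASE_URL = "http://archive-it.org/oai"

# The six request types (verbs) defined by the OAI-PMH protocol.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
    "ListSets",
}


def build_oai_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL (hypothetical helper for illustration)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}
    # Leave ":" and "/" unescaped so identifier URLs stay readable,
    # matching the style of the examples above.
    return f"{base_url}?{urlencode(query, safe=':/')}"


if __name__ == "__main__":
    print(build_oai_url("Identify"))
    print(build_oai_url("ListRecords", metadataPrefix="oai_dc"))
    print(build_oai_url(
        "GetRecord",
        metadataPrefix="oai_dc",
        identifier="http://archive-it.org/collections/2323",
    ))
```

The resulting strings can be pasted into a browser or passed to any HTTP client. The sketch only builds URLs and does not contact the server.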
Additionally, sets can be used to return records for a given partner, organization, or other established "set" of collections. For example, to return the records belonging to the set "Internet Archive Global Events," which has the Set name "organization:89" (as provided by the ListSets verb above), use:
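As a rough illustration of a set-limited request, the Python sketch below builds a ListRecords URL that carries a "set" argument. The "set" argument is part of the OAI-PMH protocol itself; that Archive-It's endpoint accepts it in exactly this form is an assumption, and the function name build_set_url is hypothetical.

```python
from urllib.parse import urlencode


def build_set_url(set_spec, metadata_prefix="oai_dc",
                  base_url="http://archive-it.org/oai"):
    """Build a ListRecords URL limited to one set (illustrative sketch).

    Assumes the endpoint accepts the standard OAI-PMH "set" argument.
    """
    query = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    # Keep ":" unescaped so a value like "organization:89" stays readable.
    return f"{base_url}?{urlencode(query, safe=':')}"


if __name__ == "__main__":
    # "organization:89" is the set named in the paragraph above.
    print(build_set_url("organization:89"))
```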
Important note about pagination
By default, replies to the above kinds of requests to Archive-It's OAI-PMH server will include the first one hundred (100) records in XML format, concluding with a "resumption token" that you may use to advance your view to subsequent pages with your preferred count of records.
In the above example, for instance, the <resumptionToken> element appears at the end of the XML manifest for the following request: https://archive-it.org/oai/organizations/1036?verb=ListRecords&metadataPrefix=oai_dc
It tells us that our page of XML begins at the very start of our list of records (cursor="0"), that there are 447 such records in total in the manifest (completeListSize="447"), and that this page includes the first 100 of them. We can use this resumption token either to advance our cursor or simply to display all records by applying the following syntax:
[Organization #]?verb=ListRecords&[metadata prefix]&resumptionToken=[Starting record #],[# records to return]
A manifest that begins with the record immediately following the last record in the default reply from our above example would, for instance, look like: https://archive-it.org/oai/organizations/1036?verb=ListRecords&metadataPrefix=oai_dc&resumptionToken=101,100
And a manifest that includes all records: https://archive-it.org/oai/organizations/1036?verb=ListRecords&metadataPrefix=oai_dc&resumptionToken=0,447
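The pagination steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a tested harvester: the sample XML is a hypothetical stand-in for a real reply (records omitted, and it assumes the standard OAI-PMH 2.0 namespace), and the function names read_resumption_info and page_url are invented for this example. The URL format follows the "starting record, number of records" syntax described above.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 XML namespace (assumed to be used in the reply).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, trimmed-down reply: only the resumptionToken element is kept.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <!-- record elements omitted -->
    <resumptionToken cursor="0" completeListSize="447"/>
  </ListRecords>
</OAI-PMH>"""


def read_resumption_info(xml_text):
    """Return (cursor, completeListSize) from a reply, or None if absent."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None:
        return None
    cursor = token.get("cursor")
    total = token.get("completeListSize")
    # Both attributes are optional in OAI-PMH, so guard against missing ones.
    return (
        int(cursor) if cursor is not None else None,
        int(total) if total is not None else None,
    )


def page_url(org_id, start, count, metadata_prefix="oai_dc"):
    """Build a seed-level ListRecords URL using the syntax described above."""
    return (
        f"https://archive-it.org/oai/organizations/{org_id}"
        f"?verb=ListRecords&metadataPrefix={metadata_prefix}"
        f"&resumptionToken={start},{count}"
    )


if __name__ == "__main__":
    cursor, total = read_resumption_info(SAMPLE_PAGE)
    # Request every record at once, as in the final example above.
    print(page_url(1036, 0, total))
```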
Metadata formats and schema
One powerful advantage of OAI-PMH servers is their ability to return records in a desired metadata format or schema. Archive-It seeds and collections use the Dublin Core standard as the default metadata schema, which is conveniently the same baseline standard used by the OAI-PMH protocol. Institutions like the Library of Congress have, however, created XSLT stylesheets that will "crosswalk" from one metadata format to another. Similarly, the OAI-PMH server currently also converts the original Dublin Core in order to export MARC records. We will update this guidance as the ability to export in more formats, such as MODS, EAD, and RDF, is added.
Available metadata formats, as mentioned above, can at any time be returned from the URL: http://archive-it.org/oai?verb=ListMetadataFormats
The response includes these useful values for each currently offered metadata format:
- metadataPrefix: This section of the results returns the prefix used in other OAI-PMH requests, for example "oai_marc21" for the MARC21 format.
- schema: This section returns the URL for the metadata schema's definition and dictionary.
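To show how these values might be read programmatically, here is a short Python sketch that extracts each format's prefix and schema URL from a ListMetadataFormats reply. The sample XML is a hypothetical reply written for this example, not captured from Archive-It: the oai_dc values are the standard ones from the OAI-PMH specification, while the schema URL shown for oai_marc21 is an assumption. The function name list_formats is likewise invented.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 XML namespace (assumed to be used in the reply).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical sample reply; real values may differ.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats(xml_text):
    """Map each metadataPrefix in the reply to its schema URL."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iter(f"{OAI_NS}metadataFormat"):
        prefix = fmt.findtext(f"{OAI_NS}metadataPrefix")
        schema = fmt.findtext(f"{OAI_NS}schema")
        formats[prefix] = schema
    return formats


if __name__ == "__main__":
    for prefix, schema in list_formats(SAMPLE_RESPONSE).items():
        print(prefix, schema)
```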
Collection and seed identifiers
To facilitate access to your web archives through external catalogs, Archive-It automatically populates the collection "identifier" metadata field with a link to your collection's page at archive-it.org. For example, the Library of Virginia's collection on the tragic shooting at Virginia Tech appears in the feed with the identifier: http://www.archive-it.org/collections/649.
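Because the collection identifier is an ordinary URL, the numeric collection ID can be recovered from it when needed, for instance to match feed records against a local list. The Python sketch below assumes identifiers always have the ".../collections/<number>" shape seen in this article; the function name is hypothetical.

```python
from urllib.parse import urlparse


def collection_id_from_identifier(identifier):
    """Extract the numeric collection ID from a collection identifier URL.

    Assumes the path has the form /collections/<number>, as in the
    examples in this article.
    """
    parts = urlparse(identifier).path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "collections" and parts[1].isdigit():
        return int(parts[1])
    raise ValueError(f"Not a collection identifier: {identifier}")


if __name__ == "__main__":
    print(collection_id_from_identifier("http://www.archive-it.org/collections/649"))
```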
Content can be found on WorldCat.org by selecting the format “archival material” in ‘Advanced Search’ or by using “downloadable archival material” as a search result facet. Because the OAI-PMH feed is at the collection level, search by either collection titles (not the collection name) or keywords. Once inside the record, use "Free Access" or "View Online" to access the content. Please note that if the feed is enabled for a private collection, the link will still direct to the collection page on Archive-It.org, but the content will not be available.