Overview
You can make your public collections' web archives more discoverable by exposing their metadata through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). External catalogs like WorldCat use OAI-PMH to harvest metadata from around the web. This article provides instructions for enabling OAI-PMH feeds for your collections.
Prerequisites
- Public collections with descriptive metadata applied.
- Firefox or Chrome browser to display the XML feeds.
- A basic understanding of XML is helpful.
About OAI-PMH
The OAI-PMH server operates like this:
- The server receives HTTP requests that use OAI-PMH protocol verbs.
- It converts these requests into local Solr queries.
- Solr returns the relevant metadata for each request.
- The servlet/wrapper converts this metadata into the requested XML format using XSLT stylesheets.
- The results are returned to the user as a standard OAI-PMH XML response.
A suite of XSLT stylesheets has been created to respond to the 6 OAI-PMH verbs:
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
Opt in to OAI-PMH
To ensure privacy, Archive-It collections are not included in the OAI-PMH feed by default. To opt in:
- Go to the collection you want to add.
- Select the Metadata tab.
- For OAI-PMH Export, select the check box for Export metadata to OAI-PMH (including WorldCat).
After Archive-It's Solr index is updated, which takes approximately 20 minutes, you will be able to access your collection's metadata in the OAI-PMH feed.
Access the OAI-PMH feed
Request the feed
To request the feed, enter one of Archive-It's endpoints followed by a "verb" in your browser's address bar. The feed is displayed in XML format.
Endpoints
The endpoint for Archive-It's OAI-PMH server is: https://www.archive-it.org/oai
Use this endpoint for general information about the repository.
For more specific information about your collection's feeds, add your account number to this endpoint: https://www.archive-it.org/oai/organizations/[Organization Number]
- Replace [Organization Number] above with your account's number.
- When you are logged in to your account, your number appears in the browser's address bar after the prefix https://partner.archive-it.org/.
- Paste the endpoint with your account number into the address bar of a new tab.
Verbs
Append a "verb" to an endpoint to request a feed. There are 6 "verbs" you can use in your requests:
?verb=Identify
- Returns basic information about the OAI-PMH repository.
- For example: https://archive-it.org/oai?verb=Identify
?verb=ListMetadataFormats
- Returns a list of all metadata formats available.
- For example: https://archive-it.org/oai?verb=ListMetadataFormats
?verb=ListIdentifiers&metadataPrefix=[Metadata_Prefix]
- Returns a list of all record identifiers, each with its date of last modification.
- Archive-It metadata is usually in Dublin Core, so the prefix is typically:
- &metadataPrefix=oai_dc
?verb=ListRecords&metadataPrefix=[Metadata_Prefix]
- Returns all metadata records for the given prefix.
?verb=GetRecord&metadataPrefix=[Metadata_Prefix]&identifier=[Identifier]
- Returns the metadata record that [Identifier] points to, with the XML in the format designated by [Metadata_Prefix].
?verb=ListSets
- Returns a list of available sets.
- Sets are constructed at the organization level.
- Sets can also be the organization's collections that are available in OAI-PMH feeds.
- For example: https://archive-it.org/oai?verb=ListSets
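If you would rather build these request URLs in a script than type them by hand, the endpoint and verb combinations above can be sketched in Python. This is only an illustration: the function name build_request_url and its parameters are made up for this example and are not part of any Archive-It tool.

```python
from urllib.parse import urlencode

# Endpoint as written in this article's example requests.
BASE_ENDPOINT = "https://archive-it.org/oai"


def build_request_url(verb, organization=None, **params):
    """Return an OAI-PMH request URL for the given verb.

    `organization` is your account number (hypothetical parameter name).
    When it is given, the organization-level endpoint is used.
    Extra keyword arguments become query parameters, for example
    metadataPrefix="oai_dc".
    """
    endpoint = BASE_ENDPOINT
    if organization is not None:
        endpoint = f"{BASE_ENDPOINT}/organizations/{organization}"
    query = {"verb": verb}
    query.update(params)
    return f"{endpoint}?{urlencode(query)}"


print(build_request_url("Identify"))
print(build_request_url("ListRecords", organization=1036, metadataPrefix="oai_dc"))
```

The second call reproduces the organization-level ListRecords request used in the Pagination section.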
Pagination
Requested XML feeds display the first 100 records by default. At the end of those 100 records is a <resumptionToken>, which you can adjust to advance to the next 100 records or to your preferred number of records.
For example, a <resumptionToken> appears at the end of the XML feed for this request: https://archive-it.org/oai/organizations/1036?verb=ListRecords&metadataPrefix=oai_dc
It tells us the following about this XML feed:
- The page begins at the very start of the list of records (cursor="0").
- There are 447 total records in the feed (completeListSize="447").
- This page shows the first 100 records.
We can use this resumption token's information in a request to advance the feed's display to the next 100 records, or to display all records in the feed.
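As a rough sketch of how a script could follow a resumption token, the Python below reads the token from a response and builds the follow-up request. It assumes the standard OAI-PMH convention, in which the follow-up request carries only the verb and the resumptionToken argument. The sample XML is a trimmed illustration, the token value EXAMPLE-TOKEN is invented, and the function name next_page_url is hypothetical. The exact token format that Archive-It returns is not shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, illustrative response; "EXAMPLE-TOKEN" is a made-up value.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken cursor="0" completeListSize="447">EXAMPLE-TOKEN</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def next_page_url(endpoint, verb, xml_text):
    """Return the URL for the next page, or None when no token is present."""
    root = ET.fromstring(xml_text)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means there are no further pages.
    if token is None or not (token.text or "").strip():
        return None
    query = {"verb": verb, "resumptionToken": token.text.strip()}
    return f"{endpoint}?{urlencode(query)}"


print(next_page_url(
    "https://archive-it.org/oai/organizations/1036",
    "ListRecords",
    SAMPLE_RESPONSE,
))
```

A harvesting script would typically repeat this step, requesting each returned URL in turn until the function returns None.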
Metadata formats and schema
Archive-It seeds and collections use the Dublin Core standard as the default metadata schema. This is also the baseline metadata standard for the OAI-PMH protocol.
The OAI-PMH server currently also converts the original Dublin Core metadata to MARC records for export. We will update this guidance when export in further formats, such as MODS, EAD, and RDF, becomes available.
The available metadata formats for OAI-PMH are returned by the URL: https://archive-it.org/oai?verb=ListMetadataFormats
For example:
<metadataPrefix>oai_marc21</metadataPrefix>
- This section of the results returns the prefix, in this case "oai_marc21", which is the MARC21 format.
<schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
- This section returns the URL for the metadata schema's definition and dictionary.
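To read these two fields in a script, a ListMetadataFormats response can be parsed with Python's standard library. The sketch below uses a trimmed sample built from the two elements shown above. The surrounding wrapper elements follow the OAI-PMH 2.0 specification, and the function name list_formats is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed sample containing only the two fields discussed above.
SAMPLE_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats(xml_text):
    """Map each metadataPrefix in the response to its schema URL."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", default="", namespaces=OAI_NS)
        formats[prefix.strip()] = schema.strip()
    return formats


print(list_formats(SAMPLE_FORMATS))
```

The keys of the returned dictionary are the values you can pass as metadataPrefix in ListIdentifiers, ListRecords, and GetRecord requests.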
Collection and seed identifiers
The collection <identifier> metadata field is a link to your collection's page at archive-it.org.
- For example: https://www.archive-it.org/collections/649
The seed <identifier> metadata field is a link to the seed's calendar page.
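If you want to pull these links out of a Dublin Core record programmatically, a sketch like the following may help. It assumes the record uses the standard oai_dc wrapper and the Dublin Core element namespace. The sample record, its title, and the function name identifiers are illustrative only, and a real Archive-It record will contain more fields.

```python
import xml.etree.ElementTree as ET

# Standard namespace for Dublin Core elements.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Illustrative record; only the identifier value comes from this article.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example collection</dc:title>
  <dc:identifier>https://www.archive-it.org/collections/649</dc:identifier>
</oai_dc:dc>"""


def identifiers(xml_text):
    """Return every non-empty dc:identifier value found in the record."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iterfind(".//dc:identifier", DC_NS)
        if element.text and element.text.strip()
    ]


print(identifiers(SAMPLE_RECORD))
```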
Find your content in external catalogs
You can find content using the Advanced Search on WorldCat.org:
- Use the format “archival material”
- Use the facet “downloadable archival material”
OAI-PMH feeds are at the collection level, so search with either collection titles or keywords. For example, searching the collection title "Jasmine Revolution - Tunisia 2011" returns this record: https://search.worldcat.org/title/1446894960
When the record is opened, clicking the Access Free button or View online link will direct you to the Archive-It collection.
Note: If the feed is for a private collection, the link will still direct to the collection page on Archive-It.org, but the collection content will not be available in the external catalog.
OCLC's WorldCat catalog harvests the Archive-It OAI-PMH feed on a monthly basis.
Related content
Open Archives Initiative–Protocol for Metadata Harvesting