Overview
You can make your public collections' web archives more discoverable by exposing their metadata through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). External catalogs like WorldCat use OAI-PMH to harvest metadata from around the web. This article explains how to enable OAI-PMH feeds for your collections.
Prerequisites
- Public collections with descriptive metadata applied.
- A Firefox or Chrome browser to display the XML feeds.
- A basic understanding of XML is helpful.
About OAI-PMH
The OAI-PMH server works like this:
- It receives HTTP requests that use OAI-PMH protocol verbs.
- It converts these requests into local SOLR queries.
- SOLR returns the relevant metadata for each request.
- The servlet / wrapper converts the results into the requested XML format using XSLT stylesheets.
- The results are returned to the user as a standard OAI-PMH XML response.
A suite of XSLT stylesheets has been created to respond to the six OAI-PMH verbs:
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
Opt in to OAI-PMH
Archive-It collections are not included in the OAI-PMH feed by default, which protects your privacy. If you would like to opt in, start in the collection you want to add:
- Click the Metadata tab.
- Find the OAI-PMH Export field, about halfway down the screen.
- Click the check box for Export metadata to OAI-PMH (including WorldCat).
Your collection's metadata will be available in the OAI-PMH feed after about 20 minutes, once Archive-It's SOLR index has updated.
Access the OAI-PMH feed
Request the feed
To request a feed, combine Archive-It's endpoints with OAI-PMH "verbs" in your browser's address bar. The feed is displayed in XML format.
Endpoints
The endpoint for Archive-It's OAI-PMH server is: http://www.archive-it.org/oai
Use this endpoint for general information about the repository.
For information about your own collections' feeds, add your organization (account) number to the endpoint: http://www.archive-it.org/oai/organizations/[Organization Number]
- Replace [Organization Number] above with your account's number.
- Your number appears in the browser's address bar, right after the prefix https://partner.archive-it.org/, when you are logged in to your account.
- Paste this endpoint with your account number in a new tab's address bar.
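If you script your requests rather than typing them, the organization endpoint can be assembled in code. The following is a minimal Python sketch; `organization_endpoint` is a hypothetical helper name, and it assumes your organization number is purely numeric. The number 1036 is the organization used in this article's Pagination example.

```python
from __future__ import annotations

# Base endpoint given in this article.
OAI_BASE = "http://www.archive-it.org/oai"


def organization_endpoint(org_number: int | str) -> str:
    """Return the organization-level OAI-PMH endpoint (hypothetical helper).

    Assumes the organization number is purely numeric, as it appears after
    https://partner.archive-it.org/ in the address bar.
    """
    org = str(org_number).strip()
    if not org.isdigit():
        raise ValueError(f"organization number must be numeric, got {org_number!r}")
    return f"{OAI_BASE}/organizations/{org}"


print(organization_endpoint(1036))  # http://www.archive-it.org/oai/organizations/1036
```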
Verbs
Append "verbs" to an endpoint to request a feed. There are six verbs you can use:
?verb=Identify
- Returns basic information about the OAI-PMH repository.
- For example: http://archive-it.org/oai?verb=Identify
?verb=ListMetadataFormats
- Returns a list of all metadata formats available.
- For example: http://archive-it.org/oai?verb=ListMetadataFormats
?verb=ListIdentifiers&metadataPrefix=[Metadata_Prefix]
- Returns a list of all record identifiers, each with its date of last modification.
- Archive-It metadata is usually in Dublin Core, so use &metadataPrefix=oai_dc.
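A ListIdentifiers response wraps each record's identifier and datestamp in a <header> element in the standard OAI-PMH 2.0 namespace. Below is a minimal Python sketch, using only the standard library, that pulls those pairs out of a response you have already fetched. The function name is hypothetical, and the sketch ignores any resumption token and the optional status="deleted" attribute.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_text: str) -> list[tuple[str, str]]:
    """Return (identifier, datestamp) pairs from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for header in root.iterfind(".//oai:ListIdentifiers/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        pairs.append((identifier.strip(), datestamp.strip()))
    return pairs
```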
?verb=ListRecords&metadataPrefix=[Metadata_Prefix]
- Returns all metadata records for the given prefix.
?verb=GetRecord&metadataPrefix=[Metadata_Prefix]&identifier=[Identifier]
- Returns the metadata record the [Identifier] points to with the XML returned in the designated [Metadata_Prefix].
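With metadataPrefix=oai_dc, a record's Dublin Core fields sit inside <metadata><oai_dc:dc>, using the oai_dc and Dublin Core element namespaces defined for OAI-PMH. The Python sketch below collects them into a dictionary of lists, because elements such as dc:subject can repeat. The function name is hypothetical, and it assumes you have already fetched the GetRecord response as text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_dc_record(xml_text: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements from a GetRecord response (oai_dc prefix)."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai:record/oai:metadata/oai_dc:dc", NS)
    if dc is None:
        return {}
    fields: dict[str, list[str]] = defaultdict(list)
    for child in dc:
        # Tags look like "{http://purl.org/dc/elements/1.1/}title"; keep only DC ones.
        if child.tag.startswith("{" + DC_NS + "}") and child.text:
            name = child.tag[len(DC_NS) + 2:]
            fields[name].append(child.text.strip())
    return dict(fields)
```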
?verb=ListSets
- Returns a list of available sets.
- Sets are constructed at the organization level.
- For example: http://archive-it.org/oai?verb=ListSets
- An organization's collections that are available in OAI-PMH feeds can also appear as sets.
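Each request is an endpoint followed by a query string. Here is a minimal Python sketch that builds these URLs with the standard library's `urlencode`, which also escapes values such as record identifiers. `oai_request_url` is a hypothetical helper, not part of any Archive-It tool.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0 and listed in this article.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(endpoint: str, verb: str, **args: str) -> str:
    """Build endpoint?verb=...&key=value... for an OAI-PMH request.

    Extra keyword arguments (metadataPrefix, identifier, set, ...) are URL-encoded
    and appended in the order given.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **args})
    return f"{endpoint}?{query}"
```

For example, `oai_request_url("https://archive-it.org/oai/organizations/1036", "ListRecords", metadataPrefix="oai_dc")` returns the request URL used in the Pagination section. OAI-PMH's `from` argument is a Python keyword, so pass it as `**{"from": "2020-01-01"}`.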
Pagination
By default, a requested XML feed displays the first 100 records. At the end of those 100 records you will see a <resumptionToken>. You can adjust it to advance to the next 100 records, or to display your preferred number of records.
For example, the <resumptionToken> appears at the end of the XML feed for this request: https://archive-it.org/oai/organizations/1036?verb=ListRecords&metadataPrefix=oai_dc
It tells us that this XML feed:
- Begins at the very start of the list of records (cursor="0").
- Contains 447 records in total (completeListSize="447").
- Shows the first 100 of those records on this page.
We can use this resumption token's information in a request that advances the feed's display to the next 100 records, or in a request that displays all records in the feed at once.
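If you harvest a full feed with a script, you can follow resumption tokens automatically. This Python sketch uses the generic OAI-PMH 2.0 convention, assuming Archive-It's server follows the specification here: send the token back unchanged as the only argument besides the verb, and stop when the token is empty or missing. It does not attempt the manual token adjustments described above. `harvest_records` and `fetch_url` are hypothetical names. `fetch_url` is a bare-bones standard-library fetcher with no retries or error handling.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url: str) -> str:
    """Minimal standard-library fetch (no retries, no error handling)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest_records(endpoint: str, fetch: Callable[[str], str],
                    metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumption tokens."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    url = f"{endpoint}?{query}"
    while True:
        root = ET.fromstring(fetch(url))
        yield from root.iter(f"{OAI}record")
        # token.get("cursor") and token.get("completeListSize") report progress.
        token = root.find(f".//{OAI}resumptionToken")
        # An absent or empty token means this was the last page.
        if token is None or not (token.text or "").strip():
            return
        # Per OAI-PMH, resumptionToken is exclusive: send only verb + token.
        query = urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
        url = f"{endpoint}?{query}"
```

For example, `records = list(harvest_records("https://archive-it.org/oai/organizations/1036", fetch_url))` would collect every record in that feed. Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP client, so it can be tested against saved responses.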
Metadata formats and schema
Archive-It seeds and collections use the Dublin Core standard as their default metadata schema. Dublin Core is also the baseline format that the OAI-PMH protocol requires every repository to support.
The OAI-PMH server can also convert the original Dublin Core metadata into MARC records for export. We will update this guidance as export in further formats, such as MODS, EAD, and RDF, becomes available.
The available metadata formats are listed at: http://archive-it.org/oai?verb=ListMetadataFormats
For example:
<metadataPrefix>oai_marc21</metadataPrefix>
- This part of the results gives the prefix, in this case "oai_marc21", which identifies the MARC21 format.
<schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
- This part gives the URL of the metadata schema's definition and dictionary.
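To read the available formats programmatically, map each <metadataFormat> element in the ListMetadataFormats response from its <metadataPrefix> to its <schema>. The following is a minimal Python sketch with a hypothetical function name, operating on a response you have already fetched.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_metadata_formats(xml_text: str) -> dict[str, str]:
    """Map each metadataPrefix to its schema URL from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=NS).strip()
        schema = fmt.findtext("oai:schema", default="", namespaces=NS).strip()
        if prefix:  # skip malformed entries without a prefix
            formats[prefix] = schema
    return formats
```

Applied to a response that contains the example above, the "oai_marc21" key would map to http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd.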
Collection and seed identifiers
The collection <identifier> metadata field is a link to your collection's page on archive-it.org.
- For example: http://www.archive-it.org/collections/649
The seed <identifier> metadata field will be a link to the seed's calendar page.
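When processing harvested records, you may want to tell collection-level records apart from seed-level ones. Here is a minimal Python sketch, assuming collection identifiers always follow the /collections/<number> pattern of the example above on archive-it.org or www.archive-it.org. Seed calendar URLs are not matched and return None. `collection_number` is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed pattern, based on the example http://www.archive-it.org/collections/649.
_COLLECTION_PATH = re.compile(r"^/collections/(\d+)/?$")
_HOSTS = {"archive-it.org", "www.archive-it.org"}


def collection_number(identifier: str) -> Optional[int]:
    """Return the collection number from a collection <identifier> URL, else None."""
    parsed = urlparse(identifier.strip())
    if parsed.netloc not in _HOSTS:
        return None
    match = _COLLECTION_PATH.match(parsed.path)
    return int(match.group(1)) if match else None
```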
Find your content in external catalogs
Content can be found using the Advanced Search on WorldCat.org:
- Use the format “archival material”
- Use the facet “downloadable archival material”
OAI-PMH feeds are at the collection level, so search with either collection titles or keywords. For example, searching the collection title "Jasmine Revolution - Tunisia 2011" leads to this record: https://www.worldcat.org/title/1358631753?oclcNum=1358631753
Once inside the record, use the Access Free button or View online link to access the Archive-It collection.
Note that if the feed is for a private collection, the link still directs to the collection page on Archive-It.org, but the collection content will not be available from the external catalog.
OCLC's WorldCat catalog harvests the Archive-It OAI-PMH feed on a monthly basis.