On this page:

What is OpenSearch and why would I use it?

How to search your archives with OpenSearch

Search Queries
XML Response

Further information

What is OpenSearch and why would I use it?

OpenSearch is a loosely structured standard that defines formats for the exchange of search results between search engines. The full draft specification is available at https://www.opensearch.org/Specifications/OpenSearch/1.1. For the rest of this guide, we'll focus on how to use OpenSearch as implemented by Archive-It. See some real examples from our partners who are using OpenSearch: https://support.archive-it.org/hc/en-us/articles/360001231286-Archive-It-Access-Integrations

What you can do with OpenSearch:

Perform search queries with an RSS reader or your web browser

Perform search queries with a script, CGI, or other software

Programmatically manipulate results (for example, you can format results to match your own UI)

What you can't do with OpenSearch:

Add or remove documents from the search engine

Modify the content or metadata of a document

How to search your archives with OpenSearch

Search Queries

To perform a search, you must provide a query, but you may also set request parameters to narrow the results. Please note that regardless of how many total results are found, only the first 99 results will be browsable. This example query...

https://archive-it.org/search-master/opensearch?q=texas&i=414

...will return the top ten hits for the query "texas" from collection 414. Available parameters are:

Parameter	Default	Repeatable?	Description
q		N	your search query
i		Y	index to search (default is all)
n	10	N	number of hits per page
p	0	N	start position
s		Y	site (default is all)
h	1	N	max hits per site, 0=all
t		Y	type (text/html, application/pdf, etc.)

Repeatable parameters are simply specified on the URL multiple times. For example, this query searches three collections, repeating the i parameter for each collection:
https://archive-it.org/search-master/opensearch?q=carolina&i=194&i=195&i=196
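
A minimal sketch of assembling such a URL in Python with the standard library. The helper name build_search_url and its keyword arguments are illustrative assumptions, not part of Archive-It; only the endpoint and the parameter names (q, i, s, t, n, p, h) come from the table above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example queries in this guide.
BASE_URL = "https://archive-it.org/search-master/opensearch"


def build_search_url(query, collections=(), sites=(), types=(),
                     hits_per_page=None, start=None, hits_per_site=None):
    """Return a search URL; repeatable parameters (i, s, t) appear once per value."""
    params = [("q", query)]
    params += [("i", str(c)) for c in collections]
    params += [("s", s) for s in sites]
    params += [("t", t) for t in types]
    # Single-valued parameters are sent only when explicitly set,
    # so the server-side defaults apply otherwise.
    for name, value in (("n", hits_per_page), ("p", start), ("h", hits_per_site)):
        if value is not None:
            params.append((name, str(value)))
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("carolina", collections=[194, 195, 196]))
```

Note that urlencode percent-encodes reserved characters, so a type such as application/pdf is sent as application%2Fpdf.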

User search query

Parameter	Default	Repeatable?	Description
q		N	your search query

This is the query that the user usually types into the search box on the HTML page. The query only applies to the following fields:

title
content
url

Paging

Parameter	Default	Repeatable?	Description
n	10	N	number of hits per page
p	0	N	start position

These two parameters are used for paging through the results and usually are not manipulated by the end user directly.
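
As a rough illustration, a script could step through the browsable results as sketched below. This assumes that p is a zero-based offset of the first hit on a page, which is one reading of "start position" and is not confirmed by this guide. The function name page_starts and the max_browsable argument are hypothetical; the value 99 reflects the browsable limit noted earlier.

```python
def page_starts(total_results, hits_per_page=10, max_browsable=99):
    """Yield (p, n) pairs covering the browsable part of a result set.

    Assumption: p is a zero-based offset of the first hit on the page.
    """
    reachable = min(total_results, max_browsable)
    for start in range(0, reachable, hits_per_page):
        # The last page may hold fewer hits than hits_per_page.
        yield start, min(hits_per_page, reachable - start)


print(list(page_starts(25)))
```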

Site

Parameter	Default	Repeatable?	Description
s		Y	site, default is all
h	1	N	max hits per site, 0=all

These two parameters are often used in combination. The s parameter limits the search to specific sites, while the h parameter specifies the maximum number of hits to show from any one site.

Most of the time, users want to see results from all sites, which is the default. However, using the s and h parameters, it is possible to narrow the results to a single site or a small set of sites. This query limits the results to two sites, showing all the hits from each:
https://archive-it.org/search-master/opensearch?q=foo&s=site1.org&s=site2.net&h=0

Collection

Parameter	Default	Repeatable?	Description
i		Y	collection to search, default is all

Specify the collection or collections to search by using the collection numbers.

Multiple collections

To search multiple collections, repeat the i parameter in the query string. For example, the following OpenSearch URL returns documents in collections 194, 195, and 196 that contain the text "carolina":
https://archive-it.org/search-master/opensearch?q=carolina&i=194&i=195&i=196

Content/Document types

Parameter	Default	Repeatable?	Description
t		Y	type: text/html, application/pdf, etc.

This parameter limits the results to documents of one or more types. Repeat the t parameter to match several types:

https://archive-it.org/search-master/opensearch?q=foo&t=application/pdf&t=application/x-pdf

XML Response

Please note that the layout of results may change across browsers. The OpenSearch specification declares an XML namespace for its extensions to RSS and Atom. Similarly, we declare a namespace for our extensions.

OpenSearch:	https://a9.com/-/spec/opensearchrss/1.0/
Archive-It:	https://web.archive.org/-/spec/opensearchrss/1.0/

Example response snippet:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:opensearch="https://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:archive="https://web.archive.org/-/spec/opensearchrss/1.0/">
  <channel>
    <title>texas</title>
    <description>texas</description>
    <link />
    <opensearch:totalResults>8996205</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <archive:query>texas</archive:query>
    <archive:index>414</archive:index>
    <archive:urlParams>
      <archive:param name="q" value="texas" />
      <archive:param name="i" value="414" />
    </archive:urlParams>
    <item>
      <title>Texas Musical Drama</title>
      <link>http://www.texas-show.com/</link>
      <archive:docId>8185585</archive:docId>
      <archive:score>2.815091</archive:score>
      <archive:site>www.texas-show.com</archive:site>
      <archive:length>30373</archive:length>
      <archive:type>text/html</archive:type>
      <archive:collection>414</archive:collection>
      <date>20090706012618</date>
      <description>&lt;B&gt;Texas&lt;/B&gt; Musical Drama Home...</description>
    </item>
    <item>...</item>
    <archive:responseTime>0.985</archive:responseTime>
  </channel>
</rss>

Response Elements

Each element in the XML response is described below. You can parse the results with an XML parser of your choice. Each hit is returned as an <item>. The link in a result points to the original URL, not to the archived copy in the Archive-It Wayback Machine. To link to the archived copy, construct a URL in the following format, substituting the values of the item's collection, date, and link elements:
https://wayback.archive-it.org/<archive:collection>/<date>/<link>
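
The link format above can be sketched in Python as follows. The function name wayback_url is a hypothetical helper; the values in the usage line are taken from the example response.

```python
def wayback_url(collection, date, link):
    """Build an Archive-It Wayback link from an item's collection, date, and link."""
    return f"https://wayback.archive-it.org/{collection}/{date}/{link}"


print(wayback_url("414", "20090706012618", "http://www.texas-show.com/"))
```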

Element name	Occurrence	Number of occurrences	Description
rss	always	one	The top level element of an RSS feed. For more information on RSS, visit this site.
channel	always	one to many per RSS	The channel is the element that marks the start and end of a logically grouped set of data elements, such as search results. There can be multiple channels per RSS feed.
title	always	one per channel	The title of the channel. This is the human-readable form of the channel. The title is the keyword or phrase that is used to generate the search results.
description	always	one per channel	A descriptive phrase describing the channel. The description is the keyword or phrase that is used to generate the search results.
link	always	one per channel	The URL of the RSS feed.
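totalResults	always	one per channel	The total number of results found for the search query.
startIndex	always	one per channel	The position of the first result returned in this response.
itemsPerPage	always	one per channel	The number of results returned per page.
query	always	one per channel	The search query that was submitted.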
index	always	one to many per channel	The id of the AIT collection that was searched.
urlParams	always	one per channel	The query-string parameters submitted in the search query.
param	always	one to many per urlParams	The name and value of a query-string parameter submitted in the search query.
item	conditional - only occurs if the search result set is greater than zero for the specified channel	zero to many	The item is the element that marks the start and end of a specific search result. A specific search result is a Web document, such as an HTML page.
item/title	mandatory within an item	one per item	The title of an item. The title of the item is the title of the Web document. For example, the title of an HTML page is the text within the <title> tags.
item/description	mandatory within an item	one per item	Snippets of content from within the Web document that are adjacent to the search keyword or phrase. The description is also known as the item's highlight.
item/link	mandatory within an item	one per item	The URL of the Web document.
item/docId	mandatory within an item	one per item	The unique identifier of a Web document within a search result set.
item/score	mandatory within an item	one per item	The page rank of the Web document. The page rank is a measure of item relevancy. The higher the page rank, the more relevant the Web document is within the search result set.
item/site	mandatory within an item	one per item	The Web site hosting the Web document.
item/length	mandatory within an item	one per item	The length in bytes of the Web document.
item/type	mandatory within an item	one per item	The Web document's mime-type. For example, "application/pdf" is one of the mime-types for PDF documents.
item/collection	mandatory within an item	one per item	The unique identifier of the Archive-It collection containing the Web document.
item/date	mandatory within an item	one per item	The date on which the Web document was archived.
responseTime	always	one per channel	The number of seconds the Archive-It search engine needed to process the search query.
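
As a rough illustration of reading these elements, the sketch below parses a trimmed copy of the example response with Python's standard xml.etree.ElementTree. The helper names local_name and parse_items are hypothetical, and matching elements by local name while ignoring namespace URIs is a simplifying assumption, not an approach prescribed by Archive-It.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example response shown earlier in this guide.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:opensearch="https://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:archive="https://web.archive.org/-/spec/opensearchrss/1.0/">
  <channel>
    <title>texas</title>
    <opensearch:totalResults>8996205</opensearch:totalResults>
    <item>
      <title>Texas Musical Drama</title>
      <link>http://www.texas-show.com/</link>
      <archive:site>www.texas-show.com</archive:site>
      <archive:type>text/html</archive:type>
      <archive:collection>414</archive:collection>
      <date>20090706012618</date>
    </item>
  </channel>
</rss>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_items(xml_bytes):
    """Return one dict per <item>, keyed by the local names of its child elements."""
    root = ET.fromstring(xml_bytes)
    items = []
    for element in root.iter():
        if local_name(element.tag) == "item":
            items.append({local_name(child.tag): (child.text or "").strip()
                          for child in element})
    return items


items = parse_items(SAMPLE.encode("utf-8"))
print(items[0]["title"], items[0]["collection"], items[0]["date"])
```

Each dictionary then carries the collection, date, and link values needed to build a Wayback link for that hit.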

Further information
