On this page:
- What is OpenSearch and why would I use it?
- How to search your archives with OpenSearch
- Further information
What is OpenSearch and why would I use it?
OpenSearch is a loosely structured standard that defines formats for the exchange of search results between search engines. The full draft specification is available at https://www.opensearch.org/Specifications/OpenSearch/1.1. For the rest of this guide, we'll focus on how to use OpenSearch as implemented by Archive-It. See some real examples from our partners who are using OpenSearch: https://support.archive-it.org/hc/en-us/articles/360001231286-Archive-It-Access-Integrations
What you can do with OpenSearch:
Perform search queries with an RSS reader or your web browser
Perform search queries with a script, CGI, or other software
Programmatically manipulate results (for example, you can format results to match your own UI)
What you can't do with OpenSearch:
Add or remove documents from the search engine
Modify the content or metadata of a document
How to search your archives with OpenSearch
Search Queries
To perform a search, you must provide a query, but you may also set request parameters to narrow the results. Please note that regardless of how many total results are found, only the first 99 results will be browsable. This example query...
https://archive-it.org/search-master/opensearch?q=texas&i=414
...will return the top ten hits for the query "texas" from collection 414. Available parameters are:
Parameter | Default | Repeatable? | Description
q | | N | your search query
i | | Y | index to search (default is all)
n | 10 | N | number of hits per page
p | 0 | N | start position
s | | Y | site (default is all)
h | 1 | N | max hits per site, 0=all
t | | Y | type (text/html, application/pdf, etc.)
Repeatable parameters are simply specified on the URL multiple times. For example, this query searches three collections, repeating the i parameter for each collection:
search-master/opensearch?q=carolina&i=194&i=195&i=196
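As a minimal sketch of how a script might assemble such a URL, the Python snippet below uses the standard library's urllib.parse.urlencode with doseq=True, which writes a repeatable parameter once per value. The function name build_search_url and the BASE_URL constant are illustrative; only the endpoint and the parameter names come from this guide.

```python
from urllib.parse import urlencode

# Endpoint taken from the example queries in this guide.
BASE_URL = "https://archive-it.org/search-master/opensearch"


def build_search_url(query, **params):
    """Build a search URL; pass a list for a repeatable parameter (i, s, t)."""
    pairs = {"q": query, **params}
    # doseq=True expands list values into repeated key=value pairs.
    return BASE_URL + "?" + urlencode(pairs, doseq=True)


print(build_search_url("carolina", i=[194, 195, 196]))
```

Note that urlencode percent-encodes reserved characters, so a type such as text/html would appear in the URL as text%2Fhtml.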
User search query
Parameter | Default | Repeatable? | Description
q | | N | your search query
This is the query that the user usually types into the search box on the HTML page. The query only applies to the following fields:
- title
- content
- url
Paging
Parameter | Default | Repeatable? | Description
n | 10 | N | number of hits per page
p | 0 | N | start position
These two parameters are used for paging through the results and usually are not manipulated by the end user directly.
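A script that walks through the results would typically hold n fixed and advance p. The sketch below assumes that p is a zero-based offset of the first hit to return (the table only says "start position", so verify this against real responses), and it stops at the 99-result browsable limit mentioned earlier in this guide. The helper name page_params is hypothetical.

```python
def page_params(hits_per_page=10, max_results=99):
    """Yield (n, p) pairs that cover the browsable results.

    Assumption: p is a zero-based hit offset, not a page number.
    """
    for start in range(0, max_results, hits_per_page):
        # Shrink the final page so it does not run past the limit.
        yield min(hits_per_page, max_results - start), start


for n, p in page_params():
    print(f"n={n}&p={p}")
```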
Site
Parameter | Default | Repeatable? | Description
s | | Y | site, default is all
h | 1 | N | max hits per site, 0=all
These two parameters are often used in combination. The s parameter limits the search to specific sites, while the h parameter specifies the maximum number of hits to show from any one site.
Most of the time, users want to see results from all sites, which is the default. However, using the s and h parameters, it's possible to narrow the results to a certain site or a small set of sites. This query would limit the results to two sites, showing all the hits from each:
https://archive-it.org/search-master/opensearch?q=foo&s=site1.org&s=site2.net&h=0
Collection
Parameter | Default | Repeatable? | Description
i | | Y | collection to search, default is all
Specify the collection or collections to search by using the collection numbers.
Multiple collections
To search multiple collections, repeat the i parameter in the query string. For example, the following OpenSearch URL will return all the documents in collections 194, 195, and 196 that contain the text "carolina":
https://archive-it.org/search-master/opensearch?q=carolina&i=194&i=195&i=196
Content/Document types
Parameter | Default | Repeatable? | Description
t | | Y | type: text/html, application/pdf, etc.
This parameter limits the results to documents of one or more specified types.
XML Response
Please note that the layout of results may change across browsers. The OpenSearch specification declares an XML namespace for its extensions to RSS and Atom. Similarly, we declare a namespace for our extensions.
OpenSearch: https://a9.com/-/spec/opensearchrss/1.0/
Archive-It: https://web.archive.org/-/spec/opensearchrss/1.0/
Example response snippet:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:opensearch="https://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:archive="https://web.archive.org/-/spec/opensearchrss/1.0/">
  <channel>
    <title>texas</title>
    <description>texas</description>
    <link />
    <opensearch:totalResults>8996205</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <archive:query>texas</archive:query>
    <archive:index>414</archive:index>
    <archive:urlParams>
      <archive:param name="q" value="texas" />
      <archive:param name="i" value="414" />
    </archive:urlParams>
    <item>
      <title>Texas Musical Drama</title>
      <archive:docId>8185585</archive:docId>
      <archive:score>2.815091</archive:score>
      <archive:site>www.texas-show.com</archive:site>
      <archive:length>30373</archive:length>
      <archive:type>text/html</archive:type>
      <archive:collection>414</archive:collection>
      <date>20090706012618</date>
      <description><B>Texas</B> Musical Drama Home...</description>
    </item>
    <item>...</item>
    <archive:responseTime>0.985</archive:responseTime>
  </channel>
</rss>
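A response like the one above could be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch: it assumes the response declares the opensearch and archive prefixes with the namespace URIs listed in this guide (confirm them against a live response), and it runs on a trimmed sample rather than a real request. The names NS, SAMPLE, and parse_hits are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as listed in this guide; a live response may differ.
NS = {
    "opensearch": "https://a9.com/-/spec/opensearchrss/1.0/",
    "archive": "https://web.archive.org/-/spec/opensearchrss/1.0/",
}

# Trimmed sample modeled on the example response snippet.
SAMPLE = """<rss version="2.0"
     xmlns:opensearch="https://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:archive="https://web.archive.org/-/spec/opensearchrss/1.0/">
  <channel>
    <title>texas</title>
    <opensearch:totalResults>8996205</opensearch:totalResults>
    <item>
      <title>Texas Musical Drama</title>
      <archive:site>www.texas-show.com</archive:site>
      <archive:type>text/html</archive:type>
      <archive:collection>414</archive:collection>
      <date>20090706012618</date>
    </item>
  </channel>
</rss>"""


def parse_hits(xml_text):
    """Return (total_results, list of per-item dicts)."""
    channel = ET.fromstring(xml_text).find("channel")
    total = int(channel.findtext("opensearch:totalResults", namespaces=NS))
    hits = []
    for item in channel.findall("item"):
        hits.append({
            "title": item.findtext("title"),
            "site": item.findtext("archive:site", namespaces=NS),
            "type": item.findtext("archive:type", namespaces=NS),
            "collection": item.findtext("archive:collection", namespaces=NS),
            "date": item.findtext("date"),
        })
    return total, hits


total, hits = parse_hits(SAMPLE)
print(total, hits[0]["title"])
```

Elements without a prefix, such as title and date, are looked up by plain name; prefixed elements need the namespaces mapping.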
Response Elements
Each element in the XML response is described below. You can parse the results with an XML parser of your choice. Each hit gets an <item>. Results will not automatically be viewable in the Archive-It Wayback Machine. For results viewable through the Archive-It Wayback Machine, construct links in the following format, substituting each item's collection, date, and link values:
https://wayback.archive-it.org/<archive:collection>/<date>/<link>
Element name | Occurrence | Number of occurrences | Description
rss | always | one | The top-level element of an RSS feed. For more information on RSS, visit this site.
channel | always | one to many per RSS | The channel is the element that marks the start and end of a logically grouped set of data elements, such as search results. There can be multiple channels per RSS feed.
title | always | one per channel | The title of the channel. This is the human-readable form of the channel. The title is the keyword or phrase that is used to generate the search results.
description | always | one per channel | A descriptive phrase describing the channel. The description is the keyword or phrase that is used to generate the search results.
link | always | one per channel | The URL of the RSS feed.
totalResults | | |
startIndex | | |
itemsPerPage | | |
query | | |
index | always | one to many per channel | The ID of the Archive-It collection that was searched.
urlParams | always | one per channel | The query-string parameters submitted in the search query.
param | always | one to many per urlParams | The name and value of a query-string parameter submitted in the search query.
item | conditional (only occurs if the search result set is greater than zero for the specified channel) | zero to many | The item is the element that marks the start and end of a specific search result. A specific search result is a Web document, such as an HTML page.
item/title | mandatory within an item | one per item | The title of an item. The title of the item is the title of the Web document. For example, the title of an HTML page is the text within the <title> tags.
item/description | mandatory within an item | one per item | Snippets of content from within the Web document that are adjacent to the search keyword or phrase. The description is also known as the item's highlight.
item/link | mandatory within an item | one per item | The URL of the Web document.
item/docId | mandatory within an item | one per item | The unique identifier of a Web document within a search result set.
item/score | mandatory within an item | one per item | The page rank of the Web document. The page rank is a measure of item relevancy. The higher the page rank, the more relevant the Web document is within the search result set.
item/site | mandatory within an item | one per item | The Web site hosting the Web document.
item/length | mandatory within an item | one per item | The length in bytes of the Web document.
item/type | mandatory within an item | one per item | The Web document's MIME type. For example, "application/pdf" is one of the MIME types for PDF documents.
item/collection | mandatory within an item | one per item | The unique identifier of the Archive-It collection containing the Web document.
item/date | mandatory within an item | one per item | The date on which the Web document was archived.
responseTime | always | one per channel | The number of seconds the Archive-It search engine needed to process the search query.
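The Wayback link format given earlier in this section can be filled in with a small helper. The sketch below is illustrative: the function name wayback_url is hypothetical, and the link value in the usage line is a made-up example, since the sample response in this guide does not show an item's link element.

```python
def wayback_url(collection, date, link):
    """Fill the https://wayback.archive-it.org/<collection>/<date>/<link> pattern."""
    return f"https://wayback.archive-it.org/{collection}/{date}/{link}"


# collection and date come from the sample item; the link is hypothetical.
print(wayback_url("414", "20090706012618", "http://www.texas-show.com/"))
```

The three arguments correspond to the item/collection, item/date, and item/link elements of a single search result.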