This is the third in a three-part Archive-It Workshop series for new Archive-It users. It was initially developed and presented to new members of the Community Webs cohort. Other workshops in this series include Selection, Scoping and Crawling, and Reviewing & Quality Assurance.

Introduction

We collect web archives to be used! But how easily can people find what they need, when they need it? This module will train you to provide curation and context that will help web archives to reach their intended users. We will use Archive-It’s website, instructional documentation, and partners’ case studies to demonstrate web archive access points and descriptive options. You will apply these lessons to your own Archive-It collections, add tools to analyze usage, and identify opportunities to improve your users’ experiences finding and using your web archives.

Objectives

Learn the primary points of access to your web archive collections and how your end users can experience them.
Understand the options for enabling and enhancing those experiences with descriptive metadata.

You will learn:

...where to find the general public access points to your web archive collections and their respective seeds
...how to navigate the contents of web archive collections, through faceted metadata browsing and through full-text search

You will:

...find materials in web archive collections through searching and browsing exercises
...apply helpful descriptive metadata to collections and seeds with the Archive-It web application
...identify areas for development that will improve your stakeholders’ experiences of finding and using your collections

When you’re ready for advanced topics, you can:

...add Plausible Analytics tracking to your account for access statistics
...try adding your own custom access point to search results
...practice retrieving metadata with Archive-It’s APIs

Training Recording

This workshop was presented as part of a training series for new Community Webs partners.

Recorded June 24, 2021

Materials

To participate, you will use:

The Archive-It public website
The Archive-It web application

At least one completed web crawl

This .ODS spreadsheet

Spreadsheet editing software (Excel, Numbers, Google Sheets, Open Office Calc, or similar)

This example HTML webpage

A text editor (Notepad on PCs, TextEdit on Macs, Vim, Sublime Text, Emacs, or any other you have available is ok!)

Learn

Users typically access web archives in two ways: browsing and searching. Both actions are enabled by default for any collection that is set to “public” on Archive-It’s website, archive-it.org. Users of the Archive-It web application, or their colleagues on a web development team, can also enhance these built-in points of access for faster and richer access experiences.

Browse

Users can discover the topics, themes, areas, or people related to a partner organization’s public web archives by browsing the organization’s default access point on archive-it.org. To find yours, go to: https://archive-it.org/organizations/[your-number]/.

Or find a list of other Community Webs participants’ access points here.
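If you need to generate links to several organizations’ access points (for a finding aid or a resource guide, say), the address pattern above can be filled in programmatically. The following is a minimal sketch: the function name and the number 1234 are placeholders chosen for illustration, not values from Archive-It.

```python
def organization_access_url(organization_number: int) -> str:
    """Build the public access point URL for a partner organization."""
    if organization_number <= 0:
        raise ValueError("organization number must be a positive integer")
    # Pattern from the text: https://archive-it.org/organizations/[your-number]/
    return f"https://archive-it.org/organizations/{organization_number}/"


# 1234 is a made-up number used only for illustration.
print(organization_access_url(1234))
```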

Browsing collections

Each partner can use the Archive-It web application to add the following descriptive information and tools to their organization’s access point:

Organizational logo, home URL, and description
Collection information: A name and optional description of each public collection in the partner’s account, including as much or as little descriptive metadata as desired.
Faceted metadata: The metadata added to the collections above will then also appear automatically in a browsable list that users may navigate in order to reduce/refine the list of public collections shown to them.

Browsing seeds

Users may browse and learn more about the contents of any Archive-It partner’s public web archive collection by selecting its name from the list above:

Collection information: The same name and description visible from the partner’s front page can be seen here, along with an optional logo to represent the collection.
Seed descriptions: The original URL of the archived material in the collection, plus any optional title, description, and additional descriptive metadata desired.
Faceted metadata: Like the collections’ metadata before, users may navigate the optional values above (other than Title or Description) in order to refine their view of the collection’s public seed list.

The only required piece of information for each seed is its URL. A user may select any one of these from the list in order to navigate its web archives.

Learn more: Want to learn how your web archiving peers describe their collections? The Archive-It partners and friends on OCLC Research’s Web Archiving Metadata Working Group produced this guidance document for librarians and archivists, to recommend the metadata fields and values that best fit their materials and goals.

Group members describe their experience developing and implementing the recommendations in the recorded webinar: Describing Web Archives.

Search

Users may search through the partner-curated metadata above and even the full text contents of web archive collections in order to find what they are looking for specifically.

Archive-It’s search engine parses and indexes the text collected by each crawl within seven days of its completion; the exact time depends on the volume of material to process. Users may then search their chosen keywords and refine their results with several additional options. They may search within a web archive collection, across a partner organization’s collections, or even across all public Archive-It collections.

Search the contents of a web archive collection

Users may explore the contents of any public web archive collection with the search tools on the collection’s archive-it.org access page:

The free text search bar enables users to search for specific keywords or phrases.
Users may then navigate their search results under two tabs:

Sites: Listing the seed URLs with descriptive metadata that match the search criteria.
Page text: Listing the web pages or documents throughout the collection that contain the keyword or phrase, summarized by the original site on which they appear.

Search results may be further refined by…

Sites: The same faceted metadata from among the listed seed URLs that appear in the browsing experience.
Page text: Advanced options to filter the results by source, timespan, file format, and more.

Search across a partner’s collections

Users may also explore the contents archived among all of an Archive-It partner’s public collections by performing a search on the organization’s default access page on archive-it.org:

The free text search bar enables users to search for specific keywords or phrases.
Users may then navigate their search results under three tabs:

Collections: Listing the partner’s collections with descriptive metadata that match the search criteria.
Sites: Listing the seed URLs with descriptive metadata that match the search criteria.
Page text: Listing the web pages or documents throughout the partner’s collections that contain the keyword or phrase, summarized by the original site on which they appear.

Search results may be further refined by…

Collections and Sites: The same faceted metadata from among the listed collections or seed URLs that appear in the browsing experience.
Page text: Advanced options to filter the results by source, timespan, file format, and more.

A user may select any one of the results in their resulting list in order to navigate its web archives.

Learn more: Tune in to the Archive-It Under the Hood Tips & Tools webinar (at 30:20) for more about Archive-It’s metadata (Apache Solr) and full-text (Elasticsearch) search technologies.

Navigating the web archive

Whether browsing or searching, end user patrons can use the access points above to find and explore your archived web captures. Choosing a seed URL or search result will take them to a calendar page of dates upon which each capture was added to the web archive:

[Image: a Wayback calendar access page]

Selecting any one of these dates will take the user to the fully functional web material as it appeared at the time of capture. As they navigate, the banner at the top of each page timestamps the relevant material and links the user to the partner and collection to which the material belongs. The user may use these links to return to the access points above, including a calendar of capture dates for any page that they navigate.

Do

Follow the steps below in order to add descriptive metadata to a web archive collection, to its seeds, and to find them for yourself with Archive-It’s browse and search functionalities.

Add collection metadata

In the Archive-It web application, navigate to the Metadata tab in one of your web archive collections to begin adding collection metadata:

Options to describe and contextualize your collection include:

Add an image to represent your collection at the top of its access point on archive-it.org.
Select as many as three “topics” to add to your faceted metadata, linking your collection to others curated by Archive-It partners among the same general themes.
Click the “Edit” button to add additional metadata about your collection’s scope and topics, using the Dublin Core Elements set of metadata attributes and your preferred additional custom fields.

Add at least one subject and short description. Click the “Save” button next to each in order to apply it to your collection.
When finished adding metadata terms, click the “Done” button at the top-right.

Note that it may take up to 15 minutes to see your changes reflected on the public website. Once this time has passed, visit your collection’s access point on archive-it.org to see your added description and/or this example from above.

Click on the linked metadata terms in your collection’s new description in order to see other Archive-It collections with the same keywords. What do you find here? Are these the right collections to add context to yours? Do they raise new ideas for description?

Add seed metadata

Navigate to the “Seeds” tab in your collection to begin adding metadata to seeds:

Select a seed from your list to edit by clicking on it and navigate to its own Metadata tab:

Add at least a title and short description to your seed URL in order to describe and contextualize it within your collection for the end user.

Bulk seed metadata

When you are ready to add metadata to seeds in larger batches than one-at-a-time, navigate back to your collection’s Metadata tab and open the Bulk Seed Metadata section:

Replace the seed URLs and sample metadata values in this sample spreadsheet with your own seeds and values. You may add more than one value to the same metadata attribute by adding an additional column with the same heading, such as the “Subject” attribute in this example. You may also add your own custom metadata attribute by making it a column heading in the spreadsheet, such as this example’s “Host” attribute.
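If your seed list already lives in another system, you could generate the rows with a short script instead of typing them by hand. The sketch below writes CSV text with a repeated “Subject” heading and a custom “Host” column, mirroring the layout described above. It rests on several assumptions: the “url” and “Title” headings, the example seeds, and the function name are ours, so copy the exact headings from the sample spreadsheet, and open the output in your spreadsheet editor to save it in the same format as the sample file.

```python
import csv
import io


def build_bulk_metadata_sheet(rows, max_subjects=2):
    """Return CSV text with one row per seed and repeated 'Subject' columns."""
    # "url" and "Title" are assumed headings; match them to the sample spreadsheet.
    header = ["url", "Title"] + ["Subject"] * max_subjects + ["Host"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        subjects = list(row.get("subjects", []))[:max_subjects]
        # Pad with empty cells so every row has the same number of columns.
        subjects += [""] * (max_subjects - len(subjects))
        writer.writerow(
            [row["url"], row.get("title", "")] + subjects + [row.get("host", "")]
        )
    return buffer.getvalue()


# Made-up seeds for illustration only.
example_rows = [
    {"url": "https://example.org/", "title": "Example Community News",
     "subjects": ["Local news", "Neighborhoods"], "host": "example.org"},
    {"url": "https://example.com/blog/", "title": "Example Blog",
     "subjects": ["Blogs"], "host": "example.com"},
]
print(build_bulk_metadata_sheet(example_rows))
```

Repeating the heading, rather than packing several values into one cell, keeps each value separate so that it can become its own facet.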

Click the “Upload File” button to see a preview of your metadata changes. You may choose to apply these changes in addition to or over any existing metadata that you may have added prior. Click the “Save” button when you are ready to apply your changes:

See how the added metadata attributes become links and are faceted on the left-hand side of the page for browsing. Do these narrow down your choices helpfully? How many more (or fewer) values might help an end user to navigate seeds within your web archive collection?

Advanced: Want to see your collections in more places than archive-it.org? You can pipe the same descriptions and metadata above into other, customized spaces. You may, for instance, enable an OAI-PMH metadata feed for access points on the web that harvest them, such as WorldCat. For even more custom options, Archive-It’s Partner API makes public collection and seed metadata available for use in JSON, XML, and CSV formats.
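To give a sense of what a harvester sends to an OAI-PMH feed, the sketch below builds a standard ListRecords request for Dublin Core records. The verb and metadataPrefix parameters come from the OAI-PMH protocol itself; the base URL is a placeholder, so substitute the feed address shown in your own account once the feed is enabled.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given feed."""
    # "verb" and "metadataPrefix" are standard OAI-PMH request parameters;
    # "oai_dc" asks for unqualified Dublin Core records.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Placeholder address, not a real Archive-It endpoint.
HYPOTHETICAL_BASE_URL = "https://example.org/oai"
print(oai_list_records_url(HYPOTHETICAL_BASE_URL))
```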

Search

Once your metadata are applied to your collections and seeds, and they are visible to end user patrons on archive-it.org, it’s time to search the full text contents of your archive. Note that you must have a “production” (i.e., not test) crawl that was completed at least seven days ago in order to search reliably. Otherwise, you may use the example above.
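If you track crawl completion dates in a spreadsheet or script, the seven-day rule of thumb is easy to check. This is a minimal sketch with hypothetical names and dates; it only does the date arithmetic and does not query Archive-It.

```python
from datetime import date, timedelta

# Indexing window described in this workshop: up to seven days after a crawl completes.
INDEXING_WINDOW = timedelta(days=7)


def likely_searchable(crawl_completed: date, today: date) -> bool:
    """Return True if the crawl finished at least seven days before 'today'."""
    return today - crawl_completed >= INDEXING_WINDOW


# Example dates are made up for illustration.
print(likely_searchable(date(2021, 6, 10), today=date(2021, 6, 24)))
```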

To search within the Archive-It web application, navigate to the Search tab of the Archives section, enter a keyword or phrase from your seeds’ descriptive metadata, and perform a search for it within its relevant collection:

Search results appear below, under two headings: Metadata Results (any seed URLs with matching query language) and Full Text Results (matching query language from throughout your collection):

As with the public-facing access points on archive-it.org, either or both linked resources will lead you to navigate the captured contents of your web archive collection.

Did your matching metadata appear among these results? What other results appear in the full text? Do they suggest any additional descriptive metadata that could assist your end user patrons?

Advanced search

Both the Archive-It web application and public access site include advanced search options to assist in narrowing the users’ preferred search results:

The only criterion required for advanced searching is the “Collection.” However, you may also try pre-filtering your results above by:

Exact phrase: Exclude all results that do not include this exact keyword or phrase.
Exclude phrase: Exclude all results that do include this exact keyword or phrase.
From the host: Specify a website from which you want to retrieve results.
Total documents per host: Specify the number of results per website that you want to view by default, without yet expanding to view all results from that site.
File type: Specify the retrievable file format of the documents that contain your keyword or phrase.
Capture date range: Bound a timespan for results by when they were collected and preserved.
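To make the effect of each option concrete, the sketch below applies the same kinds of filters to a small in-memory list of results. It illustrates the logic only and is not how Archive-It’s search engine is implemented; the Capture class, the refine function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass
class Capture:
    """A simplified stand-in for one full-text search result."""
    host: str
    text: str
    file_type: str
    captured: date


def refine(results: Iterable[Capture],
           exact_phrase: Optional[str] = None,
           exclude_phrase: Optional[str] = None,
           host: Optional[str] = None,
           file_type: Optional[str] = None,
           start: Optional[date] = None,
           end: Optional[date] = None,
           per_host: Optional[int] = None) -> List[Capture]:
    """Keep only the results that pass every filter that was supplied."""
    kept: List[Capture] = []
    shown_per_host: Dict[str, int] = {}
    for result in results:
        body = result.text.lower()
        if exact_phrase and exact_phrase.lower() not in body:
            continue  # Exact phrase: drop results without the phrase.
        if exclude_phrase and exclude_phrase.lower() in body:
            continue  # Exclude phrase: drop results with the phrase.
        if host and result.host != host:
            continue  # From the host: keep a single website.
        if file_type and result.file_type != file_type:
            continue  # File type: keep a single format.
        if start and result.captured < start:
            continue  # Capture date range: lower bound.
        if end and result.captured > end:
            continue  # Capture date range: upper bound.
        count = shown_per_host.get(result.host, 0)
        if per_host is not None and count >= per_host:
            continue  # Total documents per host: cap results per website.
        shown_per_host[result.host] = count + 1
        kept.append(result)
    return kept


# Made-up records for illustration only.
sample = [
    Capture("example.org", "City council meeting minutes", "text/html", date(2021, 3, 1)),
    Capture("example.org", "City council budget draft", "application/pdf", date(2021, 4, 1)),
    Capture("example.com", "School board meeting minutes", "text/html", date(2020, 12, 15)),
]
matches = refine(sample, exact_phrase="meeting minutes", exclude_phrase="school board")
print([m.host for m in matches])
```

Note that the filters combine: each option you add can only narrow the list further, which is why a single over-specific filter can leave you with no results.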

Advanced: Want users to search your web archive collections from your own website? You can use a few lines of HTML code to create your own search bar on a page that delivers users to the results on archive-it.org, or use a customizable API query to pipe those results straight into your preferred domain.

To practice with the HTML form for a metadata- and full text search-based access point on your own or another website:

Download this HTML file.
Use a text editor to change the example collection number 6580 on line 9 to one of your own.
Open the file in a web browser and use the search bar to explore the contents of your web archive collection.
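If you maintain search pages for several collections, the text editor step can be scripted. The sketch below swaps the example collection number on a given line of the HTML text. The function name is ours, and the nine-line stand-in document is made up so that the example runs on its own; it is not the contents of the workshop’s HTML file.

```python
import re


def set_collection_number(html: str, new_number: int,
                          old_number: int = 6580, line_number: int = 9) -> str:
    """Replace the collection number on one line of the HTML text."""
    lines = html.split("\n")
    index = line_number - 1
    if index < 0 or index >= len(lines):
        raise ValueError(f"the document has no line {line_number}")
    # Match the number as a whole token so longer numbers are left alone.
    pattern = rf"\b{old_number}\b"
    if not re.search(pattern, lines[index]):
        raise ValueError(f"line {line_number} does not contain {old_number}")
    lines[index] = re.sub(pattern, str(new_number), lines[index])
    return "\n".join(lines)


# Stand-in text: eight filler lines, then a ninth line mentioning the number.
stand_in = "\n".join(
    ["<!-- filler -->"] * 8 + ["<!-- stand-in line mentioning collection 6580 -->"]
)
# 1234 is a made-up collection number used only for illustration.
updated = set_collection_number(stand_in, 1234)
print(updated.split("\n")[8])
```

To apply this to the downloaded file, read its text with pathlib.Path(...).read_text(), pass it to the function, and write the result back with write_text().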

Archive-It partners from the Maryland State Archives describe their experience maintaining a custom search portal to their archives in the recorded webinar: Access to Archive-It Collections.

Track and analyze access to your web archives

Partners may use Archive-It's Plausible Analytics integration to track and analyze use of their web archive collections.

Advanced: Want to clean up and dig into your access and usage data? Read this case study to learn how the Government Publishing Office’s Federal Depository Library Program (FDLP) made the most of their Archive-It account’s analytics integration.

Share and reflect

When and where will your patrons and stakeholders expect to find useful information in your web archives? To understand and anticipate their needs in your access point(s), develop a few handy “personas” to guide your decision making. Ask yourself, and develop a persona to answer:

Who is the patron or stakeholder? A library colleague, a teacher, or a researcher? A member of the general public?
What is their research or reference question? Why would I recommend that they find the answer to it in the web archives?
What is the shortest route to this question’s answer? Should they browse the available options or search for something specific directly?
What will they need to understand about web archives as a primary source in order to make sense of their contents? Where might they hit an obstacle?

Where do the options and methods above fall short? Are there new and different kinds of access, beyond browsing and searching, that you will need to support? How can Archive-It’s platform develop from here to meet those needs? What further descriptive recommendations will you add to the OCLC group’s, or ask of your colleagues in the Community Webs cohort?

Learn more: Internet Archive’s Jillian Lohndorf describes experiences providing front-line reference services with web archive collections in the recorded webinar: Archive-It as a Reference Tool.
