Since its launch in 2006, Archive-It has provided partner institutions the tools to build archives and special collections of historically valuable web content. Partner collections are browsable as they appeared on the day they were captured, and full-text and faceted search allow for enhanced discovery of sites, pages, and documents within collections. This access model, however, is still very much based on studying individual resources: searching, clicking, and browsing the archived web in much the same way as the live web.
Archive-It Research Services (ARS) aims to complement this method of access by providing data-driven access models for web archives that allow partner collections to be studied in new ways. By offering datasets built from metadata, provenance information, entities, links, and other key elements of archived web resources, ARS will facilitate the study of web archives in aggregate and across their entire timespan. These forms of access will expand use of partner collections by opening the door to new types of research, use, and scholarship. Collection managers, and the patrons and users of these collections, will be able to gain new understandings of the historic web and study web archives in new and dynamic ways.
In this example, we created a geo-IP map for the Archive-It collection Latin American Government Documents Archive (LAGDA). A total of 33,806 unique IP addresses were extracted from the WAT files generated for the collection over a period of 9 years, geocoded using MaxMind, and rendered as a time-based visualization using CartoDB. The visualization provides insight into the geographic dispersion of the servers on which this topical collection's content was hosted.
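The IP extraction step described above might be sketched roughly as follows. This is an illustration only, not the pipeline ARS used: it assumes each WAT record's JSON payload has already been read out of the WAT file as a string, and that the payload follows the usual WAT layout in which the original WARC headers (including `WARC-IP-Address` and `WARC-Date`) sit under `Envelope` and `WARC-Header-Metadata`. The function name `unique_ips_by_year` and the sample records are hypothetical, and the geocoding and mapping steps are not shown.

```python
import json
from collections import defaultdict


def unique_ips_by_year(wat_payloads):
    """Group unique server IPs by capture year.

    `wat_payloads` is an iterable of JSON strings, one per WAT record
    (assumed layout: Envelope -> WARC-Header-Metadata -> WARC-IP-Address).
    """
    ips_by_year = defaultdict(set)
    for payload in wat_payloads:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        if not isinstance(record, dict):
            continue
        headers = record.get("Envelope", {}).get("WARC-Header-Metadata", {})
        ip = headers.get("WARC-IP-Address")
        date = headers.get("WARC-Date", "")
        # WARC-Date is an ISO 8601 timestamp, so the first four characters are the year.
        if ip and len(date) >= 4:
            ips_by_year[date[:4]].add(ip)
    return dict(ips_by_year)


# Hypothetical sample payloads using reserved documentation IP ranges.
sample_payloads = [
    json.dumps({"Envelope": {"WARC-Header-Metadata": {
        "WARC-IP-Address": "192.0.2.1", "WARC-Date": "2006-05-01T12:00:00Z"}}}),
    json.dumps({"Envelope": {"WARC-Header-Metadata": {
        "WARC-IP-Address": "192.0.2.1", "WARC-Date": "2007-03-15T08:30:00Z"}}}),
    json.dumps({"Envelope": {"WARC-Header-Metadata": {
        "WARC-IP-Address": "198.51.100.7", "WARC-Date": "2007-09-02T21:10:00Z"}}}),
    "not json",  # malformed payloads are skipped
]

ips = unique_ips_by_year(sample_payloads)
total_unique = len(set().union(*ips.values()))
```

Grouping by year keeps the data in a shape suited to a time-based map; the set of unique addresses could then be passed to a geocoding database to obtain coordinates.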
The goals of ARS are:
- Facilitate new data-driven forms of research, analysis, and digital humanities scholarship to further demonstrate the value of web archives
- Increase use of partner collections by expanding how these collections can be accessed and queried by users, researchers, and scholars
- Allow institutions of any size access to collection-derived datasets whose creation requires complex processing and substantial computing infrastructure
- Provide the data and access necessary to support new tools, interfaces, visualizations, and other R&D for collecting, managing, and using web archives
In this example, Ian Milligan explores the link network of the University of Toronto's Canadian Political Interest Groups collection, generated using WAT files and Gephi. Watch a video where he explains how to use WATs and Gephi to analyze this collection.
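A link network of this kind could be derived along the following lines. This is a rough sketch under stated assumptions, not the method used in the example: it assumes the WAT JSON payloads are already available as strings, that outgoing links are listed under `Envelope`, `Payload-Metadata`, `HTTP-Response-Metadata`, `HTML-Metadata`, `Links` with a `url` field, and that the capturing page's address is in `WARC-Target-URI`. The function names and sample records are hypothetical. The output is a `Source,Target,Weight` edge list in CSV form, which Gephi can import.

```python
import csv
import io
import json
from collections import Counter
from urllib.parse import urljoin, urlsplit


def host_edges(wat_payloads):
    """Count host-to-host links across WAT JSON payload strings."""
    edges = Counter()
    for payload in wat_payloads:
        envelope = json.loads(payload).get("Envelope", {})
        source_url = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI")
        if not source_url:
            continue
        links = (envelope.get("Payload-Metadata", {})
                 .get("HTTP-Response-Metadata", {})
                 .get("HTML-Metadata", {})
                 .get("Links", []))
        source_host = urlsplit(source_url).hostname
        for link in links:
            # Resolve relative links against the page URL before taking the host.
            target_url = urljoin(source_url, link.get("url", ""))
            target_host = urlsplit(target_url).hostname
            # Keep only links that cross from one host to another.
            if source_host and target_host and source_host != target_host:
                edges[(source_host, target_host)] += 1
    return edges


def edges_to_csv(edges):
    """Serialize edge counts as a Source,Target,Weight CSV edge list."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


def make_payload(page_url, link_urls):
    """Build a hypothetical WAT-style payload for the sample below."""
    return json.dumps({"Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": page_url},
        "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {
            "Links": [{"url": u} for u in link_urls]}}},
    }})


sample_payloads = [
    make_payload("http://a.example/index.html",
                 ["http://b.example/page", "/about", "http://b.example/other"]),
    make_payload("http://b.example/", ["http://c.example/"]),
]

edges = host_edges(sample_payloads)
edge_csv = edges_to_csv(edges)
```

Aggregating to the host level keeps the graph small enough to lay out interactively; page-level edges would follow the same pattern with full URLs as node names.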
ARS is an ongoing program working to expand the ways that users can access and study web archives. Feedback and input from the community are welcome. Email us at firstname.lastname@example.org.
More information about ARS can be found here.