The Internet Archive upgraded the Archive-It web archiving software suite to version 7.2 in June 2022. This release includes upgrades and bug fixes for web crawling, archival replay, and full-text search technologies. It also significantly improves seed management, including the ability to move or share seeds among different Archive-It collections. See the summaries and links below to learn more about each of these enhancements.
Seed management
Sharing seeds across Archive-It collections
Archive-It now supports moving seeds by sharing their collected contents across different web archive collections without duplicating labor or data. Partners will find this feature useful when they wish to access and replay seeds’ related contents in new Archive-It collections instead of or in addition to the collections in which they were added originally.
To learn more about when and how to use this new feature, read the Archive-It Help Center page here: Share seeds across collections.
Seed group enhancements
This release includes related enhancements to seed groups in order to support sharing among collections. Seeds may now be assigned to multiple groups within a collection. Groups and their constituent seeds may also be shared in bulk among different Archive-It collections.
Crawling
A/V capture upgrade
The Internet Archive’s engineers have upgraded the Brozzler collecting tool’s audiovisual utility to yt-dlp, a fork of the original youtube-dl project with added features and community support. For more information about these tools and how Archive-It uses them to collect and replay embedded media, read the blog post: The stack: A guide to A/V web archiving with youtube-dl.
Improved error handling
Engineers also updated Brozzler to recover more automatically from errors caused by its virtual Chromium web browser, which had halted some Brozzler crawl jobs before they could collect their intended contents.
Wayback access
Updated calendar page
This release includes a new calendar design for access to web archives in Wayback replay mode:
Wayback calendar page before (left) and after (right) Archive-It 7.2
The new calendar’s look and feel matches the public Wayback Machine collection’s more closely, but it retains the same Archive-It functionalities and introduces some improvements:
- Accessibility features for screen readers and mobile devices
- Quick access to each page’s Wayback index
- Color-coded visual indicators for missing (404) or down (500) pages at capture time
- Updated and accessible error messaging
Full-text search
Engineers also deployed updates to Archive-It’s full-text search engine in order to resolve search performance issues caused by heavy bot traffic and bugs in advanced Boolean operator tools.
Ongoing and future development
Consult the development roadmap here for information about Archive-It's next steps and new features as they are released. And never hesitate to share your ideas in the Feature Requests forum (requires login).
Meeting Recording
Watch the recording of the June 22, 2022 webinar that introduced these upgrades at any time.