The Internet Archive upgraded the Archive-It web archiving software suite to version 7.1 in September 2021. This release includes improvements to web crawling and reporting, account management, and significant upgrades to archival replay. See the summaries and links below to learn more about each update and feature in this release.
Crawling
Extended crawl durations
Partners may now configure crawls to run for 14 or 30 days. Contact Archive-It’s staff to add this feature to your account when you need to make a capture that cannot be completed within the seven-day time limit available to all partners.
Web application
Account management
Partners may now use a clickthrough terms of service in the Archive-It web application to sign or update their service agreements. They can be invoiced automatically for their agreed-upon subscription level each renewal cycle, without having to submit a formal renewal request or confirmation each year.
Crawl scheduling
The web application includes new visual cues to aid in crawl scheduling. New color-coded indicators help partners to see which crawls are scheduled to run in a given collection and which crawls still need to be scheduled before they can run.
Data budgeting
This version of the Archive-It web application includes more descriptive visual features to communicate the real-time status of partners’ data budgets, including advisories for accounts approaching their annual collecting limits and restrictions on accounts that exceed them.
Reporting for resumed crawls
Internet Archive’s engineers have improved the Archive-It web application’s ability to calculate and report the data collected by resumed crawls. Partners can expect to see more accurate and complete lists of captured and queued documents as well as lower data totals overall for resumed crawl jobs.
Archival replay
Archive-It Wayback
Archive-It 7.1 includes an altogether new Wayback access layer, rewritten from the ground up to increase performance, replay quality, and ease of future and collaborative development. For a summary of these improvements and expectations related to this release, please read the Archive-It blog post: A New Wayback: Improving web archive replay.
Rules Engine
Internet Archive’s engineers have also developed an open source engine, with an accompanying database of custom code interventions, to improve web archival replay for Archive-It partners. This feature is enabled by the aforementioned new Wayback software. See the Replay Rules Engine GitHub repository to contribute to or apply replay improvements for Wayback replay access layers.
Proxy Mode
This release ends support for Proxy Mode. The browsing option is incompatible with the HTTPS protocol that delivers Archive-It partners’ web documents. Wayback content origin and security standards will prevent and mitigate "live leaks" in replay going forward, so please contact Archive-It’s Web Archivists if you require any further assistance with evaluating specific captures or quality assurance options.
Ongoing and future development
Consult the development roadmap for information about Archive-It's next steps and new features as they are released. And never hesitate to share your ideas in the Feature Requests forum (requires login).