What we're building
We deploy bug fixes, performance improvements, and feature updates regularly. Here's a summary of some of our recent progress:
July 2024
- Fixed a persistent issue where Wayback links were not being rendered in Crawl Reports
- Fixed a persistent issue where Host Reports would incorrectly attribute some collected documents to a 'https:' or 'http:' host
- Updated crawl configuration logic to more efficiently deploy yt-dlp for audio/video document collection
- Removed 'Related Archival Materials' as a predefined Seed level metadata field
- Replaced 'ARS' link in top navigation bar with 'ARCH' link
- Continued work on Heritrix crawler cluster optimization
- Refactored postcrawl processes to address reports of delayed crawl processing
- Brought services back online and conducted thorough clean-up and postmortem following an unplanned outage
- Other replay fixes and bug squashing
Past Updates
April 2024
- Heritrix crawler cluster optimization, including postcrawl processing improvements and software development kit upgrades and modernization
- Brozzler performance improvements, including adjustments to improve efficiency of capture when using yt-dlp
- Database and Solr query optimizations, aimed at improving load times of partner.archive-it.org and archive-it.org
- Ingest of Google Analytics host and report data into Archive-It’s self-hosted instance of Plausible Analytics
- Other replay fixes and bug squashing
Future development
Below are some of the top feature requests and enhancements on our roadmap for future release. If you have additional ideas, please let us know via the Archive-It Feature Request Forum.
| Status | Timeline | Description |
| --- | --- | --- |
| In Progress | 2024 | Scheduled crawl improvements: Simpler crawl scheduling; Ability to use Standard and Brozzler crawling technologies at the same frequency; Ability to schedule different concurrent crawls at the same frequency |
| In Progress | 2024 | Public site redesign: Giving archive-it.org an updated look and feel, with performance and accessibility improvements |
| In Progress | 2024 | A/V crawl configuration: Optimize crawls based on your needs to collect audio/video documents |
| In Progress | 2024 | System health and modernization: Migrating partner.archive-it.org backend software and infrastructure to updated versions |
| Planned | TBD | Tool-tips and updated in-app messaging |
| Planned | TBD | Expanded data deduplication capabilities |
We’ll update this list as our roadmap evolves, but please check our announcements forum for information on new releases!
Bugs we're fixing
- We’re aware of sporadic issues collecting .mp4 files from YouTube. We’re actively working to improve consistency.
- Our Standard crawling technology occasionally speculates an excessive number of URLs to crawl on some content management systems, which slows crawls and leaves large numbers of documents queued. We’re testing a few fixes.