Archive-It crawl report system updates
We are pleased to announce the release of our new and improved back-end reporting infrastructure. While this is primarily a “behind-the-scenes” change to how crawl report data is stored and delivered to the Archive-It web application, partners may notice some changes as well.
Most significantly, this system was custom-built to support the increased scale of Archive-It crawling and reporting needs. It will provide more reliable access to real-time data in reports for currently running crawls and to complete data in reports for recently completed crawls.
In addition to making crawl data available more quickly and reliably, there are a few small changes in the web application, including:
- Document lists (such as the “queued” or “out of scope” documents in a Hosts report) now stream directly into the browser instead of downloading as a text file.
- Detail reports that include document lists (such as reports for specific seeds, hosts, and file types) now have a filterable "New" column with values for "Yes" and "No." This replaces the "Data" and "New" columns.
- PDF-only crawl reports reflect new data only for PDFs that are archived rather than all content that was discovered during the crawl.
- You may also notice small changes in the overall numbers of documents and data in reports for some crawls. This is because the new system is more precise in reporting the final crawl data volumes that have always counted against your account-level budgets.
We are still in the process of backfilling historical reports into our new system, but we wanted to make the improvements to recent and ongoing crawls available as soon as possible. Crawl reports from October 2016 until today should be available now, and older crawls will be added as they become available.
This is the latest in a series of developments to Archive-It. You can read more about reports and reporting in our Help Center.
Please submit a support ticket if you have any questions.