Zero queued docs, yet crawl continues to collect data

Comments

    Mary Haberle

    Thanks for raising this question. Queued documents do not appear in real-time reporting because they are calculated only after the crawl completes. In general, time elapsed, data volume, and document count are the best metrics for monitoring a running crawl. You can view and increase any of these limits while a crawl is in progress, but please note that a stopped test crawl cannot be resumed. More guidance on monitoring running crawls is available in our help center: https://support.archive-it.org/hc/en-us/articles/208332973-How-to-monitor-your-crawls

    However, the blank “Recently Crawled” box you saw when viewing the reports for a running crawl is a bug that we see from time to time in our reporting system when a large number of crawls are running simultaneously. To address this scalability issue, we are working behind the scenes to improve our reports data system so that we can ensure uninterrupted access to current crawl reports, as well as more reliable crawl report data that is available immediately upon crawl completion. We plan to bring the improved system online in the coming months.

    In the meantime, running test crawls is the most effective way to safeguard your annual data budget. Test crawls allow you to review your results in the crawl report and in Wayback 24 hours after a crawl completes.

