This page provides an overview of crawl reports, including how to find them and what's inside. It also provides a breakdown of the Crawl Overview tab inside the report.
On this page:
- Where to find your crawl reports
- What's inside the report
- Related content
Where to find your crawl reports
You can access all of your crawl reports at any time through the Crawls link in the top navigation bar of our web application. The landing page lists the reports for all completed crawls under the "Crawl Reports" tab.
For a detailed walkthrough of your post-crawl reports, check out the videos in our Post Crawl Analysis series.
By default, these reports are listed by Crawl ID, a unique identifier displayed to the left of each crawl. You can sort them by any of the other column headers in the table, and you can filter the report listings by any column field using the search bar.
Tip: You can also add searchable notes to an individual crawl report's Crawl Overview page, which makes it easy to filter down to that report later.
What's inside the report
Each crawl report contains four tabs: Crawl Overview, Seeds Report, Hosts Report, and File Types Report.
By clicking on the Crawl ID link associated with any crawl in your list, you can access a high-level summary of how that crawl was conducted. This "Crawl Overview" tab of the report includes:
- summary data on how much total content was crawled and how much, if any, new data was thereby added to your collection. To understand why crawled data might not be archived, see our explanation of de-duplication.
- the crawl's status (finished) and whether it finished due to a time limit or a limit on the number of documents or amount of data.
- any rules that were put in place for the crawl, such as scope expansions or document limits, which help explain why some new materials may have been archived while others were not.
Seeds, Hosts, and File Types
Additionally, each of your crawl reports includes more information in the form of the following specialized reports on seeds, hosts, and file types: