Where to find your Seeds report
Next to its "Overview" tab, each crawl report has a "Seeds" tab that displays information specific to how each seed in your crawl was archived.
What's inside the report
In the "Seeds" tab of a crawl report, you can review the status of each seed at crawl time, see the total number of documents and volume of data archived from each seed, and view how each seed site renders when replayed in Wayback. You can also drill down by seed to view a report detailing every host to which that seed led the crawler.
Documents and data
Figures for the total number of documents, total volume of data, number of new documents, and volume of new data archived during your crawl are all visible at the top of the Seeds report. (To understand the difference between total and new documents/data, refer to our explanation of data de-duplication.) These figures are also shown graphically so that you can compare how many new documents and how much new data were archived from each seed. New document and data amounts are also listed in the table of seeds at the bottom of this tab.
Crawl status
The table at the bottom of the Seeds report tab lists the "seed status" for each seed in your crawl. The Seed Status column indicates whether each seed was successfully crawled, redirected (and subsequently crawled), not crawled, queued, or blocked by a robots.txt exclusion. If a seed is listed as "not crawled," take note of any accompanying code in the column; our crawler reports a different code depending on which problem, if any, it encountered. For explanations of these codes and guidance on how to respond to them, see: How to interpret crawl status codes.
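If you keep your own notes on seed statuses, for example copied by hand from the seeds table, a short script can pull out the seeds that deserve a second look. The following is only a minimal sketch: the `SeedRow` structure, the lowercase status labels, and the example code value are hypothetical stand-ins for illustration, not a format that the report is known to export.

```python
from typing import List, NamedTuple


class SeedRow(NamedTuple):
    """One row of the seeds table (hypothetical structure for illustration)."""
    url: str
    status: str      # e.g. "crawled", "redirected", "not crawled", "queued"
    code: str = ""   # accompanying code, if any, for seeds that were not crawled


# Assumed labels for statuses that usually call for follow-up.
NEEDS_REVIEW = {"not crawled", "blocked by robots.txt"}


def seeds_needing_review(rows: List[SeedRow]) -> List[SeedRow]:
    """Return the rows whose status suggests the seed was not archived."""
    return [row for row in rows if row.status.strip().lower() in NEEDS_REVIEW]


if __name__ == "__main__":
    rows = [
        SeedRow("http://example.org/", "crawled"),
        SeedRow("http://example.com/", "Not crawled", "-6"),  # code value is made up
        SeedRow("http://example.net/", "blocked by robots.txt"),
    ]
    for row in seeds_needing_review(rows):
        print(row.url, row.status, row.code)
```

Any seed that this kind of check flags should still be looked up in the guidance on crawl status codes, since the appropriate response depends on the specific code.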
Hosts, by seed
Clicking on any of the seed URLs listed in the Seeds report's table will take you to a detailed report on all of the unique host domains to which that seed led our crawler. To understand how to interpret this more detailed report, refer to our standard guidance on reading host reports.
Wayback view
Each captured seed listed in this report is accompanied by a "Wayback" link in the far-right column of the seeds table. Follow this link to view the seed site as it replays in Wayback. For test crawls, this is an especially helpful way to evaluate the completeness of your crawl before electing to save or delete the data.
Whether you are reviewing the report from a test or production crawl, it is important to remember that these links will not lead to a working Wayback view of your seed site until approximately 24 hours after the crawl completes. Until then, the Wayback link in the crawl report will only return a "Not in Archive" message.
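To estimate when a Wayback link is likely to start working, you can add the delay to the crawl's completion time. The sketch below assumes a flat 24-hour delay, although the actual delay is only approximate, and the function names are illustrative rather than part of any service API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: replay becomes available roughly 24 hours after the crawl completes.
WAYBACK_DELAY = timedelta(hours=24)


def wayback_ready_at(crawl_completed: datetime) -> datetime:
    """Estimate the earliest time the Wayback link should replay the seed."""
    return crawl_completed + WAYBACK_DELAY


def is_wayback_likely_ready(crawl_completed: datetime,
                            now: Optional[datetime] = None) -> bool:
    """Return True once the estimated delay has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= wayback_ready_at(crawl_completed)


if __name__ == "__main__":
    completed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print("Check the Wayback link after:", wayback_ready_at(completed).isoformat())
```

Because the delay is approximate, a "Not in Archive" message shortly after the estimated time does not necessarily indicate a problem with the crawl.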