Where to find your Seeds report
Next to its "Overview" tab, each crawl report has a "Seeds" tab that displays information specific to how each seed in your crawl was archived.
What's inside the report
Under the "Seeds" tab of a crawl report, you can review the status of each seed at the time of its crawl, the total number of documents and volume of data archived from each seed, details about each host to which a seed led the crawler, and how each seed site renders when replayed in Wayback.
Documents and data
Figures for the total number of documents, total volume of data, number of new documents, and volume of new data archived during your crawl are all visible at the top of the Seeds report. (To understand the difference between total and new documents/data, refer to our explanation of data de-duplication.) These figures are also presented graphically so that you can compare how many new documents and how much new data were archived from each of your seeds. New documents and data are also listed in the table of seeds at the bottom of this tab.
The table at the bottom of the Seeds report tab lists the "seed status" for each seed in your crawl. The Seed Status column indicates whether each seed was successfully crawled, redirected (and subsequently crawled), not crawled, queued, or blocked by a robots.txt exclusion. If a seed is listed as "not crawled," take note of any accompanying code in the column; our crawler assigns different codes depending on the problem, if any, that it encountered. For explanations of these codes and guidance on how to respond to them, see: How to interpret crawl status codes.
Clicking on any of the seed URLs listed in the Seeds report's table will take you to a detailed report on all of the unique host domains to which that seed led our crawler. To understand how to interpret this more detailed report, refer to our standard guidance on reading host reports.
Each captured seed listed in this report is accompanied by a "Wayback" link in the far-right column of the seeds table. Follow this link to view the seed site as it was captured and as it replays in Wayback. For test crawls, this is an especially helpful way to evaluate the precision and completeness of your crawl before deciding whether to save or delete your data. Whether you are reviewing the report from a test crawl or a full production crawl, however, remember that these links will only begin to lead to the proper Wayback view of your seed site approximately 24 hours after the crawl completes. Until then, they will only return a "Not in archive" message.