Overview
This page provides an overview of your crawl's seeds report, including where to find it and what's inside.
Where to find your seeds report
Next to its "Overview" tab, each crawl report has a "Seeds" tab. This tab displays information specific to how each seed in your crawl was archived.
What's inside the report
In the "Seeds" tab of a crawl report, you can:
- review the total number of documents and volume of data archived from each seed
- review the status of each seed at crawl time
- drill down by seed to view a report detailing every host to which each seed led the crawler
- view how each seed site renders when replayed by Wayback.
Documents and data
The top of the seeds report shows four figures for your crawl:
- total number of documents archived
- total volume of data archived
- number of new documents archived
- volume of new data archived
These figures are also presented graphically so that you can compare how many new documents and how much new data were archived from each seed. To understand the difference between total and new documents/data, refer to our explanation of data de-duplication.
New document and data amounts are also broken down per seed in the seed table at the bottom of this tab.
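If you copy the per-seed figures out of the seed table, the comparison between new and total amounts can be worked through as a small calculation. The sketch below is illustrative only: the SeedRow fields, the sample URLs, and the numbers are all hypothetical, and it assumes that crawl-wide figures are simple sums of the per-seed rows, which the report itself does not state.

```python
from dataclasses import dataclass


@dataclass
class SeedRow:
    """One row of a hypothetical seed table (field names are assumptions)."""
    seed_url: str
    total_docs: int
    new_docs: int
    total_bytes: int
    new_bytes: int


def new_share(new: int, total: int) -> float:
    """Return new/total as a fraction, or 0.0 when nothing was archived."""
    return new / total if total else 0.0


def summarize(rows):
    """Sum per-seed figures into crawl-wide totals (assumed to be plain sums)."""
    return {
        "total_docs": sum(r.total_docs for r in rows),
        "new_docs": sum(r.new_docs for r in rows),
        "total_bytes": sum(r.total_bytes for r in rows),
        "new_bytes": sum(r.new_bytes for r in rows),
    }


# Made-up sample rows, not real crawl data.
rows = [
    SeedRow("https://example.org/", 1200, 300, 50_000_000, 10_000_000),
    SeedRow("https://example.com/news/", 800, 800, 20_000_000, 20_000_000),
]

totals = summarize(rows)
for r in rows:
    share = new_share(r.new_docs, r.total_docs)
    print(f"{r.seed_url}: {share:.0%} of documents were new")
```

A seed with a low share of new documents was mostly unchanged since it was last archived, while a share near 100% is typical of a seed crawled for the first time.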
Seed status
The seed table lists each seed with one of the following statuses:
- successfully crawled
- redirected (and subsequently crawled)
- not crawled
- queued
If a seed is listed as "not crawled," take note of any accompanying code in the column. The crawler reports a different code depending on which problem, if any, it encountered. For explanations of these codes and guidance on how to respond to them, see Understanding Seed Status.
Hosts, by seed
Clicking on any of the seed URLs listed in the table will take you to a detailed report on all the unique host domains to which that seed specifically led the crawler. To understand how to interpret this more detailed report, refer to our standard guidance on reading host reports.
Wayback view
Each captured seed listed in this report is accompanied by a "Wayback" link in the far-right column of the seeds table. Follow this link to view the seed site as it replays in Wayback. For test crawls, this is an especially helpful way to evaluate the completeness of your crawl before electing to save or delete the data.
Whether you are reviewing the report from a test or production crawl, allow approximately 24 hours after the crawl completes before expecting these links to lead to a working Wayback view of your seed site.
Related content
Reading your crawl's hosts report
Reading your crawl's file types report