What is a seed?
A seed is an item with a unique identifier in the Archive-It backend. Some of a seed's associated data cannot be edited, such as the dates on which it was added or last updated and its crawl history. Other data can be edited, such as Seed Level Metadata, notes, and even the seed URL.
Seed URLs point the crawler to content on the live web and, depending on how they are formatted, help inform the crawler of how much or how little of a site to capture.
Seed URLs can point to:
- an entire website (ex: http://www.whitehouse.gov/)
- a specific directory of a website (ex: http://www.whitehouse.gov/issues/foreign-policy/)
- a specific document (ex: http://www.whitehouse.gov/sites/default/files/rss_viewer/national_security_strategy.pdf)
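To make these three cases concrete, here is a minimal Python sketch of a simplified scoping rule, assuming that a URL is in scope whenever it begins with the seed URL. This prefix rule is an illustrative assumption, not the Archive-It crawler's actual logic, which also depends on things like seed type, scope rules, and embedded resources; `normalize` and `in_scope` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lower-case the scheme and host, and treat an empty path as "/"."""
    parts = urlsplit(url)
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def in_scope(seed_url: str, candidate_url: str) -> bool:
    """Hypothetical prefix rule: in scope if the URL starts with the seed URL.

    A simplification for illustration only; it is not how the
    Archive-It crawler actually decides scope.
    """
    return normalize(candidate_url).startswith(normalize(seed_url))


site = "http://www.whitehouse.gov/"
directory = "http://www.whitehouse.gov/issues/foreign-policy/"
document = "http://www.whitehouse.gov/sites/default/files/rss_viewer/national_security_strategy.pdf"

print(in_scope(site, document))       # True: the whole site covers the PDF
print(in_scope(directory, document))  # False: the PDF is outside that directory
print(in_scope(document, document))   # True: a document seed covers itself
```

Under this toy rule, the narrower the seed URL, the less of the site it covers, which mirrors the website, directory, and document examples in the list above.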
A document is any file with a unique URL
Webpages are usually made up of many individual documents. The seed URLs in your collection point the crawler to the documents you want to capture on the live web. Each seed URL is itself considered a document.
Every archived document in your Archive-It collections has its own calendar page listing each date and time on which it was crawled. When you click the Wayback link for a seed, you are directed to the calendar page for that specific document.
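The per-document calendar can be illustrated by how such links are commonly built. The sketch below assumes that Archive-It's Wayback uses the pattern `https://wayback.archive-it.org/<collection ID>/*/<document URL>` for a calendar page, and a 14-digit `YYYYMMDDhhmmss` crawl timestamp in place of `*` for a single capture; check both patterns against a Wayback link from your own collection. The collection ID `1234`, the crawl date, and the helper names are hypothetical.

```python
from datetime import datetime

# Assumed Archive-It Wayback host; verify against a link from your collection.
WAYBACK_BASE = "https://wayback.archive-it.org"


def calendar_url(collection_id: int, document_url: str) -> str:
    """Calendar page listing every capture of one document (assumed pattern)."""
    return f"{WAYBACK_BASE}/{collection_id}/*/{document_url}"


def capture_url(collection_id: int, crawled_at: datetime, document_url: str) -> str:
    """Replay link for a single capture, keyed by its 14-digit crawl timestamp."""
    stamp = crawled_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_BASE}/{collection_id}/{stamp}/{document_url}"


seed = "http://www.whitehouse.gov/"  # a seed URL is also a document
print(calendar_url(1234, seed))
# https://wayback.archive-it.org/1234/*/http://www.whitehouse.gov/
print(capture_url(1234, datetime(2015, 2, 6, 14, 30, 5), seed))
# https://wayback.archive-it.org/1234/20150206143005/http://www.whitehouse.gov/
```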
Collections are made up of lots of individual documents
After you run a few production crawls, your collection will be populated with archived documents, which, when viewed together, replay as archived websites.
The seed ID from which a document was captured is recorded in its WARC file. Aside from that, archived documents are not directly connected to the seed record in Archive-It. This means you can delete seeds or edit seed URLs without any effect on your archived content.
However, editing a seed URL can change what that seed points to in Wayback. Read more about how seeds act as access points below.
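The relationship between seeds and archived documents can be modeled as two separate stores. In this minimal Python sketch, which is a hypothetical model rather than Archive-It's actual schema, each archived document keeps a copy of the seed ID from crawl time, while a seed's Wayback link is assumed to show captures whose URL exactly matches the current seed URL (real Wayback lookups also canonicalize URLs). The class and function names, the timestamps, and the `mywebsite.com` URLs are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    seed_id: int
    url: str  # editable at any time


@dataclass(frozen=True)
class ArchivedDocument:
    url: str
    timestamp: str  # 14-digit crawl timestamp
    seed_id: int    # recorded at crawl time and never updated afterward


def wayback_link_targets(seed: Seed, documents: list[ArchivedDocument]) -> list[ArchivedDocument]:
    """Captures a seed's Wayback link would lead to: exact URL matches only."""
    return [doc for doc in documents if doc.url == seed.url]


seeds = {1: Seed(1, "https://mywebsite.com/")}
documents = [
    ArchivedDocument("https://mywebsite.com/", "20240101120000", seed_id=1),
    ArchivedDocument("https://mywebsite.com/aboutme", "20240101120005", seed_id=1),
]

print(len(wayback_link_targets(seeds[1], documents)))  # 1: the home page capture

# Editing the seed URL changes where its Wayback link points...
seeds[1].url = "https://mywebsite.com/aboutme"
print(wayback_link_targets(seeds[1], documents)[0].url)  # https://mywebsite.com/aboutme

# ...and deleting the seed leaves every archived document untouched.
del seeds[1]
print(len(documents))  # 2
```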
Seeds and documents can be access points
Seed URLs can provide direct access points to content in your web archive collections. A seed URL does not need to have been crawled itself to work as an access point, as long as it points to content already archived in the collection. Individual archived documents can also be elevated to access points using Document Metadata. For example, you could:
- Add a new seed with the URL https://mywebsite.com/aboutme, which would point users directly to that page in Wayback.
- Add Document Metadata to the document https://mywebsite.com/mywork to surface that page as an access point in your public collection.
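These two examples can be sketched in Python as a collection that builds its list of access points from seed URLs and from documents that carry Document Metadata, keeping only URLs that resolve to content already archived. This is a hypothetical model of the behavior described here, not Archive-It's implementation; the `Collection` class, its fields, and the metadata field and value are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    archived_urls: set[str]  # URLs with at least one capture in this collection
    seed_urls: list[str] = field(default_factory=list)
    document_metadata: dict[str, dict[str, str]] = field(default_factory=dict)

    def access_points(self) -> list[str]:
        """Seeds first, then documents elevated via metadata, if archived."""
        candidates = list(self.seed_urls)
        candidates += [url for url in self.document_metadata if url not in candidates]
        return [url for url in candidates if url in self.archived_urls]


collection = Collection(
    archived_urls={
        "https://mywebsite.com/",
        "https://mywebsite.com/aboutme",
        "https://mywebsite.com/mywork",
    },
    seed_urls=["https://mywebsite.com/"],
)

# Example 1: a new seed that was never crawled, pointing at an archived page.
collection.seed_urls.append("https://mywebsite.com/aboutme")

# Example 2: Document Metadata (hypothetical field and value) on an archived document.
collection.document_metadata["https://mywebsite.com/mywork"] = {"Title": "My work"}

print(collection.access_points())
# ['https://mywebsite.com/', 'https://mywebsite.com/aboutme', 'https://mywebsite.com/mywork']
```

In this model, a seed pointing at a URL with no captures is simply filtered out, reflecting the requirement that access points resolve to content already archived in the collection.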