This article will help you:
- Decide how to format your seed URLs so that you collect the content you want
- Add new seeds to a collection
- Share seeds between collections
What exactly is a seed?
A seed is an item with a unique numerical identifier in the Archive-It backend. Some information about a seed does not change, like the date it was added or updated and its crawl history. Seeds also have data you can edit like Seed Level Metadata, notes, and even the seed URL.
Seeds can perform two important tasks in Archive-It. They tell the crawler where to go on the live web, and provide it with information on what to collect. They also point users to the archived version of their URL in Wayback. You can use seeds for one or both of these tasks. A seed does not need to appear in a public collection page for archived pages collected from it to be accessible. Similarly, you don't need to include a seed in a crawl for it to point to a page in Wayback if the URL is already archived.
How to format your seed URLs
Seed URLs can point the crawler to:
- an entire website: http://www.whitehouse.gov/
- a specific directory of a website: http://www.whitehouse.gov/issues/foreign-policy/
- a specific document or file: http://www.whitehouse.gov/sites/default/files/rss_viewer/national_security_strategy.pdf
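The three kinds of seed URL listed above can often be told apart from the shape of the URL path alone. The following is a minimal sketch of that idea in Python. The function name `describe_seed` and its labels are hypothetical, and the check is only a heuristic reading of the URL; it is not how Archive-It's crawlers determine scope.

```python
from urllib.parse import urlsplit
import posixpath


def describe_seed(url: str) -> str:
    """Heuristically label a seed URL by what it appears to point to.

    Hypothetical helper for illustration only; it inspects the URL text
    and never contacts the live site.
    """
    path = urlsplit(url).path

    # No path, or just "/", means the seed is the site root.
    if path in ("", "/"):
        return "entire website"

    # A final path segment with a file extension (and no trailing slash)
    # suggests a single document or file.
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    _, extension = posixpath.splitext(last_segment)
    if extension and not path.endswith("/"):
        return "specific document or file"

    # Anything else is treated as a directory of the site.
    return "specific directory"


if __name__ == "__main__":
    for seed in (
        "http://www.whitehouse.gov/",
        "http://www.whitehouse.gov/issues/foreign-policy/",
        "http://www.whitehouse.gov/sites/default/files/rss_viewer/national_security_strategy.pdf",
    ):
        print(describe_seed(seed), "<-", seed)
```

A label from this sketch is only a prompt to double-check that the seed matches the content you intend to collect.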
Generally, a URL copied from a browser's address bar will have correct formatting. There are, however, important principles to remember before adding these URLs as seeds:
- Do you need a / (slash) at the end of the URL? Archive-It's crawling technologies (Standard and Brozzler) handle the / at the end of URLs differently. Refer to the default scope article to determine whether a / is necessary for your use case.
- Does the URL redirect to something else in your browser? Generally, we recommend using only as much of the URL as you need to end up on your target website. For example, if the site http://myexamplewebsite.com automatically redirects to http://myexamplewebsite.com/home, you should use http://myexamplewebsite.com as your seed URL.
- Does your URL contain a # (hash)? Crawlers ignore anything that comes after a # in a seed URL, which could significantly change the scope of your crawl.
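Two of the checks above can be run on a URL before you add it as a seed. The sketch below uses Python's standard `urllib.parse` module to show what remains of a URL once the part after the # is dropped, and to flag a missing trailing slash as a reminder to consult the default scope article. The function name `check_seed` and the wording of its notes are assumptions made for illustration; the sketch does not follow redirects or contact the site.

```python
from urllib.parse import urldefrag, urlsplit


def check_seed(url: str) -> list:
    """Return a list of reminders about a candidate seed URL.

    Hypothetical helper for illustration; it only parses the URL text.
    """
    notes = []

    # Split off everything after "#", which crawlers ignore in a seed URL.
    base, fragment = urldefrag(url)
    if fragment:
        notes.append(f"'#{fragment}' would be ignored; the crawler sees {base}")

    # The trailing slash is handled differently by Standard and Brozzler,
    # so a missing one is flagged for review rather than treated as an error.
    if not urlsplit(base).path.endswith("/"):
        notes.append("no trailing slash; check the default scope article")

    return notes


if __name__ == "__main__":
    for note in check_seed("http://myexamplewebsite.com/home#section"):
        print(note)
```

An empty list means neither reminder applies; it does not guarantee that the seed is scoped the way you expect.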
How to add new seeds to a collection
To add one or more new seeds to your collection, navigate to that collection's "Seeds" tab and click the Add Seeds button.
Finish by clicking the Add Seeds button. The new seed(s) will appear in your collection.
You can delete seeds from a collection by selecting them from the Seeds list and clicking the Delete button.
Deleted seeds will no longer appear in your collection's seed list or as an access point on Archive-It.org.
Deleting a seed does not delete any Wayback captures.