Creating and managing a collection involves both the mechanical steps of creating a collection and adding content in Archive-It and the decisions about what the collection should cover, what should go in it, how frequently it should be collected, and how it fits into the overall data budget. Developing a collection’s content is up to the individual or institution, and often involves connecting a mission with any relevant policies, such as records retention or collection development policies, to set web archiving goals.
How this is accomplished and the end results vary, but there are resources and examples for guidance: https://communitywebs.archive-it.org/collection-development.html. Our white paper on the Web Archiving Life Cycle also delves into the process of collection development, exploring the intersections among the technical abilities of crawling content, the realities of balancing a data budget, and web archiving goals.
Create a new collection
Click the Create a Collection button either on your account's home page or on the homepage of the Collections tab, which you can select from the top black navigation bar in the web application. You will be prompted to give your new collection a name.
Once you create a collection, it will have its own dedicated management space in your account.
Add seeds
A seed is an item with a unique identifier in the Archive-It backend. A seed has associated data that does not change, such as the dates on which it was added or updated and its crawl history. Seeds also have data that can be edited, such as Seed Level Metadata, notes, and even the seed URL.
A seed URL is both a starting point for the crawlers and an access point to archived pages.
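The distinction between a seed's fixed and editable data can be sketched in code. The example below is only an illustration in Python: the `Seed` class and its field names are hypothetical and do not reflect Archive-It's actual backend or API. It assumes, for simplicity, that the identifier and the date added are fixed once the seed exists, while the URL, notes, and seed-level metadata remain editable.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Seed:
    """Hypothetical model of a seed; not Archive-It's actual schema."""

    seed_id: int    # unique identifier, fixed once assigned
    added_on: date  # date the seed was added, fixed once set
    url: str        # editable: the seed URL itself
    notes: str = ""                               # editable
    metadata: dict = field(default_factory=dict)  # editable seed-level metadata

    # Fields this sketch treats as read-only after creation (an assumption).
    _READ_ONLY = ("seed_id", "added_on")

    def __setattr__(self, name, value):
        # Allow the first assignment (during __init__), reject later changes.
        if name in self._READ_ONLY and name in self.__dict__:
            raise AttributeError(f"{name} cannot be changed after creation")
        super().__setattr__(name, value)


seed = Seed(seed_id=1, added_on=date(2024, 1, 15), url="https://example.org/")

# Editable data can change without creating a new seed.
seed.url = "https://example.org/news/"
seed.metadata["Title"] = "Example News"

# Fixed data cannot.
try:
    seed.seed_id = 2
except AttributeError as err:
    print(err)
```

The point of the sketch is that editing a seed's URL or metadata leaves its identity intact: the same identifier continues to refer to the same seed.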
Add seeds to a new collection via the button on the Overview tab. You can add more seeds to your collection at any time using the Seeds tab.
Manage collection settings
Navigate among the tabs across the top of the collection management space in order to manage your collection's settings, scope, and functions:
- Overview - Review summary data about your collection, including documents and data archived. Edit the collection name by clicking the pencil icon beside the name. Use the settings to adjust the visibility (public/private) and regular crawling status (active/inactive) of the collection.
- Seeds - Manage your seeds' settings, types, crawl frequencies, and metadata, either individually or in bulk. Access archived captures of your seeds using the Wayback links.
- Crawls - Review reports for current and completed crawls performed in this collection.
- Collection Scope - Modify your collection's crawl scope in order to refine how our crawler decides what does and does not belong in it.
- Metadata - Add, edit, or upload collection level metadata.
- Wayback QA - Manage documents and data discovered during the Quality Assurance (QA) process for this collection.