All Archive-It partners subscribe to our service at a level determined by their annual data budget. This budget determines how much total data you can archive from the web (e.g. 256GB, 1TB, 2TB, etc.) in a single subscription year, with your usage resetting to zero upon renewal. Regularly monitoring your data usage during a subscription year is an important part of planning your crawls and saving only as much data as your service agreement with Archive-It allows.
On this page:
- How is Data Usage Calculated
- Review Subscription Data Usage
- Review Scheduled Crawls
- Budget Alert Banners
How is Data Usage Calculated
The total data in your account is the sum of the New Data (as listed in the New Data column of your crawl reports) captured in saved test crawls, one-time crawls, and scheduled crawls, plus any uploaded WARCs. Data from unsaved, deleted, or expired test crawls does not count against your data budget.
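The rule above can be sketched in a few lines of Python if you keep your own record of crawl report figures. This is only an illustration: the `Crawl` structure, its field names, and the use of gigabytes are assumptions made for the example, not part of the Archive-It web application or any Archive-It API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Crawl:
    """One row copied from a crawl report (hypothetical structure)."""
    kind: str            # "test", "one-time", or "scheduled"
    new_data_gb: float   # value from the New Data column, in GB
    saved: bool = True   # False for an unsaved, deleted, or expired test crawl


def counts_toward_budget(crawl: Crawl) -> bool:
    # Test crawls count only when saved; other crawl types always count.
    if crawl.kind == "test":
        return crawl.saved
    return True


def total_data_usage_gb(crawls: Iterable[Crawl],
                        uploaded_warc_gb: Iterable[float] = ()) -> float:
    # Sum New Data for crawls that count, then add any uploaded WARCs.
    crawl_total = sum(c.new_data_gb for c in crawls if counts_toward_budget(c))
    return crawl_total + sum(uploaded_warc_gb)


example_crawls = [
    Crawl("scheduled", 40.0),
    Crawl("one-time", 10.5),
    Crawl("test", 5.0, saved=True),    # saved test crawl: counts
    Crawl("test", 20.0, saved=False),  # expired test crawl: does not count
]
print(total_data_usage_gb(example_crawls, uploaded_warc_gb=[2.5]))  # 58.0
```

The figures shown on your account's landing page remain the authoritative totals; a calculation like this is only useful as a planning aid.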
Review Subscription Data Usage
For the most complete and up-to-date information on the total data archived by your account in the current and all past subscription periods, log in to the web application and review the figures and graphic provided at the left-hand side of the account's landing page.
Expand the "Current Subscription Details" and "Past Subscription Totals" areas of this information pane for more detailed information about each.
Note that partners may occasionally enter into multi-year service agreements with Archive-It. In these cases, the figures and graphic on the landing page still reflect the data budget for a single year of that agreement.
Review Scheduled Crawls
In order to have a complete picture of your data budget, it is also important to identify any crawls that are scheduled to run on a regular basis, and review their crawling history to get a sense of how much data they are likely to use in your current subscription year.
You can do this by clicking on ‘Crawls’ in the black navigation bar and then selecting the ‘Scheduled Crawls’ tab. This will show you a list of scheduled crawls across all your collections.
Then, visit each collection that has a scheduled crawl. Navigate to ‘Crawls’ and select the ‘Crawl Frequency’ sub-tab to see when each crawl is scheduled to run next.
It is best practice to check in on scheduled crawls periodically to ensure that they continue to be scoped appropriately. You can verify scoping rules by opening the collection and visiting the ‘Collection Scope’ section, or by clicking on a seed and then navigating to the ‘Seed Scope’ tab.
In the ‘Crawl Reports’ sub-tab, you can filter the Frequency column to view the crawling history of each scheduled interval. This will allow you to see how much data each crawl has accumulated over time. If a crawl's New Data amount has recently increased significantly, review the crawl report, adjust scoping, and run a test crawl before its next scheduled run.
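If you record the New Data figures for a scheduled crawl over time, a simple check like the following can help you spot a sudden jump. It is a minimal sketch under stated assumptions: the function name is hypothetical, the history is a list you maintain yourself (oldest run first, in GB), and the 1.5x threshold is an arbitrary illustration of "significant", which this page does not define numerically.

```python
from statistics import median
from typing import Sequence


def has_significant_increase(new_data_gb_history: Sequence[float],
                             factor: float = 1.5) -> bool:
    """Return True if the latest run exceeds `factor` times the median
    of the earlier runs (assumed threshold, adjust to your own needs)."""
    if len(new_data_gb_history) < 2:
        return False  # not enough history to compare against
    *earlier, latest = new_data_gb_history
    baseline = median(earlier)
    return latest > factor * baseline


weekly_crawl_gb = [10.0, 12.0, 11.0, 30.0]  # New Data per run, oldest first
if has_significant_increase(weekly_crawl_gb):
    print("Review the crawl report and scoping before the next scheduled run.")
```

Comparing against the median rather than the mean keeps one earlier outlier from hiding or exaggerating the latest change.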
Budget Alert Banners
An alert banner will appear on your landing page in the web application once your account has used 80% or more of its current subscription year's data budget. When you see it, please follow the steps above to ensure you do not use more than your annual allowance, and submit a support ticket describing the steps you have taken. Include any questions you have about your data budget.
If you exceed your annual data budget, an over-budget alert banner will appear. Please submit a support ticket if you see this banner.
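The two banner conditions described above can be expressed as a small check for your own planning. This is a sketch only: the function name and the returned labels are hypothetical, and it does not reproduce how the web application itself computes or displays the banners.

```python
def budget_status(used_gb: float, budget_gb: float) -> str:
    """Classify usage against the annual data budget (illustrative labels)."""
    if budget_gb <= 0:
        raise ValueError("budget_gb must be positive")
    fraction_used = used_gb / budget_gb
    if fraction_used > 1.0:
        return "over budget"  # usage exceeds the annual budget
    if fraction_used >= 0.8:
        return "alert"        # 80% or more of the budget is used
    return "ok"


print(budget_status(used_gb=210.0, budget_gb=256.0))  # alert
```

Running a check like this before starting a large crawl, with the expected New Data added to `used_gb`, gives a rough sense of whether the crawl would push the account past either threshold.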
If you have any difficulty or questions about efficiently managing your data budget to support your collecting scope, please contact Archive-It's Web Archivists for direct assistance.