Overview of Vault
Vault is the Internet Archive’s digital repository and preservation service. It provides an extensible, affordable suite of features that institutions can use to manage and preserve their digital collections. Vault builds on Internet Archive’s existing non-profit infrastructure and open-source tools for collecting, providing access to, and preserving digital collections, and it integrates with many of Internet Archive’s widely used services, such as Archive-It and archive.org. Institutions can use Vault to tailor a repository or preservation approach to their technical requirements, preservation goals, and financial resources, with extensible features for data replication, geographic redundancy, and fixity reporting.
Vault is currently in its pilot stage. If you are interested in joining the pilot partner community or have questions about the service, please email our product team at email@example.com. This guide will be updated as Vault launches in late 2022 or early 2023.
In this article
- Vault Design
- Account Basics
- Managing Your Account
- General Storage and Preservation Policy
- Related Content
Vault Design
Internet Archive has offered digital storage and preservation services to over 1,000 libraries, archives, museums, and cultural heritage and non-profit organizations for over two decades. These range from the basic, free storage and access options at archive.org, to the more robust preservation included in services such as Archive-It, to customized solutions provided via contracts with governments and large institutions. Because it owns and operates its own data centers and physical infrastructure, Internet Archive is able to offer storage and preservation services at a far lower cost than commercial cloud providers. Internet Archive is also a 501(c)(3) non-profit organization, which allows it to provide these services to mission-aligned organizations free of any profit-seeking interest. Internet Archive also has many partnerships and systems integrations with other digital library services and currently stewards over 100 petabytes of unique data; with multiple copies of this collection, it has hundreds of petabytes of data under management. Vault builds on this technical expertise, community alignment, and infrastructure to provide a low-cost, extensible storage and preservation solution that can meet the needs of a wide variety of organizations.
Vault provides advanced repository and preservation services, supports multiple methods for data ingest and egress, stores archived data in multiple geographic locations, and includes fixity audit and repair and other digital object management tools. Its low-cost pricing model is based on a one-time price per terabyte for depositing data into the system, with no additional annual storage fees and no data ingest or egress costs.
Vault design and service principles include:
- Content Diversity: Any type of content, from individual files to datasets to WARCs to Archival Information Packages (AIPs), can be deposited in Vault.
- Multiple Geolocation Options: Archived data can be stored in Internet Archive data centers around the world, currently located in 3 nations on 2 continents. Basic Vault services include storage of data in a minimum of 2 locations, with features for users to select additional locations or specific territorial or geographic locations.
- Multiple Data Replication Options: Basic Vault repository services include multiple copies of data housed in multiple locations, with add-on features allowing partners to have additional replicas of their data as needed. As many copies as you wish of your archived data can be stored and preserved with Vault.
- Multiple Technology Architectures: Archived data can be stored in multiple technical storage architectures in order to reduce technology risk.
- Fixity Check Frequency: Fixity checks can be run as frequently as you wish, with the additional capability to apply different frequencies for different collections of archived data and receive reports on all audit and repair actions.
- Third-Party Cloud Replication: Vault includes the option for partners to mirror or have portions of their archived data copied into various commercial and institutional third-party cloud systems. Contact us for more information on this feature if it is of interest.
- API Interoperability: Vault is API-first in design, meaning that most information available in the service’s web application and dashboards is also retrievable via API. Basic API integration for syncing (meta)data to popular external repository services is also possible, as Internet Archive has many existing integrations with peer mission-aligned services, repositories, access, and preservation systems.
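To illustrate the idea behind a fixity check, the following minimal Python sketch recomputes checksums for local files and compares them against a stored manifest. It is an illustration only: the function names, the manifest format (relative path mapped to an expected digest), and the choice of SHA-256 are assumptions made for this example and do not describe how Vault performs its audits internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare each manifest entry against the file on disk.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    Returns a status per relative path: "ok", "mismatch", or "missing".
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) == expected:
            report[relative_path] = "ok"
        else:
            report[relative_path] = "mismatch"
    return report
```

Running an audit like this on a schedule, and keeping each report, is one simple way to picture what a fixity job and its audit report represent: a "mismatch" or "missing" status marks a copy that would need repair from another replica.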
Account Basics
Every Vault account includes an interactive dashboard that provides a high-level overview of the archived data in the system, the ability to manage your data as collections, straightforward ways to upload and download content, and other account management features.
Users with a Vault account can log in at https://vault.archive-it.org/. Vault users can also change and manage their passwords via links on the login page.
All Vault accounts have access to an interactive dashboard that allows you to monitor the status of all archived data in the system, including storage location, collection statistics, fixity jobs, manifests, analytics, and reports.
Screenshot of the Dashboard in Vault.
For more detailed information on what you can do with the interactive dashboard, see Interpreting Vault’s Dashboard.
All Vault accounts allow you to manage your archived data as Collections, with a file and folder directory structure and interface. You can upload content into your Collections and organize it into folders as needed.
Screenshot of the Collection page in Vault.
For more detailed information on how to create and manage your Collections, see Creating and Managing a Collection in Vault.
You upload content into Vault by depositing files, which can be done in four different ways:
- Web Uploader: Vault’s browser/web-based upload tool.
- Command Line (currently in testing)
- Mail Drive (contact us at firstname.lastname@example.org)
- Import IA Collections (coming soon!)
Screenshot of the Web Uploader on the Deposit page in Vault.
For more detailed information on how to deposit files into Vault, see Depositing Content Into Vault.
Managing Your Account
From the Administration section of Vault, you are able to access:
- Current Plan Information: This section provides an overview of what plan you are enrolled in and the services you have purchased (Currently in testing).
The ‘Users’ and ‘Help Center’ sections do not currently have any content or active features. These sections will be updated and made functional in upcoming releases.
If you are interested in adding users to your account, or if you have questions or challenges during this pilot period, please contact the Product Team at email@example.com.
We are here to support you!
Screenshot of the Administration page in Vault
For more detailed information on how to manage your account in Vault, see Administering Your Vault Account.
General Storage and Preservation Policy
- For all Vault accounts, deposited data is stored on servers in at least two of Internet Archive's self-owned and self-operated data centers in separate locations, with a minimum of two copies at each location. Add-on features to a Vault service plan allow users to increase the number of replicas and locations or to choose specific geographic regions where preserved data is stored.
- Internet Archive has six primary online data centers in three different countries. Storage of Vault partners' deposited data may also be held in other offline or nearline storage locations for further preservation replication.
- Vault partners’ deposited data is stored and preserved in diverse repository systems and architecture, ensuring a diversity of technological systems with which this data is managed.
- Periodic integrity checks are performed on all Vault partner data on an ongoing basis as part of overall monitoring operations. Fixity reporting in the Vault application and dashboard occurs at least yearly for all accounts. Add-on features to a Vault service plan allow for more frequent fixity audits and reporting.
- All Vault partner data is stored and hosted in a controlled-access, alarmed, fire-protected building. Data integrity and system availability are assured using a combination of internal and external systems and processes.
- Security and monitoring of the deposited data are handled through a mix of internal and external systems, data integrity through routine internal tests, and system availability through internal and commercial web monitoring services.
- Deposited data is periodically migrated onto new physical media to account proactively for physical media reliability. Monitoring, logging, and notification systems escalate any hardware issues to an on-call team responsible for infrastructure maintenance.
- Incidents such as service outages, network issues, or other irregular performance parameters exceeding operating tolerances are detected, tracked on system support tools, and addressed promptly.
- Partners are notified in advance of any routine maintenance or system reconfiguration with the potential of service interruption.
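As a worked example of the baseline replication policy above (at least two locations, with at least two copies at each), the short Python sketch below counts copies and checks a placement against that minimum. The `Placement` class and the location names are hypothetical and exist only for this illustration; they are not part of Vault or its API.

```python
from dataclasses import dataclass

MIN_LOCATIONS = 2            # policy: at least two separate data centers
MIN_COPIES_PER_LOCATION = 2  # policy: at least two copies at each location


@dataclass
class Placement:
    """Hypothetical record of how many copies each location holds."""
    copies_by_location: dict[str, int]

    def total_copies(self) -> int:
        return sum(self.copies_by_location.values())

    def meets_baseline(self) -> bool:
        # Enough locations, and every location holds the minimum copy count.
        counts = self.copies_by_location.values()
        return (
            len(self.copies_by_location) >= MIN_LOCATIONS
            and all(n >= MIN_COPIES_PER_LOCATION for n in counts)
        )


# The smallest placement that satisfies the policy holds 2 x 2 = 4 copies.
baseline = Placement({"location-a": 2, "location-b": 2})
print(baseline.total_copies(), baseline.meets_baseline())
```

Under this reading of the policy, every deposited object exists in at least four copies before any add-on replicas or locations are purchased.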
Related Content
- Interpreting the Vault Dashboard
- Creating and Managing a Collection in Vault
- Depositing Content in Vault
- How to Administer Your Vault Account
Last updated on July 6, 2022.