Google Docs (documents and spreadsheets in the formats native to the Google Drive web platform) can be collected, preserved, and shared with Archive-It’s current software suite. For reference, here are the Archive-It team’s current recommendations for archiving them:
How to archive Google docs individually
To collect a Google Doc directly, enter its seed URL exactly as it appears in your live web browser’s address bar, e.g.:
- Google Doc: https://docs.google.com/document/d/124LjR2jsB8YYbxSHc09NZE1QEKm9iI_f5e3_x4skQc0/edit
- Google Sheet: https://docs.google.com/spreadsheets/d/1o2-vHk2gEE_CaTdUzkHVpUXty9ok469xUO8yZnCGEUw/edit
Use the One Page seed type to archive only the doc or sheet at the seed URL, or One Page Plus to archive it together with its links out to other URLs.
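Before adding many seeds at once, it can help to check that each URL has the expected shape. The following is a minimal sketch only, based on the two example URLs above; `classify_google_seed` is a hypothetical helper, not part of Archive-It, and it assumes that doc and sheet URLs follow the pattern `docs.google.com/<kind>/d/<id>/...`.

```python
from urllib.parse import urlparse


def classify_google_seed(url: str) -> str:
    """Return "doc", "sheet", or "other" for a candidate seed URL.

    Hypothetical helper for illustration; the expected URL shape is an
    assumption taken from the example seeds, not an Archive-It rule.
    """
    parts = urlparse(url)
    if parts.netloc != "docs.google.com":
        return "other"
    # Split the path into its non-empty segments: <kind>, "d", <id>, ...
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 3 and segments[1] == "d":
        if segments[0] == "document":
            return "doc"
        if segments[0] == "spreadsheets":
            return "sheet"
    return "other"


if __name__ == "__main__":
    seeds = [
        "https://docs.google.com/document/d/124LjR2jsB8YYbxSHc09NZE1QEKm9iI_f5e3_x4skQc0/edit",
        "https://docs.google.com/spreadsheets/d/1o2-vHk2gEE_CaTdUzkHVpUXty9ok469xUO8yZnCGEUw/edit",
    ]
    for seed in seeds:
        print(classify_google_seed(seed), seed)
```

Any URL that comes back as "other" is worth a second look before it is saved as a seed.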
How to archive links to Google Docs automatically
To collect Google Docs that are linked from your other seeds, add the following scope rules:
- Expand scope to include URL if it contains the text: docs.google.com/document/
- Expand scope to include URL if it contains the text: docs.google.com/spreadsheets/
Note that capture tools cannot yet navigate the links between Google Drive folders and their contained docs, so Archive-It does not recommend using Google Drive folders as seed URLs.
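To preview which discovered links the two "contains the text" rules would bring into scope, the matching can be approximated as a plain substring test. This is an illustrative sketch, not the crawler's actual implementation; `in_expanded_scope` is a hypothetical name, and the assumption is that the rule behaves like a case-sensitive substring match on the full URL.

```python
# The two strings from the scope rules above.
SCOPE_TEXTS = (
    "docs.google.com/document/",
    "docs.google.com/spreadsheets/",
)


def in_expanded_scope(url: str) -> bool:
    """Return True if the URL contains any of the scope-rule strings.

    Assumption: a "contains the text" rule acts as a simple substring test.
    """
    return any(text in url for text in SCOPE_TEXTS)


if __name__ == "__main__":
    links = [
        "https://docs.google.com/document/d/124LjR2jsB8YYbxSHc09NZE1QEKm9iI_f5e3_x4skQc0/edit",
        "https://drive.google.com/drive/folders/example-folder-id",  # made-up folder URL
    ]
    for link in links:
        print(in_expanded_scope(link), link)
```

Under this approximation a Google Drive folder link does not match either rule, which is consistent with the note above about not relying on folders.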
What to expect
Some Google spreadsheets will display error messages in addition to their contents and/or refresh automatically in Wayback replay. There is no known solution to this Wayback replay issue yet; however, the contents of these crawls may still be collected and preserved while replay improvements are investigated.