Google Docs (documents and spreadsheets in the formats native to the Google Drive web platform) can be collected, preserved, and shared with Archive-It’s current software suite. For reference, here are the Archive-It team’s current recommendations for archiving them:
How to archive Google Docs individually
To collect a document or spreadsheet directly, format its seed URL exactly as in the archived examples below, substituting your document's unique alphanumeric identifier (the string that follows /d/) for the one shown:
- Google doc: https://docs.google.com/document/d/124LjR2jsB8YYbxSHc09NZE1QEKm9iI_f5e3_x4skQc0/edit
- Google sheet: https://docs.google.com/spreadsheets/u/0/d/1BhiZ2lDuKVk3RewgTZC_SqdD4XWZxvvCnr6EDC188OQ/htmlview
Do not put a trailing slash (/) at the end of the seed URL. Use the One Page seed type in order to archive only the doc or sheet seen at your seed URL, or One Page Plus in order to archive it and its links out to other web pages.
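The URL formats above can be sketched as a small helper for preparing many seeds at once. This is only an illustration, not an Archive-It tool: the function name build_seed_url, the kind labels, and the assumption that identifiers contain only letters, digits, hyphens, and underscores are all hypothetical and based solely on the two examples shown.

```python
import re

# URL shapes copied from the two examples above; neither ends in a slash.
DOC_TEMPLATE = "https://docs.google.com/document/d/{doc_id}/edit"
SHEET_TEMPLATE = "https://docs.google.com/spreadsheets/u/0/d/{doc_id}/htmlview"

# Assumption: identifiers use only letters, digits, hyphens, and underscores,
# as in the two example URLs. Adjust if your identifiers differ.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_seed_url(doc_id: str, kind: str) -> str:
    """Return a seed URL for a Google doc or sheet, with no trailing slash.

    kind is a hypothetical label: "document" or "spreadsheet".
    """
    doc_id = doc_id.strip()
    if not _ID_PATTERN.fullmatch(doc_id):
        raise ValueError(f"unexpected characters in identifier: {doc_id!r}")
    if kind == "document":
        return DOC_TEMPLATE.format(doc_id=doc_id)
    if kind == "spreadsheet":
        return SHEET_TEMPLATE.format(doc_id=doc_id)
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    print(build_seed_url("124LjR2jsB8YYbxSHc09NZE1QEKm9iI_f5e3_x4skQc0", "document"))
    print(build_seed_url("1BhiZ2lDuKVk3RewgTZC_SqdD4XWZxvvCnr6EDC188OQ", "spreadsheet"))
```

The resulting URLs would still need to be added as seeds with the One Page or One Page Plus seed type, as described above.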
How to archive links to Google Docs automatically
To archive the Google Docs that your seeds link to, add one or both of the following scoping rules to those seeds and make sure to crawl them with Brozzler:
- Expand scope to include URL if it contains the text: docs.google.com/document/
- Expand scope to include URL if it contains the text: docs.google.com/spreadsheets/
Note that capture tools cannot yet navigate the links between Google Drive folders and their contained docs, so Archive-It does not recommend using Google Drive folders as seed URLs.
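To preview which links on a page the two scoping rules above would bring into scope, the "contains the text" test can be approximated locally as a plain substring check. This is a minimal sketch under that assumption; it does not reproduce how Archive-It or Brozzler evaluates scope internally, and the function name matches_expand_rule is hypothetical.

```python
# The two substrings from the scoping rules above.
SCOPE_EXPANSION_SUBSTRINGS = (
    "docs.google.com/document/",
    "docs.google.com/spreadsheets/",
)


def matches_expand_rule(url: str) -> bool:
    """Return True if the URL contains either scoping-rule substring."""
    return any(text in url for text in SCOPE_EXPANSION_SUBSTRINGS)


if __name__ == "__main__":
    # Hypothetical links, used only to illustrate the check.
    links = [
        "https://docs.google.com/document/d/EXAMPLE_ID/edit",
        "https://docs.google.com/spreadsheets/u/0/d/EXAMPLE_ID/htmlview",
        "https://drive.google.com/drive/folders/EXAMPLE_FOLDER",
    ]
    for link in links:
        status = "in scope" if matches_expand_rule(link) else "not matched"
        print(f"{status}: {link}")
```

A Google Drive folder link does not match either rule, which is consistent with the recommendation above not to rely on folders as a route to their contained docs.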
What to expect
Some Google spreadsheets archived automatically will display error messages in addition to their contents. There is no known solution to this Wayback replay issue yet; however, the contents of these crawls can still be collected and preserved while replay improvements are made.