Overview
While we continuously investigate and implement capture improvements, some websites are not created in a way that is "archive-friendly" and can be difficult to capture or replay in their entirety. These difficulties affect all web crawlers, not just ours. When selecting seed URLs and reviewing your archived content, please keep these limitations in mind.
About
POST is an HTTP request method that is difficult for Archive-It’s Standard crawling technology (Heritrix) to capture, and difficult for Wayback to replay. Our newest crawling technology, Brozzler, can sometimes capture the functionality of pages that employ POST requests. However, because of Wayback limitations, those pages generally cannot be replayed.
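One general reason POST content is hard to replay is that the data sent with a POST request travels in the request body rather than in the URL, so a lookup keyed on the URL alone cannot tell two different POST requests apart. The sketch below is a toy illustration of that idea only. It is not how Heritrix, Brozzler, or Wayback are implemented, and the function names, the "homepage.com/load" endpoint, and the request bodies are all hypothetical.

```python
import hashlib


def url_key(method, url, body=b""):
    """Key a capture by URL only (assumed, simplified lookup scheme)."""
    return url


def request_key(method, url, body=b""):
    """Key a capture by method, URL, and a short digest of the request body."""
    digest = hashlib.sha256(body).hexdigest()[:12]
    return f"{method} {url} {digest}"


def build_index(captures, key_fn):
    """Map each capture's key to its response; later captures overwrite earlier ones."""
    index = {}
    for method, url, body, response in captures:
        index[key_fn(method, url, body)] = response
    return index


# Two hypothetical "Load More" requests: same URL, different POST bodies.
captures = [
    ("POST", "http://homepage.com/load", b"page=2", "items 11-20"),
    ("POST", "http://homepage.com/load", b"page=3", "items 21-30"),
]

by_url = build_index(captures, url_key)
by_request = build_index(captures, request_key)

print(len(by_url))      # both captures share one key, so only one survives
print(len(by_request))  # each request body gets its own key
```

In this toy model the URL-only index keeps just one of the two responses, which mirrors why content loaded through POST can go missing in replay even when it was requested during the crawl.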
Troubleshooting
For best results when collecting content that employs POST requests, try the following steps:
- Try crawling pages using Brozzler.
- For pages with a “Load More” type button that uses POST requests to load new content, try crawling the content below the fold as individual seeds if each page of it has a unique URL (e.g. “homepage.com/page2”). You can link these seeds to your main seed for replay by using our Groups feature.
- Perform Quality Assurance on your archived pages and use our Wayback QA tool to run patch crawls on missing functionality. This can help to capture the necessary files to aid future replay.
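When the content behind a “Load More” button is also reachable at numbered URLs, the list of individual seeds can be generated rather than typed by hand. The following is a minimal sketch that assumes the site follows the “homepage.com/page2” pattern from the example above. The function name, the page range, and the URL pattern are hypothetical, so confirm the real pattern on the live site before adding the seeds.

```python
def paginated_seeds(base_url, first_page, last_page, pattern="{base}/page{n}"):
    """Build one seed URL per page number, from first_page to last_page inclusive."""
    base = base_url.rstrip("/")  # avoid a double slash when joining
    return [pattern.format(base=base, n=n) for n in range(first_page, last_page + 1)]


# Hypothetical example: pages 2 through 4 of a paginated listing.
seeds = paginated_seeds("http://homepage.com/", 2, 4)
for seed in seeds:
    print(seed)
```

Each printed URL can then be added as its own seed and grouped with the main seed.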
Outcome
For best results in replay, we recommend right-clicking each URL that uses a POST request in Wayback and selecting the "Open in New Tab" option.