While we continuously investigate and implement capture improvements, some websites are not created in a way that is "archive-friendly" and can be difficult to capture or replay in their entirety. These difficulties affect all web crawlers, not just ours. When selecting seed URLs and reviewing your archived content, please keep these limitations in mind:
- Dynamic content
- Flash
- Streaming & Downloadable Media
- Password-protected sites
- Form and Database-driven Content
- Robots.txt
- URLs that contain a #
- POST Requests
Dynamic content
While many sites with dynamic content can be archived without issue, some types of dynamic content are difficult to capture or replay, particularly anything that depends heavily on human interaction (for example, content that only loads after a click) or on JavaScript (for example, a drop-down menu that appears when you mouse over a word). Each situation is different and can sometimes need special attention, but general troubleshooting recommendations are often a helpful starting point. Here are some examples of dynamic content that can be challenging to archive:
- Images or text that resize dynamically with the browser window
- Maps that zoom in and out
- Downloadable files
- Media that requires clicking a “play” button
- Navigation menus
- JavaScript-based pagination
Flash
Flash is a type of dynamic content: websites use the Adobe Flash platform for animations, graphics, and videos. With Adobe's statement that it would not support Flash beyond December 2020, we are investigating potential solutions for archival replay but do not yet have a timeline for the complex and specific engineering work that will be needed. Performing Quality Assurance now can help capture the necessary files to aid future replay.
Streaming & Downloadable Media
Special scoping rules are needed to archive common streaming video services like YouTube and Vimeo, and other services may require custom solutions or further technical development. If you plan to archive sites that include a large volume of downloadable media, we recommend checking the captures in Wayback to confirm the media was archived to your satisfaction; reviewing the File Types report is the most effective way to verify that media files were captured.
Password-protected Sites
By default, all Archive-It crawling technology crawls the public web and not information protected behind logins or passwords. Archive-It partners can, however, crawl password-protected content by entering their credentials in the Archive-It Web Application. This feature does not yet apply to two-step authentication systems or sites with specialty certificates.
Form and Database-driven Content
Elements that require a user's input, like a form or search box, will generally not work in Wayback. However, the Archive-It crawlers are usually still able to access that content. Adding an additional seed that points the crawler directly to the content can help capture it more effectively and gives users a direct access point to it.
Robots.txt Exclusions
A webmaster can use a robots.txt exclusion to prevent certain content from being crawled, and the Archive-It crawlers respect all robots.txt exclusions by default. To see whether an entire site you wish to crawl is being blocked, check the seed site for a robots.txt file before you crawl, or check your seed status report after your crawl is complete. To check whether part of your website or embedded content is blocked, check your hosts report. If you wish to crawl a site blocked by robots.txt, we encourage you to contact the webmaster of the blocked website to allow the Archive-It crawler in; the name (user-agent) of our crawler is archive.org_bot. There is also an Archive-It feature that allows users to override robots.txt blocks, which can be enabled upon request.
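For example, a webmaster who wanted to let the Archive-It crawler in while still restricting other robots could add directives like the following to the site's robots.txt file (the /private/ path here is only illustrative):

    User-agent: archive.org_bot
    Disallow:

    User-agent: *
    Disallow: /private/

An empty Disallow line means the named user-agent may crawl the entire site, while the second group still blocks other crawlers from the listed path.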
URLs that contain a #
The Standard crawler (Heritrix) removes any characters that follow # in a URL, making these URLs difficult to crawl and capture successfully. Brozzler is often better at crawling these types of URLs. If a seed URL contains a #, or a site you're crawling links to pages with #s in their URLs, please try using Brozzler.
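To illustrate why this matters, the sketch below (using a hypothetical URL) shows that everything after the # is a fragment; browsers resolve it on the client side, so a crawler that strips it only ever requests the base URL and never sees the page state behind the fragment.

    from urllib.parse import urldefrag

    # Hypothetical URL whose page state is encoded in the fragment.
    url = "https://www.example.org/catalog#/items?page=2"

    # Everything after the first "#" is the fragment. It is handled by the
    # browser rather than sent to the server, so a crawler that drops it
    # will only ever request the base URL below.
    base, fragment = urldefrag(url)
    print(base)      # https://www.example.org/catalog
    print(fragment)  # /items?page=2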
POST Requests
POST is an HTTP request method that is difficult for Archive-It's Standard crawling technology (Heritrix) to capture and difficult for Wayback to replay. Our beta crawling technology, Brozzler, can sometimes capture the functionality of pages that employ POST requests; however, because of Wayback limitations, they generally cannot be replayed. If you would like to use Brozzler on a site that uses POST requests, please get in touch with us by submitting a support ticket.
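As a rough sketch of the difference (the /search endpoint and parameters below are hypothetical), the same query can travel as a GET request, where the parameters are part of the URL and the response has its own address, or as a POST request, where the parameters are in the request body and the URL alone does not identify the response, which is what makes replay difficult:

    import urllib.parse
    import urllib.request

    # Hypothetical search endpoint, used only to illustrate the difference.
    params = urllib.parse.urlencode({"q": "annual report"})

    # GET: the parameters are part of the URL, so the response can be
    # archived and replayed at a distinct address.
    get_request = urllib.request.Request(
        "https://www.example.org/search?" + params
    )

    # POST: the same parameters are sent in the request body; the URL alone
    # ("/search") does not identify the response, which is why Wayback
    # generally cannot replay pages captured this way.
    post_request = urllib.request.Request(
        "https://www.example.org/search",
        data=params.encode("utf-8"),
        method="POST",
    )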
*For more information on what makes sites archive-friendly, an in-depth guide is available from Stanford University Libraries.