Overview
Sites protected by Cloudflare can be difficult to archive. Cloudflare's security products are designed to block automated traffic, and this can include Archive-It's crawlers. Site administrators can configure these products to allow Archive-It's tools to collect their sites.
Detecting a Cloudflare block
To determine if Cloudflare blocks your crawls:
- Check your Seeds report. Blocked sites show a seed status of "Crawled (HTTP error 403)".
- Review your results in Wayback mode and look for Cloudflare-branded error messages.
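If you want a quick check outside the Archive-It web application, the sketch below applies a simple heuristic to a single HTTP response. It assumes that a Cloudflare block combines the 403 status seen in the Seeds report with headers that Cloudflare-proxied responses typically carry (a `CF-Ray` header or `Server: cloudflare`). This is not an official Cloudflare or Archive-It test. The function names and the default user agent string are hypothetical. Cloudflare may treat your own computer differently from Archive-It's crawlers, so a passing check does not prove that your crawl will succeed.

```python
import urllib.error
import urllib.request
from collections.abc import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic (an assumption, not an official test) for a Cloudflare 403 block.

    Only the 403 status from the Seeds report is checked; other statuses are ignored.
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        "cf-ray" in lowered
        or lowered.get("server", "").strip().lower() == "cloudflare"
    )
    return status == 403 and served_by_cloudflare


def check_seed(url: str, user_agent: str = "example-checker/0.1") -> bool:
    """Fetch one seed URL and apply the heuristic. Requires network access.

    The default user agent is a hypothetical placeholder.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return looks_like_cloudflare_block(
                response.status, dict(response.headers.items())
            )
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the error still
        # carries the status code and the response headers.
        return looks_like_cloudflare_block(err.code, dict(err.headers.items()))
```

For example, `check_seed("https://example.com/")` returns `True` only when the response is a 403 that appears to come from Cloudflare.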
Troubleshooting a Cloudflare block
If a crawl that used Standard crawling technology was blocked by Cloudflare, run a new crawl with Brozzler. If the Brozzler crawl is also blocked, you may need to contact the website's administrators and ask them to allow archiving.
Configuring Cloudflare to allow archiving
The administrators of sites that use Cloudflare must configure a custom rule to allow Archive-It's collecting tools:
- Instructions from Cloudflare: Configure a custom rule with the Skip action
- More about Cloudflare's firewall tools: Cloudflare Web Application Firewall
Contact Archive-It support for the latest information to include in your request to the site's administrator.
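The custom rule's match condition is written in Cloudflare's Rules language. The sketch below composes such an expression from a crawler user agent substring and a list of IP ranges, using the real `http.user_agent` and `ip.src` fields. Both inputs are placeholders: the actual values to allow must come from Archive-It support, and the example values shown (including the `192.0.2.0/24` documentation range) are hypothetical. The function name `build_skip_expression` is also hypothetical and is not part of any Cloudflare tool.

```python
def build_skip_expression(user_agent_token: str, ip_ranges: list[str]) -> str:
    """Compose a Cloudflare Rules language expression for a Skip custom rule.

    Both arguments are placeholders; obtain the real values from Archive-It support.
    """
    clauses = []
    if user_agent_token:
        # Escape backslashes and double quotes for a quoted string literal.
        escaped = user_agent_token.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    if ip_ranges:
        # IP lists in the Rules language are space-separated inside braces.
        clauses.append("(ip.src in {" + " ".join(ip_ranges) + "})")
    if not clauses:
        raise ValueError("need a user agent substring or at least one IP range")
    # Require every supplied condition, since a user agent alone is easy to spoof.
    return " and ".join(clauses)
```

For example, `build_skip_expression("example-crawler", ["192.0.2.0/24"])` returns `(http.user_agent contains "example-crawler") and (ip.src in {192.0.2.0/24})`, which an administrator could paste into the rule's expression editor. Joining the conditions with `and` is a deliberately conservative choice. If support advises matching on only one attribute, pass an empty value for the other.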
What to expect from archived Cloudflare sites
You may see “403 (Forbidden)” or Cloudflare-branded error messages in replay. Contact Archive-It support to hide these error pages from end users.