Whitelisting Sites
Hello,
I am archiving COVID-19 websites, and many of the sites return 403 errors. According to the Archive-It Help Center, I need to reach out to the website owners and ask to be whitelisted. I suspect that some of these small sites are not hosted in-house, and their owners may not know what whitelisting means (I am just learning about it myself!). What information should I give them so they can allow the crawler to access their sites?
How do they actually whitelist the crawler or its IP range, and what is that IP range?
Thank you!
-
Hi Jessica,
Sorry for the late reply! If you haven't already, running a crawl with Brozzler is a good first step, since it can sometimes capture content that Standard crawls can't. If Brozzler doesn't help, please consider submitting a support ticket for these sites. A site can return a 403 error for a number of reasons, and we can help determine whether you need to share IP information with the site owner or whether there are other options.
Thanks!
Sylvie