I am archiving COVID-19 websites, and many of the sites return 403 errors. According to the Archive-It Help Center, I need to contact those website owners and ask them to whitelist the crawler. I have reason to believe that some of these small sites are not hosted in-house, and the owners may not know what whitelisting means (I am just learning it myself!). What information do I need to give them so they can allow the crawler to access their sites?
How do they actually whitelist the crawler or its IP range? And what is the IP range for the crawler?