Unknown HTTP Code -63
I cannot figure out what this means. It is now happening on all of our crawls. Has anyone else run into this, or does anyone know the cause? The sites are active and load when I click on the link. They are public and not password protected.
-
I did track down this list of status codes: https://github.com/internetarchive/heritrix3/wiki/Status-Codes
Though I am no closer to working out which prerequisite failed. More as I find out.
-
Hi Kenneth,
It sounds like you might have already determined this, but the issue in this case is specifically with the scoping rules that block "whois" and "robots.txt" requests. Our crawler must complete these requests before it can archive any site, so blocking them stops the crawl. Other scoping rules should not halt a crawl entirely. We have other methods to avoid robots exclusions when you need them. For instance, if you need the "Ignore robots.txt" feature added for your future crawls, please contact us directly and we'll take care of it for you.
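For anyone who wants to sanity-check their own rules before starting a crawl, here is a rough sketch in Python. It is not part of the crawler or the web application. It assumes the block rules are plain "URL contains this text" matches, and the function names and the exact form of the whois URI are made up for illustration. The Heritrix status-code list describes -63 as a prerequisite that could not be scheduled, which is the situation this check looks for.

```python
from urllib.parse import urlsplit


def prerequisite_uris(seed_url):
    """Return the requests assumed to come before the seed itself.

    The URI shapes here are illustrative assumptions, not the
    crawler's exact internal format.
    """
    parts = urlsplit(seed_url)
    return [
        f"{parts.scheme}://{parts.netloc}/robots.txt",
        f"whois:{parts.hostname}",
    ]


def blocked_prerequisites(seed_url, block_substrings):
    """List (uri, rule) pairs where a block rule matches a prerequisite.

    block_substrings is a hypothetical list of "block if the URL
    contains this text" rules.
    """
    hits = []
    for uri in prerequisite_uris(seed_url):
        for text in block_substrings:
            if text in uri:
                hits.append((uri, text))
                break  # one matching rule is enough to flag this URI
    return hits


if __name__ == "__main__":
    rules = ["robots.txt", "whois", "/calendar/"]
    for uri, rule in blocked_prerequisites("https://example.org/page", rules):
        print(f"rule {rule!r} would block prerequisite {uri}")
```

If the function returns anything, those rules are worth removing or narrowing before the next crawl. An empty list only means that these simple substring rules do not match, so it does not rule out other causes.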