Unknown HTTP Code -63

Comments

  • Kenneth Keller

    I tracked down the reference for these status codes: https://github.com/internetarchive/heritrix3/wiki/Status-Codes

    I am still no closer to knowing which prerequisite I failed, though.  More as I find out.

  • Kenneth Keller (Edited)

    It now looks like, if a file matches a restriction in the Seed Scope, the entire crawl for that site stops/fails.  This is new behavior.  It used to skip the file and move on.  So, I believe this is sorta solved.  I'll need to find a new way to block file types.

  • Karl Blumenthal

    Hi Kenneth,

    Sounds like you might have already determined this, but the issue in this case is specifically with the scoping rules that block "whois" and "robots.txt" requests. These are prerequisite requests that our crawling technology must complete before any site can be archived, so blocking them stops the crawl. Other scoping rules should not halt a crawl entirely. We have other methods to avoid robots exclusions when you need them. If you need to add the "Ignore robots.txt" feature for your future crawls, for instance, please contact us directly here and we'll take care of it for you.

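For anyone who wants to check their own block rules before a crawl, here is a rough sketch of the idea discussed above. It is not Archive-It's or Heritrix's actual code: the function names are hypothetical, the block rules are assumed to be plain regular expressions, and the `whois://` URL form is only a guess at how a whois prerequisite might look. The status-code descriptions are paraphrased from the wiki page linked in the first comment.

```python
import re
from urllib.parse import urlsplit

# Prerequisite-related status codes, paraphrased from the Heritrix wiki page
# linked in the thread (wording here is a summary, not an exact quote).
PREREQUISITE_CODES = {
    -61: "prerequisite robots.txt fetch failed",
    -62: "some other prerequisite failed",
    -63: "a prerequisite could not be scheduled",
}


def prerequisite_urls(seed):
    """Return the requests assumed to come before a seed can be crawled.

    Hypothetical helper: the robots.txt location is standard, but the
    whois:// form is an assumption made for illustration only.
    """
    parts = urlsplit(seed)
    return [
        f"{parts.scheme}://{parts.netloc}/robots.txt",
        f"whois://{parts.hostname}",
    ]


def blocked_prerequisites(seed, block_patterns):
    """List the prerequisite URLs that any block pattern would match."""
    compiled = [re.compile(pattern) for pattern in block_patterns]
    return [
        url
        for url in prerequisite_urls(seed)
        if any(rule.search(url) for rule in compiled)
    ]


if __name__ == "__main__":
    # A rule meant to skip all .txt files also catches robots.txt.
    print(blocked_prerequisites("https://example.org/docs/", [r"\.txt$"]))
    # A rule that only targets PDFs leaves the prerequisites alone.
    print(blocked_prerequisites("https://example.org/docs/", [r"\.pdf$"]))
```

If the first call returns anything, the rule is broad enough to catch a prerequisite request, which is the situation described in this thread. Narrowing the pattern (for example, excluding `robots.txt` explicitly) would be one way to keep blocking a file type without stopping the crawl.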
