Crawl Codes
Hello. I have noticed that some seeds show codes such as 404, -404, and 503 in the crawl status report, yet they were crawled just fine. Why does this happen, and do any adjustments need to be made to seeds when these codes appear? Thank you!
Official comment
Hi Jessica,
Great question! These three crawl status codes all indicate a live web issue (you can read more about crawl status codes in the Help Center).
When the crawlers first interact with a seed, they do an initial ping (kind of like a knock on the front door) before diving in to capture data. The status they record usually matches the overall state of the live site, but as you've seen, sometimes it doesn't. When you see these status codes on a site that was crawled successfully, they most likely reflect the status the website returned to that initial ping. That response may have been temporary, or it may simply be what the site tells crawlers when they first knock to see if anyone is home.
Reviewing your Crawl Reports for completeness (and to check whether adjustments are needed) when you see unexpected crawl or seed status codes is a good idea, and as always, please feel free to submit a support ticket if you are unsure!
Raven, Archive-It Staff