Websites are often composed of elements served from many different locations. Archive-It's crawlers collect all embedded elements (images, video players, stylesheets, analytics, etc.), even if their host domain differs from the seed's. The crawler may also discover and collect a few documents from a host before determining that the rest are out of scope. If any of the hosts in your report look particularly odd, contact us and we will investigate whether they present any problems.
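To see why hosts other than your seed's appear in the report, it can help to list the hosts that a single page's embedded elements come from. The following is a minimal sketch in Python using only the standard library. It is not Archive-It's crawler and does not reflect its actual scoping rules; the `EmbedCollector` class, the `embedded_hosts` function, the tag list, and every hostname in the sample page are hypothetical and chosen only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbedCollector(HTMLParser):
    """Collects URLs of embedded resources, ignoring ordinary <a> links."""

    # Tag -> attribute that points at an embedded resource.
    # Illustrative subset only; real pages embed content in many more ways.
    EMBED_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.EMBED_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def embedded_hosts(seed_url, html):
    """Return a Counter mapping each host to its number of embedded elements."""
    collector = EmbedCollector(seed_url)
    collector.feed(html)
    return Counter(urlsplit(u).hostname for u in collector.urls)


# Hypothetical seed and page content, for illustration only.
SEED = "https://www.example.org/"
PAGE = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://analytics.example.net/tracker.js"></script>
</head><body>
<img src="https://images.example-cdn.com/logo.png">
<iframe src="https://player.example-video.com/embed/123"></iframe>
<a href="https://elsewhere.example.com/page">An ordinary link</a>
</body></html>
"""

hosts = embedded_hosts(SEED, PAGE)
seed_host = urlsplit(SEED).hostname
for host, count in sorted(hosts.items()):
    marker = "seed host" if host == seed_host else "other host"
    print(f"{host}: {count} ({marker})")
```

In this sample, one seed page pulls embedded elements from three hosts besides its own, which is the kind of pattern that produces the additional entries in a Hosts report. The ordinary link is deliberately left out of the count, since this sketch looks only at embedded elements.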