If your crawl's Hosts report, or the detailed report for any seed in your crawl's Seeds report, lists hosts that you do not want to archive in future crawls, you can exclude them by limiting your crawl's scope. Specifically, you can add rules that block from archiving any URLs that contain specific text or that match a pattern of text (a regular expression).
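To show how the two kinds of rule differ, here is a minimal sketch that checks URLs against hypothetical block rules, one plain-text and one regular-expression. It is only an illustration under stated assumptions: the `BlockRule` class, the `in_scope` helper, and the `example.com` hosts are invented for this example and do not represent how Archive-It implements scope rules, which you configure in the web application rather than in code.

```python
import re
from dataclasses import dataclass


@dataclass
class BlockRule:
    """Hypothetical exclusion rule (illustration only, not Archive-It's API).

    If is_regex is False, the URL is blocked when it contains `pattern`
    as plain text. If is_regex is True, the URL is blocked when the
    regular expression `pattern` matches anywhere in it.
    """
    pattern: str
    is_regex: bool = False

    def matches(self, url: str) -> bool:
        if self.is_regex:
            return re.search(self.pattern, url) is not None
        return self.pattern in url


def in_scope(url: str, rules: list[BlockRule]) -> bool:
    """A URL stays in scope only if no block rule matches it."""
    return not any(rule.matches(url) for rule in rules)


# Hypothetical rules: block one host by plain text, and another host
# (with or without a leading "www.") by an anchored regular expression.
rules = [
    BlockRule("ads.example.com"),
    BlockRule(r"^https?://(www\.)?calendar\.example\.com/", is_regex=True),
]

urls = [
    "https://www.example.com/about",
    "https://ads.example.com/banner.js",
    "https://calendar.example.com/2024/05",
    "https://www.calendar.example.com/events",
]

kept = [url for url in urls if in_scope(url, rules)]
print(kept)  # ['https://www.example.com/about']
```

A plain-text rule is the simplest to write, but it blocks any URL that contains the text anywhere, including in a query string. A regular expression anchored to the start of the URL, as in the second rule, limits the match to the host itself.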