If your Hosts report, or the detailed report for any seed in your Seeds report, lists hosts that you do not want to archive in future crawls, you can exclude them by limiting your crawl scope. Specifically, you can add rules that block URLs from being archived if they contain specific text or match a pattern of text.
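To show how text-based and pattern-based rules work, here is a minimal Python sketch that checks candidate URLs against both kinds of rule. It is only an illustration under stated assumptions. It does not show how the crawler actually evaluates scope rules. The rule lists, the hostnames in them, and the `is_blocked` helper are all hypothetical.

```python
import re

# Hypothetical "contains text" rules: block any URL containing one of these strings.
BLOCK_IF_CONTAINS = ["ads.example.com", "/calendar/"]

# Hypothetical "pattern" rules: block any URL matching one of these regular expressions.
# This one blocks every subdomain of cdn.example.net.
BLOCK_IF_MATCHES = [re.compile(r"^https?://[^/]*\.cdn\.example\.net/")]


def is_blocked(url: str) -> bool:
    """Return True if any hypothetical block rule applies to the URL."""
    if any(text in url for text in BLOCK_IF_CONTAINS):
        return True
    return any(pattern.search(url) for pattern in BLOCK_IF_MATCHES)


if __name__ == "__main__":
    candidates = [
        "https://www.example.com/about",
        "https://ads.example.com/banner.js",
        "https://img.cdn.example.net/logo.png",
        "https://www.example.com/calendar/2020-01",
    ]
    # Keep only the URLs that no rule blocks.
    for url in candidates:
        print(("blocked  " if is_blocked(url) else "archived ") + url)
```

A plain-text rule works well when one fixed string, such as a single hostname, identifies everything you want to exclude. A pattern rule is useful when the unwanted URLs share a structure but not a fixed string, for example many subdomains of one host.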