Overview
The Hosts report in your crawl report lets you change the scope for any host listed. You can also run patch crawls on any hosts that have blocked documents. This article shows you how to change the scope or run patch crawls from inside the Hosts report.
Prerequisites
A crawl report from a Saved Test, One-Time, or Scheduled crawl. These actions are not available for unsaved test crawls.
Change the scope of a specific host
Start in the Hosts report of your crawl report.
- Select the host(s) you want to change by clicking the checkbox next to each one.
- Click the Edit Rules button.
- Review the existing rules already in place on the selected host(s).
- Select the desired rule from the drop-down menu:
  - Block a host
  - Add a data limit
  - Add a document limit
  - Ignore robots.txt
  - Block URLs if...
- Click Add Rule.
- Confirm that the new rule appears at the bottom of the existing rules for the host(s).
Outcome
Rules added to specific host(s) will be applied on the next crawl.
Run patch crawls on blocked documents
When blocked documents appear in your Hosts report, you can run patch crawls on them. Start in the Hosts report of your crawl report.
- Look in the Blocked column for any numbers.
- Click directly on the number to view the list of blocked documents and decide whether you need them.
- If you are unsure, copy a URL from the list and paste it into a browser's address bar to check the document.
- Click the checkbox next to the host with the blocked documents.
- Click Run Patch Crawl.
- Select the Ignore Robots.txt checkbox.
- Click Run Crawl.
Outcome
Patch crawls send the Standard crawler back to the live website to collect the blocked documents. After your patch crawl has run, a medical toolbox icon is displayed next to any hosts that have been patched.
The patched documents will take 24 hours to index. After that, check your Wayback page to see whether the patch crawl improved replay.
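If you want to confirm outside the web application which copied URLs a site's robots.txt file disallows, a short script can help. The following is a minimal sketch, assuming the documents were blocked by robots.txt (which the Ignore Robots.txt step suggests, though documents may be blocked for other reasons). It uses Python's standard `urllib.robotparser` module. The robots.txt content, the URLs, and the helper name `blocked_by_robots` are hypothetical placeholders, and the generic `*` user agent is used because this article does not state the crawler's user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with the text of the host's
# real robots.txt file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def blocked_by_robots(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs standing in for those copied from the blocked list.
urls = [
    "https://example.org/private/report.pdf",
    "https://example.org/public/index.html",
]

blocked = blocked_by_robots(SAMPLE_ROBOTS_TXT, urls)
print(blocked)
```

URLs returned by the helper are disallowed by the sample rules. URLs that are not returned are permitted by robots.txt, so if they still appear as blocked, something else is likely responsible.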