On this page:
- Where to find your crawls
- How to monitor a currently running crawl
- Live updates on URLs and data crawled
- Stop or edit the limits of running crawls
Where to find your crawls
You can access all of the crawls associated with your account, and their respective reports, from the "Crawls" section of our web application.
The section lists crawls and their reports across four tabs, organized by type or status.
For complete information on the reports made accessible in this section, see our guidance on how to read your crawl's report.
How to monitor a currently running crawl
To monitor any one of your crawls as it runs, navigate to the "Current Crawls" tab of the "Crawls" section of our web application, and click on the Crawl ID # or "View >" link associated with it in the table of crawls. This will take you to a full report on the crawl as its data update in real time. Specific features available for these reports on currently running crawls are outlined below.
A full set of reports, including the Seed, Hosts, and File Types tabs, is available for currently running crawls. To view updated information for your crawl in any of these, click the "Refresh" button in the blue banner.
Live updates on URLs and data crawled
You may view a list of URLs currently being crawled in real time in the "Recently Crawled" pane of your current crawl's "Overview" tab. These data refresh automatically every 5 seconds, but you may also select the "Refresh" option in the status bar at any time to update them manually. This information can be useful if you are concerned that the crawler has hit a crawler trap (for example, an endlessly repeating URL pattern) and is no longer capturing valid URLs.
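When scanning the "Recently Crawled" pane for signs of a trap, it can help to know what a suspicious URL tends to look like. The following is a minimal sketch of one common heuristic, not a feature of the web application: it flags URLs whose paths are unusually deep or repeat the same segment many times. The function names, the thresholds, and the example URLs are all assumptions chosen for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_trap(url, max_repeats=3, max_depth=12):
    """Return True if a URL shows common crawler-trap signs.

    Hypothetical heuristic: thresholds are illustrative assumptions.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Very deep paths are often generated endlessly by a site.
    if len(segments) > max_depth:
        return True
    # The same path segment repeated many times often signals a loop.
    counts = Counter(segments)
    return any(n >= max_repeats for n in counts.values())


def trap_ratio(urls):
    """Fraction of the given URLs that look like trap URLs."""
    if not urls:
        return 0.0
    return sum(looks_like_trap(u) for u in urls) / len(urls)


# Example list, as if copied by hand from the "Recently Crawled" pane.
recent = [
    "http://example.org/calendar/2020/01/",
    "http://example.org/a/b/a/b/a/b/a/b/",
    "http://example.org/about",
]
print(f"{trap_ratio(recent):.0%} of recent URLs look suspicious")
```

A high ratio over several refreshes would suggest that the crawl deserves a closer look; a single flagged URL is not conclusive on its own.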
The Realtime Graph at the top of your current crawl's "Overview" tab illustrates how a crawl has grown over time and the proportion of new to duplicate data that it has crawled. To view the graph, use the expand arrow button just to the left of the title.
Stop or edit the limits of running crawls
You can manually halt any current crawl from its "Overview" tab by clicking the "Stop Crawl" button. Please note that it may take a few moments for the crawl to stop and fully process its reports. The blue banner at the top of the page will inform you as these steps are completed.
You can edit the limits of a crawl while it is in progress, including the document, data, and/or time limits placed on it. Each edited limit must be larger than the amount already captured. For example, if a crawl has already captured 10,000 documents, the new document limit must be larger than 10,000.
Click the "Modify Limits" button to apply your changes to the currently running crawl. If you wish to extend the time of an already completed crawl, consult our full guidance on resuming completed crawls.
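The rule that an edited limit must exceed what the crawl has already captured can be sketched as a simple check. This is an illustration of the rule only, not the web application's own logic; the function name, the dictionary keys, and the sample figures (other than the 10,000-document example) are assumptions.

```python
def validate_new_limits(captured, new_limits):
    """Return a list of problems; an empty list means the limits are acceptable.

    Hypothetical helper: `captured` and `new_limits` map a limit name
    (e.g. "documents") to the amount captured so far and the proposed limit.
    """
    errors = []
    for name, new_value in new_limits.items():
        already = captured.get(name, 0)
        # A new limit must be strictly larger than the amount already captured.
        if new_value <= already:
            errors.append(
                f"{name} limit {new_value} must be larger than "
                f"{already} already captured"
            )
    return errors


# Assumed state of a running crawl.
captured = {"documents": 10_000, "data_gb": 2.5, "hours": 6}

print(validate_new_limits(captured, {"documents": 10_000}))  # rejected
print(validate_new_limits(captured, {"documents": 15_000, "hours": 8}))  # accepted
```

Checking each limit independently mirrors the text above: raising the time limit, for instance, does not excuse a document limit that is already exceeded.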