On this page:
- When to use Brozzler
- How to use Brozzler
When to use Brozzler
In most cases, we recommend using Brozzler only after you've tried a crawl using the Standard crawling technology (Heritrix/Umbra). If dynamic elements appear to be missing from your Standard crawl, or you see error pages when viewing the results in Wayback, running a test crawl using Brozzler is a good next step.
Brozzler is recommended when crawling:
Differences between Brozzler and Standard crawls:
Please keep in mind the following differences between Standard and Brozzler when choosing the best crawling technology to use.
- Brozzler crawls will not de-duplicate content against Standard crawls in your collections.
- Brozzler crawls de-duplicate content at the seed level, not at the collection level like Standard.
- Brozzler is not yet configured for PDF-only crawls.
- Brozzler is not yet configured for scheduled/recurring crawls.
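The difference between seed-level and collection-level de-duplication can be pictured with a small model. The sketch below is purely illustrative and is not how Brozzler or Heritrix implement de-duplication: the `Capture` class, the `count_stored` function, and the identifiers are all hypothetical. It assumes content is identified by a digest and only shows how the scope of the comparison changes how many copies are stored in full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    collection: str  # hypothetical collection identifier
    seed: str        # seed the capture was discovered under
    digest: str      # content hash of the captured payload


def count_stored(captures, scope):
    """Count captures stored in full under a given de-duplication scope.

    scope="collection": identical content is stored once per collection.
    scope="seed":       identical content is stored once per seed.
    """
    seen = set()
    stored = 0
    for c in captures:
        if scope == "collection":
            key = (c.collection, c.digest)
        else:
            key = (c.collection, c.seed, c.digest)
        if key not in seen:
            seen.add(key)
            stored += 1
    return stored


captures = [
    Capture("c1", "https://a.example/", "sha1:AAA"),
    Capture("c1", "https://a.example/", "sha1:AAA"),  # same seed, same content
    Capture("c1", "https://b.example/", "sha1:AAA"),  # different seed, same content
]

collection_scope = count_stored(captures, "collection")
seed_scope = count_stored(captures, "seed")
print(collection_scope, seed_scope)
```

In this model, content shared by two seeds is stored once under collection-level scope but once per seed under seed-level scope, so a Brozzler crawl may add more new data than a comparable Standard crawl.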
How to use Brozzler
Brozzler must currently be enabled in your account before you can use it. To have it enabled, please submit a request in a support ticket.
Running Brozzler Crawls
In the “Run Crawl” dialog, a new field called “Crawling Technology” lets you choose between Brozzler and our “Standard” (Heritrix/Umbra) crawling technology.
Run a Brozzler test crawl before deciding to use Brozzler for a production crawl.
Reviewing Brozzler Crawls
Brozzler crawls return the same post-crawl reports as Standard crawls and can be differentiated from Standard crawls by the Brozzler icon. The icon appears next to the Crawl ID in the Crawl Report list and on the individual crawl report of every Brozzler crawl.
The crawling technology used in a crawl (Brozzler or Standard) is also indicated on the Overview tab of each crawl report.