When to use Brozzler
If dynamic elements on a page were not captured in a Standard crawl or you're seeing a number of error pages when viewing results in Wayback, running a test crawl using Brozzler is a good next step.
Brozzler is recommended when crawling any social media, including:
- Soundcloud (or sites with Soundcloud embeds)
- YouTube (or sites with YouTube embeds)
Brozzler is recommended for the following platforms and issues:
- Wix
- Sidearm Sports
- Seeds where multiple redirects prevent replay in Wayback
Differences between Brozzler and Standard crawls:
Brozzler and Standard crawls use different capture mechanisms, so the amount of data each crawler captures from the same Seed may differ. When using a new crawling technology, please run a test crawl first. Please also keep in mind the following:
- Brozzler is not yet configured for PDF-only crawls.
- Brozzler is globally available for One-Time or Test crawls. If you would like to use it for Recurring crawls please submit a support ticket to have the feature enabled.
How to use Brozzler
Running Brozzler Crawls
You will see the option to choose between Brozzler and our “Standard” (Heritrix/Umbra) crawling technology in a new field within the “Run Crawl” dialog called “Crawling Technology”.
Run a Brozzler test crawl before deciding to use Brozzler for a production crawl.
Reviewing Brozzler Crawls
Brozzler crawls return the same post-crawl reports as Standard crawls and can be distinguished from them by the Brozzler icon. For all Brozzler crawls, the icon appears next to the Crawl ID in the Crawl Report list and on the individual crawl reports.
The crawling technology used in a crawl (Brozzler or Standard) is also indicated on the Overview tab of each crawl report.
Using Brozzler For Scheduled Crawls
If you would like to use Brozzler on scheduled crawls but don't see it as an option in your account, please request it by submitting a support ticket.
All seeds scheduled at a given frequency in a collection must be crawled with the same crawling technology. It is not possible to enable Brozzler for select seeds only within a scheduled crawl.
To use Brozzler for a scheduled crawl:
- Go to the Collection where your scheduled crawls are located.
- Click on the Crawls tab in the middle navigation bar.
- On the right-hand side, select Scheduled Crawls.
- Click Edit Limits to make the desired changes.
- In the Edit Crawl Limits dialog box, select the radio button next to Brozzler.
- Click Modify Limits to save your choice.