When to use Brozzler
If dynamic elements on a page were not captured in a Standard crawl, or you see many error pages when viewing the results in Wayback, running a test crawl with Brozzler is a good next step.
Brozzler is recommended when crawling social media sites, including:
- SoundCloud (or sites with SoundCloud embeds)
- YouTube (or sites with YouTube embeds)
Brozzler is recommended for the following platforms and issues:
- Wix
- Sidearm Sports
- Seeds where multiple redirects prevent replay in Wayback
Differences between Brozzler and Standard crawls
Brozzler and Standard crawls use different capture mechanisms, so the amount of data each crawler captures from the same Seed may differ. Whenever you switch to a new crawling technology, run a test crawl first. Please also keep in mind the following:
- Brozzler is not yet configured for PDF-only crawls.
- Brozzler is available to all accounts for One-Time and Test crawls. If you would like to use it for Recurring crawls, please submit a support ticket to have the feature enabled.
How to use Brozzler
Running Brozzler Crawls
The "Run Crawl" dialog includes a field called "Crawling Technology", where you can choose between Brozzler and our "Standard" (Heritrix/Umbra) crawling technology.
Run a Brozzler test crawl before deciding to use Brozzler for a production crawl.
Reviewing Brozzler Crawls
Brozzler crawls return the same post-crawl reports as Standard crawls. You can tell them apart from Standard crawls by the Brozzler icon, which appears next to the Crawl ID in the Crawl Report list and on the individual crawl report of every Brozzler crawl.
The crawling technology used in a crawl (Brozzler or Standard) is also indicated on the Overview tab of each crawl report.
Enabling Brozzler For Recurring Crawls
Brozzler can be enabled as an option for recurring crawls. If you would like to use Brozzler for recurring crawls, please request it by submitting a support ticket.
All seeds scheduled at a given frequency in a collection must be crawled with the same crawling technology. It is not possible to enable Brozzler for only some of the seeds within a scheduled crawl.
Once Brozzler is enabled for recurring crawls, navigate to the collection that contains your scheduled crawls, choose the "Crawls" tab in the middle navigation bar, select "Scheduled Crawls" on the far right, and then select "Edit Limits".
This opens a dialog box. Near the bottom, use the "Crawling Technology" radio buttons to select the crawling technology.