When to use Brozzler
If dynamic elements on a page were not captured in a Standard crawl or you're seeing a number of error pages when viewing results in Wayback, running a test crawl using Brozzler is a good next step.
Brozzler is recommended when crawling any social media, including:
- SoundCloud (or sites with SoundCloud embeds)
- YouTube (or sites with YouTube embeds)
Brozzler is recommended for the following platforms and issues:
- Wix
- Sidearm Sports
- Seeds where multiple redirects prevent replay in Wayback
Differences between Brozzler and Standard crawls
Brozzler and Standard crawls rely on different capture methods, so they may collect different amounts or types of data from the same seed. When you're trying a new crawling technology, it's best to start with a test crawl to see how well it captures and replays the content.
How to use Brozzler
Running Brozzler Crawls
In the Run Crawl dialog, you'll see a Crawling Technology field, where you can choose between Brozzler and Standard (Heritrix/Umbra).
Before selecting Brozzler for a production crawl, run a Brozzler test crawl to confirm how well it captures and replays your seed and to determine whether any scoping adjustments are needed.
Reviewing Brozzler Crawls
Brozzler crawls generate the same post-crawl reports as Standard crawls. To tell them apart at a glance, look for the Brozzler icon, which appears next to the Crawl ID in the Crawl Report list and on the individual crawl report for every Brozzler crawl.
The crawling technology used in a crawl (Brozzler or Standard) is also indicated on the Overview tab of each crawl report.
Using Brozzler For Scheduled Crawls
All seeds scheduled at a given frequency in a collection must be crawled with the same crawling technology. It is not possible to enable Brozzler for only some of the seeds in a scheduled crawl.
To use Brozzler for a scheduled crawl:
- Go to the Collection where your scheduled crawls are located.
- Click the Crawls tab.
- Select Scheduled Crawls.
- Click Edit Limits.
In the Edit Crawl Limits dialog, select the Brozzler radio button. Click Modify Limits to save.