Karl Blumenthal
- Total activity 445
- Following 0 users
- Followed by 0 users
- Votes 3
- Subscriptions 196
Comments
Recent activity by Karl Blumenthal
Hi Brigette! Have you tried collecting these seeds with Brozzler yet? I gave one of them a quick one-page test today and it appeared to capture the site without all that extra session-specific redi...
-
Hi Alison! Have you given this seed a go with Brozzler yet? That's always the first tool I reach for when I run into a site that limits bot-like traffic like this one appears to. Let us know if tha...
-
In case you missed us: here are the slides from today's call, including our agenda and links out to all of the resources mentioned.
-
Hi Mary, Your collections will in fact remain accessible via Archive-It and the Internet Archive's Wayback Machine if and when you ever need to discontinue service. You would, however, lose the abili...
-
Hi Adriane, A couple of options to start with: There are a number of different browser extensions out there that perform the kind of bulk downloading that you describe, but I've used DownloadThemA...
-
Hi Amanda, Sorry for the delay! In case you haven't tried this already, we recommend toggling on the Brozzler tool to collect YouTube seeds completely. If that doesn't do the trick for you, please ...
-
Here are the slides for today's call. Follow along from home and follow the links for more information about the updates from Jefferson Bailey, Director of Web & Data Services at the Internet Archive.
-
Update: As Lori suggested, we have since adjusted "standard" crawl deduplication to align with the Brozzler model. Since mid-January of this year, Heritrix-based crawl jobs also deduplicate at the ...
-
Hi Paula! Please submit a support ticket and we'd be happy to take a closer look into your crawls. We may be able to help with a fix from here on our side of the Wayback replay software, to display...
-
Here is a summary of the major themes and ideas from our recent open call about curating COVID-19 web archive collections: Crawling COVID-19: What (and how) web archivists collect. Please don't hes...