
Alex Thurman
- Total activity 57
- Votes 21
- Subscriptions 28
Activity overview
Latest activity by Alex Thurman
Alex Thurman commented,
Hi Silvia, in our Human Rights collection we have over 700 seeds that for years have been part of a Quarterly scheduled crawl, but like you and Stefana I have noticed increasingly unsatisfactory res...
-
Alex Thurman commented,
Hi Darren. What I do in this situation is: --Go to Metadata/Bulk Seed Metadata and "Download Existing Seed Metadata". Save it unchanged so you have it as a backup. --"Save as" another version of th...
-
Alex Thurman commented,
Our Waybackfill was very broad and simple: all URLs from any columbia.edu subdomain present in the global Wayback Machine for 1996 to June 2010. Since we started archiving the columbia.edu domain ...
-
Alex Thurman commented,
Hi Sarah, at Columbia we implemented a Waybackfill last year. We mentioned it in a Libraries blog post: https://blogs.cul.columbia.edu/rbml/2022/01/19/now-available-columbia-university-web-archives-...
-
Alex Thurman created a post,
Archives Unleashed: Call for Participation (Columbia University, March 26-27, 2020)
Archives Unleashed: Call for Participation. Web Data at Scale with the Archives Unleashed Toolkit. Butler Library | Columbia University, New York City, 26-27 March 2020 http://archivesunlea...
-
Alex Thurman commented,
Most Wix seeds seem not to have wix.com or wixsite.com in their URLs, so the scoping rules are not added by default and have to be added manually, which is very time-consuming, especially given the ...
-
Alex Thurman created a post,
2019 IIPC Web Archiving Conference call for papers
Hello fellow Archive-It partners, I encourage all of you to consider submitting a proposal for the 2019 IIPC Web Archiving Conference, to be held June 6-7, 2019 in Zagreb, Croatia. Cfp i...
-
Alex Thurman created a post,
exposing our seed-level metadata to search engines (archive-it.org/collections/ & robots.txt)
Hi all, archive-it.org, the public access point to our collections, has a robots.txt restriction on the /collections/ directory, meaning, I think, that our collection pages (where all our seed metadat...
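To illustrate what a robots.txt restriction on /collections/ means for search-engine crawlers, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules are a hypothetical stand-in (a single `Disallow: /collections/` for all user agents), not a copy of archive-it.org's actual robots.txt, and the collection URL and the `is_crawlable` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every crawler from the /collections/ directory.
# This is an assumed example, not archive-it.org's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /collections/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Hypothetical helper: True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A made-up collection page URL under the disallowed directory.
    print(is_crawlable("https://archive-it.org/collections/1234"))  # False
    # The site root is not covered by the Disallow rule.
    print(is_crawlable("https://archive-it.org/"))  # True
```

Because only the wildcard (`*`) group is defined here, any user agent string, such as "Googlebot", falls back to that group and is refused every page under /collections/.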