Best Practices for Uploaded WARC Description
As many of us have experienced, crawls of certain websites, particularly social media, have been unsatisfactory. I have been running some crawls with other tools (Webrecorder/Perma.cc) in an attempt to get better results, and then uploading the resulting WARCs to Archive-It so that our collections stay together under one host. However, these uploaded WARCs are a little disappointing because there is no "report" and no way to pull information from, or add metadata to, these files. I'd like a way to differentiate these captures from regular Archive-It crawls, but I am struggling to settle on best practices. My current thought is to put a note in the description with the tool used and the date of capture. (I'd put the date in the Date field, but Archive-It aggregates Date entries into the metadata search tools on the public side, so this would get messy.) Does anyone upload WARCs frequently? How do you handle these files?