2018 PM in DC - Breakout Group 4: Describing web archives




  • Mary Haberle

    An audit of an established web archiving program can be an impetus for reconsidering metadata best practices.

    Common challenges:

    • How to pull data out of other systems and integrate it into web archives, and vice versa.
    • How to link descriptions of URLs that have changed over time so that users can see all the captures across seeds.
    • A historical citation service is available through AIT for a fee if you want to integrate Wayback Machine content into your AIT account.
    • How to connect catalog records for web archives to other systems.
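
    The second challenge above, linking seeds whose URLs changed over time, can be sketched as a grouping problem. The Python below is a minimal illustration only, not a feature of AIT or any other tool: the function names, the (old URL, new URL) pair format, and the example URLs and timestamps are all hypothetical. It assumes someone has recorded which seed replaced which.

```python
from collections import defaultdict


def group_seed_lineages(successions):
    """Group seed URLs into lineages from (old_url, new_url) pairs.

    Hypothetical helper: the pairs are assumed to be recorded by hand
    or exported from local notes; no real system provides them as-is.
    """
    parent = {}

    def find(url):
        # Follow parent links to the representative of this lineage.
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    for old_url, new_url in successions:
        root_old, root_new = find(old_url), find(new_url)
        if root_old != root_new:
            parent[root_old] = root_new

    groups = defaultdict(set)
    for url in list(parent):
        groups[find(url)].add(url)
    return sorted(sorted(group) for group in groups.values())


def captures_for_lineage(lineage, captures_by_seed):
    """Merge capture timestamps from every seed in one lineage."""
    merged = []
    for seed in lineage:
        merged.extend((ts, seed) for ts in captures_by_seed.get(seed, []))
    return sorted(merged)


successions = [
    ("http://example.org/", "https://example.org/"),
    ("https://example.org/", "https://www.example.org/"),
    ("http://other.example/", "https://other.example/"),
]
lineages = group_seed_lineages(successions)
captures = captures_for_lineage(
    lineages[0],
    {
        "http://example.org/": ["20150101000000"],
        "https://www.example.org/": ["20180101000000"],
    },
)
```

    With a grouping like this, a single description could point users at every capture in a lineage, regardless of which seed URL was active at the time.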

    Extracting individual resource types, like PDFs, that were captured via web archiving and describing them in another access system. How to select resources: is it based on usage? How much description should we apply to these resources? What comes first, description or usage? It is a chicken-and-egg situation.

    How do you measure resource requests? Some Google Analytics; the main source is a tool set up by IT called Angel Fish. How granular should measurement be (collection, page, etc.)? Page level is simple, but not very detailed. How to turn analytics information into actionable information is a question that needs to be answered. Promotion of collections may be one strategy. Referral sources can be an interesting data point to look at. Does it go back to where our descriptions are available and how well described the collections are? More description doesn't always equal more usage; it depends (always). Flexibility and freedom can make it a challenge to figure out what direction to go in when describing captured content.

    Communication with users - LoC describes at the "web archive" level. In its entity model, one entity can have multiple seeds (arranged conceptually and by permissions). MODS records are facet-type entry points for the collections. It is a challenge to explain to users how description records correspond to the number of documents captured. A helpful analogy is that a book is described at the item level; there are many pages in a book, but they are not described at the page level.

    Collaboration challenges - How to design and describe a search interface for all the content that is collected by multiple consortium members? Crosswalking. Members have variable amounts of time; maybe better-resourced participants can develop an interface that smaller repositories can then modify.
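
    As a rough illustration of the crosswalking mentioned above, the Python sketch below maps a few MODS elements onto simple Dublin Core field names so that records from different members could be searched together. The mapping table is an illustrative subset chosen for this example, not a complete or authoritative crosswalk, and the function name and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Illustrative subset only (an assumption for this sketch); a real
# crosswalk would cover many more elements and handle attributes.
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:subject/mods:topic", "subject"),
    ("mods:abstract", "description"),
    ("mods:location/mods:url", "identifier"),
]


def mods_to_dc(mods_xml):
    """Return a dict of Dublin Core field name -> list of values."""
    root = ET.fromstring(mods_xml)
    dc = {}
    for path, dc_field in CROSSWALK:
        for element in root.findall(path, MODS_NS):
            text = (element.text or "").strip()
            if text:
                dc.setdefault(dc_field, []).append(text)
    return dc


# Hypothetical sample record, made up for this example.
sample = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Web Archive</title></titleInfo>
  <name><namePart>Example Organization</namePart></name>
  <subject><topic>Elections</topic></subject>
  <subject><topic>Campaign websites</topic></subject>
  <location><url>http://example.org/</url></location>
</mods>
"""
record = mods_to_dc(sample)
```

    Flattening in this direction loses the MODS hierarchy, so a shared interface built this way would likely need to link back to each member's fuller record.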

    Legacy issues - In the past, users could only have one collection. Some partners need to deal with the legacy of an evolution from seeds collected in the original single collection. Old entries may have better metadata, which surfaces them higher, so there is a need to connect the new version of a seed (in a new collection) to the older seed.

    Need for collaboration and support across departments and positions that will give web archivists access to catalog record creation. Making friends with IT changes everything. Sandbox space is really important in this work, to see what you have and what is possible with what you have. Sometimes archivists use their own computers if their organization can't supply them with a machine with full admin rights.

    Other systems for web archives description? MODS is good; it's hierarchical, unlike Dublin Core. Some institutions are migrating to the Alma system but are unsure how good it is for web archives, and are waiting to see how it plays out.

    A possible future move away from LCSH assigned by cataloguing librarians to computer-driven topic modelling that gets meshed with LCSH. Minimal description at LoC doesn't include LCSH due to scale.

    Relevant feature requests to this topic:

    • Links in descriptive metadata to reference Wayback captures.
    • Ability to connect seeds to each other when seeds change over time.
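
    Until links like those in the first request are supported natively, a capture link can be assembled by hand for inclusion in a description. The Python sketch below assumes the public Wayback Machine pattern of base URL, 14-digit UTC timestamp, then the original URL. The function name and the `base` parameter are hypothetical, and the base for a hosted collection would differ and should be checked against that service's documentation.

```python
from datetime import datetime, timezone


def wayback_link(url, captured_at, base="https://web.archive.org/web"):
    """Build a link to a capture of `url` at the given moment.

    `captured_at` must be a timezone-aware datetime; it is converted
    to UTC and formatted as YYYYMMDDhhmmss.
    """
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    timestamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}/{timestamp}/{url}"


link = wayback_link(
    "http://example.org/",
    datetime(2018, 11, 13, 15, 30, 0, tzinfo=timezone.utc),
)
```

    A link built this way could be stored in a metadata field alongside the seed description, so a reader can jump from the record to a specific capture.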