exposing our seed-level metadata to search engines (archive-it.org/collections/ & robots.txt)
Hi all
archive-it.org, the public access point to our collections, has a robots.txt restriction on the /collections/ directory, which I think means that our collection pages (where all our seed metadata is displayed) are not indexed by search engines. I can see positive and negative aspects of this, and would be curious to hear from both AIT staff and partners whether our seed-level metadata could/should be opened up to search engines, either globally or via partner opt-in (e.g. removing the restriction on just collection(s) #---). To be clear, I'm not talking about the robots.txt restriction on wayback.archive-it.org (the archived content itself).
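To make the "global vs. partner opt-in" idea concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt text in it is an assumption written for illustration, not a copy of the live archive-it.org file, and the collection numbers 1234 and 5678 are made-up placeholders. It only shows how a single `Allow` line could open one collection while the rest of /collections/ stays blocked.

```python
from urllib.robotparser import RobotFileParser


def can_index(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed shape of the current restriction (illustrative only).
CURRENT_RULES = """\
User-agent: *
Disallow: /collections/
"""

# Hypothetical opt-in: collection 1234 is a placeholder number.
# The Allow line comes first because Python's parser uses the first
# matching rule; crawlers that use longest-match also favor it here.
OPT_IN_RULES = """\
User-agent: *
Allow: /collections/1234
Disallow: /collections/
"""

OPTED_IN_URL = "https://archive-it.org/collections/1234"
OTHER_URL = "https://archive-it.org/collections/5678"

if __name__ == "__main__":
    for label, rules in (("current", CURRENT_RULES), ("opt-in", OPT_IN_RULES)):
        for url in (OPTED_IN_URL, OTHER_URL):
            print(label, url, can_index(rules, url))
```

Rules match by path prefix, so a real opt-in rule would need care that it does not also match longer collection numbers that start with the same digits.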
If our metadata about our archived sites showed up in search engine results, we would certainly get more traffic to our collections. We might also get takedown requests and inquiries from site owners unhappy that we may be reducing their live-site traffic. I can see how partners doing government or institutional archiving (i.e. where their institution is both the "site owner" and the archives) might be less keen on having outdated versions surface in search engine results. I think live websites would usually have much higher search rankings though, so this may not be a real issue in practice.
If you could open up your Archive-It collection metadata to search engine indexing, would you?
--Alex
-
All our collections are set to private, since our main audience is our internal users within Nationwide. Our main objective is to serve their requests, not the public's.
I would say to think about your objective for saving your sites. Is your main priority right now to drive traffic, or is it to preserve? It's okay to prioritize and rank your objectives. Once you figure that out, you'll have your answer on what to do about metadata.