archive-it.org, the public access point to our collections, has a robots.txt restriction on the /collections/ directory, which I think means that our collection pages (where all our seed metadata is displayed) are not indexed by search engines. I can see positive and negative aspects of this, and would be curious to hear from both AIT staff and partners whether our seed-level metadata could or should be opened up to search engines, either globally or via partner opt-in (i.e., removing the restriction on just collection(s) #---). To be clear, I'm not talking about the robots.txt restriction on wayback.archive-it.org (the archived content itself).
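To make the two options concrete, here is a rough sketch using Python's standard-library robots.txt parser. The rule sets below are hypothetical and are not copied from the real archive-it.org robots.txt; the collection IDs 1234 and 5678 and the helper name `can_crawl` are made up for illustration. It only shows how a crawler that honors robots.txt would read a global block versus a per-collection opt-in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the actual archive-it.org robots.txt.
GLOBAL_BLOCK = """\
User-agent: *
Disallow: /collections/
"""

# Hypothetical opt-in variant: collection 1234 is a made-up ID.
# The Allow line comes first because this parser applies the first
# matching rule in file order.
OPT_IN = """\
User-agent: *
Allow: /collections/1234
Disallow: /collections/
"""


def can_crawl(rules: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, rules in (("global block", GLOBAL_BLOCK), ("opt-in", OPT_IN)):
        for path in ("/collections/1234", "/collections/5678"):
            print(label, path, can_crawl(rules, path))
```

Note that a prefix rule like `Allow: /collections/1234` would also match any longer ID starting with the same digits, so a real opt-in rule would need a more careful pattern.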
If the metadata about our archived sites showed up in search engine results, we would certainly get more traffic to our collections. We might also get inquiries from site owners unhappy that we're possibly reducing their live-site traffic, as well as takedown requests, etc. I can see how partners doing government or institutional archiving (i.e., where their institution is both the "site owner" and the archives) might be less keen on having outdated versions surface in search engine results. I think live websites would usually have much higher search rankings, though, so this may not be a real issue in practice.
If you could open up your Archive-It collection metadata to search engine indexing, would you?