Hi AIT community!
My university has collected more than 800 university websites using a single seed plus an expand scope rule that lets us capture that seed's many subdomains. Specifically, our seed is https://www.gwu.edu and our expand scope rule is the SURT +https://(edu,gwu,
This combo allows us to collect all of gwu.edu and its 800+ subdomains, such as magazine.gwu.edu, bulletin.gwu.edu, etc. You get the picture. The gwu.edu seed appears in the list of Sites on the collection's public portal, and by browsing that site in the Wayback viewer, users can make their way to the 800+ subdomains.
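In case it helps anyone picture why that one rule covers every subdomain, here is a rough Python sketch of the prefix matching as I understand it. It is not Archive-It's or the crawler's actual code: the function names `surt_host` and `in_scope` are made up for illustration, and it only looks at the host part of the URL, ignoring the scheme and path.

```python
from urllib.parse import urlsplit


def surt_host(url: str) -> str:
    """Return the host of a URL with its labels reversed, SURT style.

    Hypothetical helper: 'https://magazine.gwu.edu/' -> 'edu,gwu,magazine,'
    """
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    return ",".join(reversed(labels)) + ","


def in_scope(url: str, host_prefix: str = "edu,gwu,") -> bool:
    """Simplified check: is the reversed host under the given prefix?

    The default prefix is the host portion of the rule quoted above.
    """
    return surt_host(url).startswith(host_prefix)


for candidate in (
    "https://www.gwu.edu",
    "https://magazine.gwu.edu/",
    "https://it.gwu.edu/",
    "https://www.example.edu/",
):
    print(candidate, in_scope(candidate))
```

Because the host is written in reverse order, every gwu.edu subdomain starts with the same `edu,gwu,` prefix, so a single rule is enough to bring them all into scope.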
After crawling, we added about 10 of the most important subdomains as seeds. We haven't specifically crawled these seeds, and we don't intend to. We added them as seeds so that they show up in the list of "Sites" that are part of the collection (you can see these 10 subdomains listed at https://archive-it.org/collections/5184). It's a bit of a hack, but it seems to be working very well for us!
However, we've decided that we don't need all 10 subdomains listed, and we'd like to remove one of them (https://it.gwu.edu/) from the list. We still want users to be able to access the archived site, but we're not sure this subdomain is important enough to be listed on https://archive-it.org/collections/5184.
Finally, to the question:
If we delete https://it.gwu.edu/ from this collection's seeds, will that delete the underlying data or make it inaccessible, or will it simply remove https://it.gwu.edu/ from the list of sites on our public collection page?