If you go to https://archive-it.org/collections/5785 and look at the collection of The Aggie student newspapers from the UC Davis archives, crawls show up twice with the same date and time. This happens sporadically starting May 3, 2015, and every week from May 19, 2016 onward. The sporadic duplicates have slightly different crawl times, for the most part differing by up to an hour or so.
When we go to the backend, we can't find any reason for this: there's only one seed, and the dates in the crawl history don't correspond to the dates shown in the collection. However, most of the crawls with duplicates have a status of "Finished: Time Limit" rather than just "Finished." I couldn't find anything unique about the few that actually "Finished," since the ratio of new data captured to total data captured varies among them. The crawls are all weekly, the only crawl limit is the 3-day time limit, and there are a number of collection rules but no seed rules.
Why would these be showing up twice, only in one location?