I have more than 250 seeds in the same collection that need to be captured with a similar frequency (quarterly, in my case). However, scheduling a crawl with such a large number of seeds has several drawbacks:
- It is difficult to carry out quality control.
- Too many seeds can lead to the crawl never being completed.
- Content is captured unevenly across seeds: some end up with the seed status "Not crawled (queued)", while others collect a huge amount of data.
I have seen some posts mentioning this issue. However, since it is currently not possible to schedule more than one crawl for a given frequency, I wonder whether anyone can share best practices, workarounds or tips to solve this problem.
I can only think of two options: dividing the seeds into different collections and scheduling periodic crawls for them at the same time, or setting different frequencies for different seeds, so that some are captured quarterly and others at a different frequency, or even manually.
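To make the first option concrete, here is a minimal sketch of how the seed list could be divided into smaller groups before adding them to separate collections. It only does the splitting locally and does not call any crawler API. The seed URLs, the group size of 50 and the function name `split_seeds` are all placeholders I made up for illustration.

```python
from itertools import islice


def split_seeds(seeds, batch_size):
    """Yield consecutive groups of at most batch_size seeds."""
    it = iter(seeds)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical seed URLs standing in for the real seed list.
seeds = [f"https://example.org/site{i}/" for i in range(1, 261)]

# Assumed group size; a real value would depend on how large each seed is.
batches = list(split_seeds(seeds, 50))

for n, batch in enumerate(batches, start=1):
    print(f"group {n}: {len(batch)} seeds, first = {batch[0]}")
```

Each group could then go into its own collection, or get its own frequency. Splitting by position is the simplest choice. Grouping by expected data volume would probably address the uneven capture better, but that needs information I do not have for every seed.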