Seed scoping: preventing a seed from being discovered and crawled via another seed

Evelyne

January 07, 2026 09:20

In the documentation pages we can read: "Scope specifics - Standard crawls

The Standard crawling technology uses all seed URLs in a given crawl to determine that crawl's scope. This means, if seeds include links to one another, it's possible for content from one seed to be discovered and crawled via another seed."

Is there a way to avoid this behavior other than launching an individual crawl for each seed?

Comments

4 comments

Skip Kendall January 07, 2026 12:18

The only way I can think of that might work would be to create a scoping rule in each seed that excludes the others. That would be a fair amount of work, though.

It may not be a problem, though. De-duplication should prevent the same pages from being captured via different paths, meaning you should only get one copy of each file, even if it can be reached from different places.

Evelyne January 07, 2026 13:56

From what I understand, deduplication only works within the same seed, so it would not help in this case. Also, the scope rules that are applied are those of the original seed (the one through which the pages were discovered), not those of the seed the pages belong to, which is problematic as well.

But indeed, with your solution those problems would be solved. We would then need to create, in each seed, a rule excluding everything except the seed we want to capture, right? It is worth testing if no easier option exists.
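To make the "exclude everything except this seed" idea concrete, here is a minimal Python sketch. It assumes the crawler's scoping rules accept regular expressions with negative lookahead, which is not confirmed anywhere in this thread and would need testing. The function name `block_everything_except` and the `example.org` / `example.net` URLs are hypothetical, and the pattern compares the exact host only, so subdomains such as `www.` would also be blocked unless the pattern is widened.

```python
import re
from urllib.parse import urlsplit


def block_everything_except(seed_url: str) -> str:
    """Return a regex matching any URL that is NOT on the seed's own host."""
    host = urlsplit(seed_url).hostname
    if host is None:
        raise ValueError(f"no host found in seed URL: {seed_url!r}")
    # Negative lookahead: the pattern matches only when the URL does not
    # start with this host followed by a port, path, query, fragment or end.
    return rf"^(?!https?://{re.escape(host)}(?:[:/?#]|$)).*$"


rule = re.compile(block_everything_except("https://example.org/"))

# A URL on the seed's own host is not matched, so it would stay in scope.
print(bool(rule.match("https://example.org/page")))  # False
# A URL belonging to another seed is matched, so it would be blocked.
print(bool(rule.match("https://example.net/page")))  # True
```

With this approach each seed needs a single rule regardless of how many other seeds share the crawl, at the cost of also blocking any off-host content the seed legitimately depends on.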

Thank you

Skip Kendall January 07, 2026 14:08

Ah, yes, I forgot about that.

Yeah, that's what I was thinking about with the rule. If you were crawling 4 seeds together, each would need rules to exclude the other 3.

0

Comment actions Permalink
Evelyne January 07, 2026 14:11

Yes, but we have over 300 seeds to crawl together, so it scales in a more complex way ;)
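The scaling of the pairwise approach discussed above (each seed excluding all the others) can be sketched in Python. This only builds the list of hosts each seed would have to exclude and counts the resulting rules; it does not produce rules in any particular crawler's format. The function names and the `*.example` seed URLs are hypothetical, and 300 is used as a lower bound for the "over 300 seeds" mentioned.

```python
from urllib.parse import urlsplit


def exclusion_hosts(seeds: list[str]) -> dict[str, list[str]]:
    """For each seed, list the hosts of every other seed in the same crawl."""
    hosts = {seed: urlsplit(seed).hostname for seed in seeds}
    return {
        seed: sorted(
            {h for other, h in hosts.items() if other != seed and h != hosts[seed]}
        )
        for seed in seeds
    }


def total_rules(n_seeds: int) -> int:
    """Each of the n seeds needs one exclusion rule per other seed."""
    return n_seeds * (n_seeds - 1)


seeds = [
    "https://a.example/",
    "https://b.example/",
    "https://c.example/",
    "https://d.example/",
]
for seed, blocked in exclusion_hosts(seeds).items():
    print(seed, "excludes", blocked)

print(total_rules(4))    # 12
print(total_rules(300))  # 89700
```

The rule count grows quadratically with the number of seeds, so generating the rules from the seed list rather than entering them by hand is probably the only practical way to test this at that size.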
