Scoping and crawling a newsletter produced in Constant Contact

I'm trying to collect newsletters that are produced in Constant Contact but shared through a campus webpage (https://evc.ucsd.edu/). The crawl captures the home page where the newsletter is hosted and the first level of each individual newsletter. From there, the newsletter links out to some resources that have been captured during crawls of other campus sites.

I would like to set up a separate crawl for just the newsletter and its content so it can be cataloged as a publication. I would also like to be able to crawl the additional resources that the newsletter links to.

Does anyone have experience with scoping and crawling Constant Contact effectively?

Thanks in advance for any insights, tips, tricks, etc. you can share.

Karl Blumenthal July 09, 2020 17:13 (Edited July 09, 2020 17:13)

Hi Marlayna,

This sounds like a good opportunity to use the "One Page+" seed type. In fact, we used to call this the "News/RSS Feed" seed type because it was made for precisely these situations: to archive a page and the links one "hop" away from it, even when they lead to numerous different news sites, so that you don't have to pursue all those different possible links by way of manual scoping.
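To make the one "hop" idea concrete, here is a rough Python sketch of which URLs would fall in scope for a single seed page. This is only an illustration and not how the crawler itself is implemented; the names `LinkCollector` and `one_hop_scope` and the example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def one_hop_scope(seed_url, seed_html):
    """Return the seed URL plus every web link found on the seed page.

    Hypothetical helper: links found on the linked pages themselves
    (a second hop) are deliberately not followed.
    """
    collector = LinkCollector()
    collector.feed(seed_html)
    scope = [seed_url]
    for href in collector.hrefs:
        # Resolve relative links against the seed and drop #fragments.
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        # Keep only http(s) links (skips mailto: and similar) and avoid duplicates.
        if urlparse(absolute).scheme in ("http", "https") and absolute not in scope:
            scope.append(absolute)
    return scope


if __name__ == "__main__":
    sample_html = (
        '<a href="/news/july.html">July</a>'
        '<a href="https://library.example.org/guide">Guide</a>'
        '<a href="mailto:someone@example.org">Contact</a>'
    )
    for url in one_hop_scope("https://newsletter.example.org/issue1", sample_html):
        print(url)
```

The point of the sketch is that the scope is defined by what the newsletter page links to, not by which host those links live on, which is why no per-site scoping rules are needed.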

That approach should archive a typical newsletter like this one and follow all the links I see within, but let us know how it goes for you!
