Webpages at one URL - PDF linked docs on a different URL: how to capture both?

Comments

2 comments

  • Marissa Tartaglia

    Hi Kari, 

    If your crawl isn't picking up those PDFs and images, what I usually do is click on the seed you've added to your account, go to the "Seed Scope" tab at the seed level, and add the URL that hosts the PDFs (www.mit.edu/groups/hr/) as an "expand scope to include url if" rule. Definitely run a test crawl with that rule to make sure you are picking up what you need and not a ton of extra files or images. I would also click on the PDFs to see what their URLs are (I tried clicking on your links, but it looks like you have to be logged in). Sometimes I also have to add the first part of the URL for the PDFs or images as an additional "expand scope to include url if" rule in order to capture all of them.
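    To make the idea concrete, here is a rough Python sketch of how a seed plus an "expand scope" rule decides what gets captured. It is only an illustration, not how the crawler is actually implemented: the function names, the substring-style matching, the hr.example.org seed, and the file names are all assumptions. Only www.mit.edu/groups/hr/ comes from this thread.

```python
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Drop the scheme and a leading 'www.' so rules compare host + path only."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path


def in_scope(url: str, seed: str, expand_rules: list[str]) -> bool:
    """Return True if the URL falls under the seed or matches an expand rule.

    Assumption: an expand rule matches when its text appears in the URL.
    """
    target = normalize(url)
    if target.startswith(normalize(seed)):
        return True
    return any(normalize(rule) in target for rule in expand_rules)


# Hypothetical seed; the real seed URL is not given in this thread.
seed = "https://hr.example.org/"
# The host for the PDFs, added as an expand-scope rule.
rules = ["www.mit.edu/groups/hr/"]

# Hypothetical URLs a test crawl might encounter.
candidates = [
    "https://hr.example.org/benefits/index.html",
    "http://www.mit.edu/groups/hr/forms/leave.pdf",
    "http://www.mit.edu/athletics/logo.png",
]
for candidate in candidates:
    print(candidate, in_scope(candidate, seed, rules))
```

    A narrow rule (the full /groups/hr/ path) keeps unrelated files on the same host out of the crawl, which is what the test crawl is meant to confirm.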

    Hope that gives you some ideas!

    Marissa 

    Wisconsin Historical Society

  • Kari Smith

    Thank you, Marissa and Karl, for your responses. I was successful in capturing the PDFs that were on a different seed and integrating them for public access. I really appreciate your help!

    Kari

    Digital Archivist at MIT Institute Archives and Special Collections