John Rees

  • Total activity 5
  • Following 0 users
  • Followed by 0 users
  • Votes 0
  • Subscriptions 2

Activity overview

Latest activity by John Rees
  • John Rees created a post:

    Access to URL via Archive-It redirect

    Archive-It URLs like https://wayback.archive-it.org/org-350/https://www.nlm.nih.gov/exhibition/cesarean/index.html nicely redirect the user to the last crawl for a page, but sometimes this action f...

  • John Rees commented:

    Thanks for the response! Seems I mis-characterized my issue. What I'm after is identifying documents/data to prevent duping. For example: Using the search function, I know https://www.nlm.nih.gov/...

  • John Rees created a post:

    De-duping seeds

    I'm trying to find a method for identifying all the existing URLs across several collections to avoid crawling them multiple times as individual seeds. The 3.6 release notes (I don't see it in any ...