Crawling eLibrarys

  • Official comment
    Tanya Ulmer

    eLibrarys is a challenging platform for web archiving because of the document-tree structure it employs and its use of POST requests in links.

    Running multiple crawls with some additional seeds can sometimes help collect different parts of the site. The Standard crawler can frequently collect the documents (often PDFs). Brozzler can help capture some POST requests, but these often cannot be replayed in Wayback.

    The HTTP 403 (Forbidden) error comes from the website owners' servers, which are refusing the crawler's requests (see this list of Error Codes).
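
    To see how the site's server answers a single request outside of a crawl, a minimal Python sketch like the one below may help. It is only an illustration: the function names `describe_status` and `check_seed` are hypothetical, and the URL and user-agent string you pass in are your own choices, not values supplied by the crawler or by eLibrarys.

```python
import urllib.error
import urllib.request


def describe_status(status: int) -> str:
    """Return a short note on what an HTTP status means for a crawl."""
    if status == 403:
        return "forbidden: the site's server is refusing this client"
    if 200 <= status < 300:
        return "ok: the server returned the page"
    return f"other: the server answered with HTTP {status}"


def check_seed(url: str, user_agent: str) -> int:
    """Request one URL and return the HTTP status the server sends back.

    Both arguments are supplied by the caller; nothing here is specific
    to any particular crawler.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return error.code
```

    For example, `describe_status(check_seed(seed_url, agent))` with your own seed URL and user-agent string reports whether that one request was refused. A 403 here points to the site's own access rules, which only the website owners can change.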
     
    If you submit a support ticket, we can suggest more specific strategies for collecting this instance of eLibrarys, or provide the information the website owners may need for their Allow Lists.
