Password-protected WordPress site

Comments


  • Official comment
    Tanya Ulmer

    Collecting and replaying password-protected content can be challenging because different platforms use a wide variety of authentication measures. Each website's case can be different, so I recommend submitting a support ticket with the crawl details.

    It sounds like your crawl may have successfully collected some of the intranet pages, in which case this may be more of a replay issue. If you submit a support ticket, we can verify this and might even be able to help improve replay.

    If the problem is more with collecting the intranet pages, then in addition to the general advice in Archiving password protected sites and Troubleshooting password-protected sites, it can sometimes help to add a separate seed just for the login page, one that points to the protected page where you'd like collection to start.

    The webpage for the additional login seed should:

    • Have both login fields (username and password) on the same page.
    • Not split the login across pages, e.g. a username field on one page and the password field on a second page.
    • Not have any additional fields.
    • Not have any kind of CAPTCHA.
    • Not require two-factor authentication.
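    If you want to sanity-check a candidate login page against the checklist above before adding it as a seed, a rough sketch like the following could help. It is not part of any crawler; it simply parses the page's HTML with Python's standard html.parser and looks for one form with a username and a password field, extra user-facing fields, and common CAPTCHA markers. The CAPTCHA keywords, the choice to ignore hidden fields, buttons, and optional checkboxes (such as a "Remember Me" box), and the sample form are all illustrative assumptions. Two-factor authentication cannot be detected from HTML alone, so that item still has to be checked by hand.

```python
from html.parser import HTMLParser

# Assumed keywords that often indicate a CAPTCHA widget; real pages may differ.
CAPTCHA_HINTS = ("captcha", "recaptcha", "hcaptcha")
# Assumed: input types the user never has to fill in (checkbox = optional "remember me").
IGNORED_TYPES = {"hidden", "submit", "button", "image", "reset", "checkbox"}


class LoginFormInspector(HTMLParser):
    """Collects the user-facing fields of every <form> and flags CAPTCHA hints."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one list of (field_type, name) tuples per <form>
        self._in_form = False
        self.captcha_seen = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any tag whose class, id, or src mentions a CAPTCHA keyword sets the flag.
        blob = " ".join(str(attrs.get(k) or "") for k in ("class", "id", "src")).lower()
        if any(hint in blob for hint in CAPTCHA_HINTS):
            self.captcha_seen = True
        if tag == "form":
            self._in_form = True
            self.forms.append([])
        elif tag in ("input", "select", "textarea") and self._in_form:
            # <input> without a type attribute defaults to a text field.
            field_type = (attrs.get("type") or "text").lower() if tag == "input" else tag
            if field_type not in IGNORED_TYPES:
                self.forms[-1].append((field_type, attrs.get("name") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def check_login_page(html: str) -> list[str]:
    """Return a list of problems; an empty list means the page looks suitable."""
    parser = LoginFormInspector()
    parser.feed(html)
    parser.close()
    problems = []
    if parser.captcha_seen:
        problems.append("page appears to contain a CAPTCHA")
    # Use the first form that contains a password field.
    login = next((f for f in parser.forms if any(t == "password" for t, _ in f)), None)
    if login is None:
        problems.append("no password field on this page (login may span two pages)")
        return problems
    if not [f for f in login if f[0] != "password"]:
        problems.append("no username field next to the password field")
    extra = len(login) - 2  # expected: exactly one username + one password field
    if extra > 0:
        problems.append(f"{extra} additional field(s) in the login form")
    return problems


# A simplified, hypothetical login form used only for illustration.
sample = """
<form id="loginform" method="post">
  <input type="text" name="log">
  <input type="password" name="pwd">
  <input type="checkbox" name="rememberme">
  <input type="hidden" name="redirect_to" value="/intranet/">
  <input type="submit" value="Log In">
</form>
"""
print(check_login_page(sample))  # → []
```

    An empty list only means the page passes these heuristics; it does not guarantee the crawler will log in successfully.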

    Then try crawling the main seed together with the additional login seed to see whether more of the password-protected pages are collected. Hope this information helps!

