We would like to collect all of the PDF files located here: https://www.gov.nl.ca/ecc/publications/annual-reports/
However, when we run the seed, the result is a "This page has not been archived here" page. It has been more than 24 hours since we ran it.
It turns out that the PDFs linked from this page have URLs under a different directory path than the seed. For example: https://www.gov.nl.ca/ecc/files/ECCMAnnualReport2020-21.pdf
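To make the mismatch concrete, here is a small Python sketch comparing the two URLs above. The helper name `under_seed_path` is our own, and the idea that the crawl is scoped by the seed's path prefix is only an assumption on our part; we have not confirmed how the crawler actually decides what is in scope.

```python
from urllib.parse import urlparse

SEED = "https://www.gov.nl.ca/ecc/publications/annual-reports/"
PDF = "https://www.gov.nl.ca/ecc/files/ECCMAnnualReport2020-21.pdf"


def under_seed_path(seed: str, url: str) -> bool:
    """Return True if url is on the seed's host and under the seed's path.

    Hypothetical check: this mirrors a simple path-prefix scope rule,
    which may or may not match what the crawler really does.
    """
    s, u = urlparse(seed), urlparse(url)
    return s.netloc == u.netloc and u.path.startswith(s.path)


# Same host, but /ecc/files/... is not under /ecc/publications/annual-reports/
print(under_seed_path(SEED, PDF))  # prints False
```

If the scope does work this way, the PDFs would sit outside the seed even though they share its host.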
Can anyone explain what this problem is and how we might successfully harvest the PDFs?