Public access to PDF-only crawls
How can people searching the public side of Archive-It view the PDFs collected in a PDF-only crawl?
Official comment
Hi, Abigail. While PDF-only crawls do not collect the HTML of the seed page, the PDFs collected in saved test or production crawls are indexed and returned in search results. This means that if the seed from your PDF-only crawl is set to 'visible to public', the seed itself won't have any Wayback captures, but the PDFs are still findable through search. For instance, if you search the page text for a phrase that appears in a PDF, that PDF will be returned in the search results, and you can click the result to view it.
If you want to limit your search results to PDFs only, go to Advanced Search > File format > PDF. In the sample search below, the PDF results were all collected from a one-time PDF-only crawl on July 30, 2024.