Institutions often want to archive only the PDF files that are part of a site. With our "PDF only crawl" feature, you can crawl entire sites, but ONLY the PDF files will be archived and counted toward your account's data budget.
To use this feature, begin by navigating to the list in the "Crawls" tab for your chosen collection. For any crawl that you wish to limit to PDFs only, click on its corresponding "Edit Limits" button:
Then, in the dialog box that this button opens, click the check-box next to the "PDF only crawl" option:
End by clicking the "Modify Limits" button. The next time this crawl runs, it will archive only PDFs.
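
To make the effect of this setting concrete, here is a minimal conceptual sketch in Python. It is not the service's actual implementation: the `FetchedFile` type, the function names, and the sample data are all hypothetical, and it assumes PDFs are recognized by an `application/pdf` content type, which this article does not state. It only illustrates the idea that the whole site is crawled while just the PDFs are kept and counted toward the data budget.

```python
from dataclasses import dataclass


@dataclass
class FetchedFile:
    """Hypothetical record for one file encountered during a crawl."""
    url: str
    content_type: str
    size_bytes: int


def is_pdf(f: FetchedFile) -> bool:
    # Assumption: a PDF is identified by its content type.
    # Any parameters after ';' (e.g. a charset) are ignored.
    return f.content_type.split(";")[0].strip().lower() == "application/pdf"


def summarize_pdf_only_crawl(fetched: list[FetchedFile]) -> tuple[list[FetchedFile], int]:
    """Return the files that would be kept and the bytes counted toward the budget."""
    archived = [f for f in fetched if is_pdf(f)]
    counted_bytes = sum(f.size_bytes for f in archived)
    return archived, counted_bytes


# Made-up example data: three files found while crawling a whole site.
fetched = [
    FetchedFile("https://example.org/index.html", "text/html; charset=utf-8", 12_000),
    FetchedFile("https://example.org/report.pdf", "application/pdf", 480_000),
    FetchedFile("https://example.org/logo.png", "image/png", 35_000),
]

archived, counted_bytes = summarize_pdf_only_crawl(fetched)
for f in archived:
    print(f.url)
print(counted_bytes)
```

In this sketch all three files are visited, but only `report.pdf` is kept, so only its size is counted.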