Administration
-From the admin page, account administrators who are Archive-It Pro subscribers can turn the Archive-It Pro advanced crawl controls on and off. All other admin functions are the same as in the 1.5 feature release.
Documentation
-We added multiple links to the completed crawls pages in the application UI, so reports are now easier to find.
-The following documents have been added to the Archive-It help wiki:
-How To FAQ: miscellaneous questions and information about using the application
-Test Crawls: more complete information about how to use the test crawl features
-Scope Expansion: a "how to" on expanding crawl scope
-Archive-It Pro: a "how to" on using the new Archive-It Pro crawl and host constraint controls
Collections
-Using the bulk seed editor, partners can now apply the same metadata and crawl frequency edits to a selection of seeds or to all seeds. From the seed list view, click 'bulk edit seeds'. A wizard will walk you through selecting the seeds you want to edit and making your changes. When making metadata changes, you will need to check the box next to the value you want to edit as well as enter the text you want to add.
When you make changes to your seeds using the bulk seed editor, these changes override any values already in the system for the field you are editing.
-The subject metadata field will now remember subjects that you have previously added and try to automatically complete what you type. When you are adding subjects, be sure to click 'add' after you enter each one. To remove a subject, click 'remove' next to it. When you have finished your subject changes, be sure to click 'save' on the edit page so your changes are kept (this applies only when you are using the 'edit' link, not the bulk seed editor).
The bulk edit feature allows you to add subject metadata to several seeds or to all your seeds at once; however, you cannot remove subjects in bulk.
Crawls
-In addition to the daily, weekly, monthly and quarterly crawl frequencies, there is now a test crawl frequency. Use a test crawl to see how many documents and which hosts your seeds will collect. Test crawls generate all the normal reports but do not actually archive any documents, so nothing crawled counts against your total crawl budget. You can test all or some of your seeds, but only one test will run at a time (that is, all seeds assigned to test mode are crawled together when you manually start the test crawl).
Test crawls can run for up to 3 days and explore up to 1,000,000 documents. Assign seeds to test mode as you would any other crawl frequency; the test crawl itself must be started manually (go to crawls > manually start crawlers). There is more information about test crawls on the Archive-It help wiki.
-There is a new scope expansion tool in Archive-It that lets you automatically crawl all sub-domains of your seeds. To use this tool, you will need to be familiar with scope expansion rules, or SURTs. Apply scope expansion rules on the general seed edit page (the edit link next to each seed on the seed list view page). Instructions are available on the help wiki, and the topic will be covered in the 2.0 training.
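For readers new to SURTs (Sort-friendly URI Reordering Transform), the sketch below illustrates the general idea: the host name is written with its labels reversed, so that a single prefix can cover a domain and all of its sub-domains. This is only an illustration. The function names and the example.org addresses are hypothetical, and the exact rule syntax that Archive-It accepts is described on the help wiki, not here.

```python
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Write a URL in SURT-style form, with the host labels reversed."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return f"{parts.scheme}://(" + ",".join(reversed(labels)) + ",)" + (parts.path or "/")


def to_surt_prefix(seed_url: str) -> str:
    """Build a prefix that covers the seed's domain and all of its sub-domains.

    Assumption for this sketch: a leading "www." is dropped so the prefix
    is anchored on the bare domain.
    """
    parts = urlsplit(seed_url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    return f"{parts.scheme}://(" + ",".join(reversed(labels)) + ","


def in_scope(url: str, prefix: str) -> bool:
    """A URL is in scope when its SURT form starts with the prefix."""
    return to_surt(url).startswith(prefix)


prefix = to_surt_prefix("http://www.example.org/news/")
print(prefix)
print(in_scope("http://blog.example.org/post", prefix))
print(in_scope("http://example.com/", prefix))
```

Because the labels are reversed, blog.example.org and www.example.org both begin with the same prefix as example.org, while an unrelated host such as example.com does not.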
Access
-Search results inside the application now return hits from both the archived pages and the metadata you have assigned to your seeds. Enter a search term as you normally would, select a collection to search and click search. Your results will be displayed in two tabs, one for hits from the archived pages and one for hits from your seed metadata. The metadata hits have links to the archived page as well as to the seed metadata editing page.
Archive-It Pro
The Archive-It team has added some new features you can use to fine-tune your crawls; they are collectively called Archive-It Pro. Archive-It Pro is an optional addition to the general Archive-It subscription. All Archive-It Pro features can be accessed from the crawl settings link on the seed list view.
The new features include:
-Crawl constraints: limit the number of documents archived in an entire crawl instance.
-Host constraints: block hosts from being crawled, by name or by regular expression. You can also limit the number of documents that come from a specific host.
More detailed information about Archive-It Pro is available on the help wiki.
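To make the host constraint ideas more concrete, here is a small sketch of how blocking by name, blocking by regular expression, and a per-host document limit can work in principle. It does not show how Archive-It implements these controls. All host names, patterns, limits and function names below are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical settings, for illustration only.
BLOCKED_NAMES = {"ads.example.com"}
BLOCKED_PATTERNS = [re.compile(r"calendar\d*\.example\.org")]
HOST_LIMITS = {"media.example.org": 2}  # maximum documents per host


def is_blocked(host: str) -> bool:
    """A host is blocked if it matches a name exactly or a pattern in full."""
    if host in BLOCKED_NAMES:
        return True
    return any(pattern.fullmatch(host) for pattern in BLOCKED_PATTERNS)


def accept(host: str, counts: Counter) -> bool:
    """Decide whether one more document from this host may be collected."""
    if is_blocked(host):
        return False
    limit = HOST_LIMITS.get(host)
    if limit is not None and counts[host] >= limit:
        return False  # this host has reached its document limit
    counts[host] += 1
    return True


counts = Counter()
for host in ["media.example.org"] * 3 + ["calendar2.example.org", "www.example.org"]:
    print(host, accept(host, counts))
```

In this sketch the third document from media.example.org is refused because the host has reached its limit of 2, calendar2.example.org is refused by the regular expression, and www.example.org is accepted because no constraint applies to it.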