Email reminders are now available to notify partners 24 hours before a crawl is scheduled to begin and again when a crawl has finished. To set up email reminders, go to settings (on the bottom navigation bar) and enter your name, email address, and time zone. You must also check the "Enable Email Notifications" box. If you do not specify a time zone, times in your reminders will be given in GMT. If you prefer not to receive email reminders, do not enter your email address on the settings page.
Figure 1: Settings to update for email reminders
The reminders are:
*A notification email will arrive 24 hours before your scheduled crawl begins (except for crawls you start manually). This email will also tell you whether any of your seeds have become unreachable since the previous crawl.
*A notification email will be sent when your crawl has finished. This email will also tell you whether a robots.txt file has completely blocked any of your seeds from being crawled.
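To make the timing rule concrete, the Python sketch below computes when the pre-crawl reminder would be sent and expresses that time in the partner's time zone, falling back to GMT (UTC) when none is set. It is an illustration only, not Archive-It's code: the reminder_time helper, the REMINDER_LEAD constant, and the assumption that scheduled start times are stored as timezone-aware values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Hypothetical constant: the reminder goes out 24 hours before the crawl.
REMINDER_LEAD = timedelta(hours=24)


def reminder_time(crawl_start: datetime, tz: Optional[tzinfo] = None) -> datetime:
    """Return when the pre-crawl reminder would be sent (hypothetical helper).

    The result is expressed in the partner's time zone, or in GMT (UTC)
    when no time zone is given, mirroring the settings behavior described above.
    """
    if crawl_start.tzinfo is None:
        # A naive datetime does not identify a single instant, so refuse it.
        raise ValueError("crawl_start must be timezone-aware")
    send_at = crawl_start - REMINDER_LEAD
    # Convert to the partner's zone; default to UTC when no zone was set.
    return send_at.astimezone(tz if tz is not None else timezone.utc)


if __name__ == "__main__":
    crawl = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
    print(reminder_time(crawl))  # no time zone set: reported in GMT/UTC
    print(reminder_time(crawl, timezone(timedelta(hours=-5))))  # fixed UTC-5 offset
```

For a named time zone, you could pass a zoneinfo.ZoneInfo object such as ZoneInfo("America/New_York"), which relies on the system's IANA time zone database.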
Reset Archive-This! Link
You will need to reset your Archive-This! link in order for it to work with the updated application. To do this, delete the Archive-This! icon or link currently on your browser toolbar or in your bookmarks. Then from the new application window, re-drag or re-bookmark the Archive-This! link. Once Archive-This! is reset for the updated application, it should work exactly as it did previously.
User Interface Improvements
In response to partner feedback, we have made a number of changes to the Archive-It user interface to make collection and seed management more intuitive. The most significant changes include the following.
Throughout the application, the terms "enabled" and "disabled" (describing the status of a collection or seed) have been replaced with "active" and "inactive." The functionality is unchanged. Accordingly, you now activate or deactivate a collection (see the right-hand links above) rather than "enable" or "disable" it. Likewise, you now activate or deactivate a seed.
Figure 2: Active collections view
Our enhanced Partner Home page now lets you navigate directly to all active collections and the most recent crawl reports, and it includes a direct link for creating a new collection. As on the old account home page, you can track the status and budget of your account here.
Figure 3: Partner Home page
Each collection now has its own Collection Management page that provides a more direct interface to collection controls as well as the ability to manually start your crawls.
*Collection-level controls, such as adding and editing seeds, are available on the Collection Management page.
*Crawl settings are now available under the "modify crawl scope" link.
*You can start a crawl immediately from the Collection Management page by clicking "start crawl now >>" in the lower right-hand corner of the page. All test crawls must be started this way.
*You can access seeds through the seed management options on the right side of the screen; click an option to filter which seeds are shown. The bulk seed edit function is unchanged.
Figure 4: Collection Management page
The updated Seed Management page (formerly the seed list view) has a more streamlined layout. You can view seeds by status or by frequency with a single click. To remove a seed from the crawl schedule, click its deactivate link.
Figure 5: Seed Management page
The Collections menu on the upper navigation bar has changed to give quick access to views of active or inactive collections.
Figure 6: New items under Collections menu
Two new crawl frequency options are now available:
*One-time - This frequency allows you to crawl a seed (or group of seeds) only once without scheduling any future crawl. The crawl lasts for 72 hours.
*Semiannual - This frequency allows you to crawl a seed (or group of seeds) at six-month intervals. The crawl lasts for 72 hours.
Robots.txt and queue information are now available in the hosts report for all crawl frequencies. The robots.txt information makes it easy to discover whether parts of a seed site (but not the entire site) are blocked from crawling by robots.txt exclusions. The queue information shows how many documents, and which ones, were discovered but not crawled because of time limits.
You can use this information to plan your next crawl more effectively.
Figure 7: Robots exclusion in Hosts report
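To illustrate what a partial robots.txt exclusion looks like, the Python sketch below uses the standard library's urllib.robotparser to check which URLs on a seed site a given robots.txt would block. It is a generic illustration of robots.txt exclusion, not how Archive-It builds the hosts report; the sample robots.txt rules, the example.org URLs, and the blocked_urls helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks part of a site, but not the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt forbids `agent` to fetch."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    seed_urls = [
        "https://example.org/",
        "https://example.org/about.html",
        "https://example.org/private/report.pdf",
        "https://example.org/search",
    ]
    for url in blocked_urls(SAMPLE_ROBOTS_TXT, seed_urls):
        print("blocked by robots.txt:", url)
```

Here the home page and about.html remain crawlable while /private/ and /search are excluded, which is the kind of partial blocking the hosts report surfaces. To check a live site instead, you could call set_url() with the site's robots.txt address and then read() before calling can_fetch().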
*Keyword search through NutchWAX and URL search through the Wayback Machine are now available under the Access menu on the top navigation bar. You can browse your collection by URL, or click the "Search" tab to search it by keyword.
Figure 8: Wayback Machine now under "Access"
*Advanced search within the application now offers more search options, including search by keyword, keyword phrase, excluded keyword, specific host, number of documents per host, file format, and date.
Figure 9: Advanced Search under "Access"