Umbra
Archive-It 4.9 introduces a new crawler architecture called “Umbra”. This is an under-the-hood improvement to Archive-It that is used to crawl select seeds and does not change the user experience.
Umbra works in conjunction with the Heritrix crawler to improve the capture of dynamic web content. This content is most commonly found on social networking sites, such as Facebook, that use “client-side scripting,” which can be difficult to archive. Further functionality and improvements built on this architecture will follow as we continue development toward our 5.0 release.
As of today, the following sites, when crawled as seed URLs, will be archived using the new Umbra architecture:
- facebook.com
- flickr.com
- vimeo.com
The first improvements Archive-It partners may notice are the capture of videos on vimeo.com and the capture and playback of scrolling content on Facebook.
To learn more about the social media sites that use the Umbra architecture, including vimeo.com, please see Archiving Social Media Sites.
To learn more about how Umbra works, please see Introduction to Umbra.