Omeka is a popular platform for hosting collection-based websites, especially among partners in libraries and archives. This guide provides an overview of how to properly format, scope, and crawl Omeka sites.
There are no known issues with archiving Omeka sites.
You can find a full list of known issues for archiving various platforms on our Status of monitored platforms page.
How to scope Omeka seeds
In general, our crawling technology can reliably archive these sites without special scoping modifications. However, each Omeka site can be unique, so we recommend running a test crawl and reviewing the results before archiving the site permanently.
As with other seeds in Archive-It collections, an Omeka site might block crawling technology from accessing part or all of its contents. For instance, Omeka site templates sometimes block crawlers from the /files/ directory that contains downloadable items and thumbnail images. We therefore recommend adding an Ignore Robots.txt rule to affected seeds; for instructions, see Archive-It’s guide to avoiding robots exclusions.
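If you want to check ahead of time whether a site’s robots.txt would exclude its /files/ directory, you can test it with Python’s standard-library robots.txt parser. The robots.txt content and URLs below are hypothetical examples for illustration, not taken from any particular Omeka site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind an Omeka template might serve;
# the Disallow line hides downloadable items and thumbnails from crawlers.
robots_txt = """\
User-agent: *
Disallow: /files/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A thumbnail under /files/ is excluded for all crawlers...
print(parser.can_fetch("*", "https://example.org/files/thumbnails/item1.jpg"))  # → False

# ...while an ordinary item page remains crawlable.
print(parser.can_fetch("*", "https://example.org/items/show/42"))  # → True
```

A result of False for URLs you need captured is a sign the seed would benefit from an Ignore Robots.txt rule.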
Like sites made with Drupal or other content management systems (CMS), Omeka sites can create “crawler traps,” which may endlessly generate new documents for the crawlers to capture. See Archive-It’s directions for identifying and avoiding crawler traps.
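As an illustration only (Archive-It’s crawler applies its own scoping rules), one simple heuristic for spotting trap-like URLs in a crawl report is to look for the same path segment repeated many times, which often happens when relative links compound endlessly. The function name and threshold below are hypothetical choices for this sketch:

```python
from urllib.parse import urlparse

def looks_like_trap(url, max_repeats=3):
    """Flag URLs whose path repeats one segment max_repeats or more times,
    a common symptom of calendar or faceted-browse crawler traps."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(segments.count(seg) >= max_repeats for seg in set(segments))

# A compounding relative link produces a repeated segment...
print(looks_like_trap("https://example.org/items/browse/browse/browse/browse"))  # → True

# ...while a normal item URL does not.
print(looks_like_trap("https://example.org/items/show/42"))  # → False
```

URLs flagged this way are candidates for block rules in your collection’s scoping settings, after you confirm in the crawl report that they are genuinely redundant.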