Archiving Internet Archive (archive.org) audio and video

Overview

Partners can archive streaming audio and video files from the Internet Archive (archive.org) that are embedded in the pages they collect. This guide explains how to scope a crawl so that those embedded files are captured and play back.

Known issues

Currently, there are no known issues with archiving audio and video files from the Internet Archive (archive.org) that are embedded in the pages that partners collect.

You can find a full list of known issues for archiving various platforms on our Status of monitored platforms page.

How to scope your Internet Archive (archive.org) audio and video seeds

To capture these archive.org embeds, apply the following crawl scope modifications:

Ignore robots.txt (either entirely at the seed level or on the host archive.org at the collection level)
Expand scope to include URL if it matches the regular expression:
^(https?:)?\/\/[a-z0-9.-]*archive.org\/(.*\/|)(items|download|includes)\/.*$

With these rules in place, audio and video on captured pages should play normally.
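
Before launching a crawl, it can help to check which URLs the scope-expansion pattern would accept. The following is a minimal sketch that uses Python's `re` module, not the crawler's own regex engine, so edge cases may behave differently. The function name `in_expanded_scope` and all sample URLs (including the item name `some-item`) are hypothetical and serve only as illustration.

```python
import re

# Scope-expansion pattern copied verbatim from the rule above.
ARCHIVE_ORG_MEDIA = re.compile(
    r"^(https?:)?\/\/[a-z0-9.-]*archive.org\/(.*\/|)(items|download|includes)\/.*$"
)


def in_expanded_scope(url: str) -> bool:
    """Return True if the URL matches the scope-expansion pattern."""
    return ARCHIVE_ORG_MEDIA.match(url) is not None


# Hypothetical URLs, for illustration only.
samples = [
    "https://archive.org/download/some-item/file.mp4",
    "https://ia800100.us.archive.org/12/items/some-item/file.mp3",
    "//archive.org/includes/player.js",
    "https://archive.org/details/some-item",
    "https://example.com/download/file.mp4",
]

for url in samples:
    status = "in scope" if in_expanded_scope(url) else "not matched"
    print(f"{status:12} {url}")
```

In this sketch the first three URLs match, because their paths contain an `items/`, `download/`, or `includes/` segment on an archive.org host. The `details/` page and the non-archive.org URL do not match, so this rule alone would not bring them into scope.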
