Crawling with a Custom User Agent

Overview

This article provides an overview of custom user agents, examples of why you might use one, and information on how to use one.

On this page:

What is a user agent?
Why use a custom user agent?
How to use a custom user agent
Related content

What is a user agent?

A user agent identifies an application, like a crawler, to the server or site it's accessing. The Archive-It crawlers, for example, generally include archive.org_bot or Archive-It in their user agent string. This lets a site know that it's being crawled by an Archive-It partner. Websites also use the user agent to determine what kind of content to serve. For example, if a site detects that it's being accessed from a mobile browser, it may return a mobile-specific version of the page.
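To make the idea concrete, here is a minimal Python sketch of both sides of the exchange: a client attaching a user agent string to a request, and a site deciding which version of a page to serve. The user agent string, the function names, and the matching rules are all assumptions made for illustration; they are not the actual Archive-It string or any real site's logic.

```python
from urllib.request import Request

# Hypothetical user agent string, for illustration only.
# It is NOT the actual string sent by the Archive-It crawlers.
EXAMPLE_CRAWLER_UA = "Mozilla/5.0 (compatible; archive.org_bot; hypothetical-example)"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) an HTTP request carrying a User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def choose_variant(user_agent: str) -> str:
    """Toy server-side rule: pick a page version from the user agent string."""
    ua = user_agent.lower()
    if "archive.org_bot" in ua or "archive-it" in ua:
        return "crawler"
    if "mobile" in ua:
        return "mobile"
    return "desktop"


request = build_request("https://example.org/", EXAMPLE_CRAWLER_UA)
# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
print(choose_variant(EXAMPLE_CRAWLER_UA))
```

Real sites use far more varied rules than this, which is why the same URL can look different to a crawler than it does to a browser.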

Why use a custom user agent?

Personalization: Some organizations prefer a user agent that is specific to them, so that websites know which organization is crawling them.

Better access: Occasionally, websites serve our crawler different content (like an error message) than they would serve a browser or another crawler. In those cases, a custom user agent may allow the crawler to receive the content a browser would.
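One rough way to check whether a site treats user agents differently is to request the same URL with several user agent strings and compare the status codes. The sketch below is an illustration only, not an Archive-It tool: the function names are hypothetical, and the `fetch` parameter is an assumption added so the comparison can be tried without touching the network.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, user_agent: str) -> int:
    """Request url with the given User-Agent and return the HTTP status code."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the code.
        return error.code


def compare_user_agents(
    url: str,
    user_agents: Iterable[str],
    fetch: Callable[[str, str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each user agent string to the status code the site returned for it."""
    return {ua: fetch(url, ua) for ua in user_agents}


# Example call (performs real network requests, so it is left commented out):
# compare_user_agents("https://example.org/", ["example-bot/1.0", "Mozilla/5.0"])
```

Differing status codes are only a hint. A site can also return a 200 response with different page content, so a manual comparison of the pages is still worthwhile.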

How to use a custom user agent

Custom user agents can be added behind the scenes by a Web Archivist. They can be applied to your entire account, so that all crawls use the custom user agent, or to a specific collection, so that only crawls run in that collection use it. If you are interested in using a custom user agent when crawling, please get in touch with the Archive-It team by submitting a support ticket.
