What is a user agent?
A User Agent identifies software to the server or site it's accessing. The Archive-It crawlers, for example, generally include archive.org_bot and/or Archive-It in their User Agent String. This lets a site know that it's being crawled by an Archive-It partner.
Websites also use the User Agent to determine what kind of content to serve. For example, a website might send different content to a mobile browser than it would to a desktop browser.
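To make this concrete, here is a minimal sketch of how a site might decide what to serve based on the User Agent string. The function name, the category labels, and the sample strings are hypothetical and chosen only for illustration; real sites use their own, usually more elaborate, rules. The only detail taken from this article is that Archive-It crawlers generally include archive.org_bot and/or Archive-It in their User Agent string.

```python
def classify_user_agent(user_agent: str) -> str:
    """Return a rough category for a User Agent string (illustrative only)."""
    ua = user_agent.lower()
    # Archive-It crawlers generally include one of these tokens.
    if "archive.org_bot" in ua or "archive-it" in ua:
        return "archive-crawler"
    # Assumption: many mobile browsers include the word "Mobile".
    if "mobile" in ua:
        return "mobile"
    # Everything else is treated as a desktop browser in this sketch.
    return "desktop"


# Made-up sample strings, not real browser or crawler User Agents.
samples = [
    "ExampleCrawler/1.0 (compatible; archive.org_bot)",
    "ExampleBrowser/1.0 (Phone; Mobile)",
    "ExampleBrowser/1.0 (Desktop)",
]
for sample in samples:
    print(sample, "->", classify_user_agent(sample))
```

A site that applies rules like these could send a crawler an error page or a stripped-down page while a desktop browser receives the full page, which is why the User Agent can affect what gets captured.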
Why use a Custom User Agent?
Personalization: Some organizations prefer to have a User Agent that is specific to them, so that websites know what organization they are being crawled by.
Better Capture: Occasionally, websites will serve the Archive-It crawlers different content (like an error message) than they would serve a browser or another crawler. Cases like this will usually need to be identified by an engineer, but can sometimes be improved by changing the user agent.
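If you suspect that a site responds differently depending on the User Agent, one rough way to check is to request the same page with several User Agent strings and compare the HTTP status codes. The sketch below uses only Python's standard library. The function names are hypothetical, the URL and User Agent strings you pass in are up to you, and this is not how the Archive-It crawlers themselves work.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Create a request that sends the given User Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code the site sends for this User Agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 404 are raised as HTTPError.
        return err.code


def compare_user_agents(url, user_agents, fetch=fetch_status):
    """Map each User Agent string to the status code it receives."""
    return {ua: fetch(url, ua) for ua in user_agents}
```

For example, calling compare_user_agents with a page URL and two User Agent strings returns a dictionary of status codes. If one string receives a 200 and another receives an error code, the site is probably treating them differently. Matching status codes do not guarantee identical content, so this is only a first check.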
How can you use a Custom User Agent?
A member of the Archive-It team can add a Custom User Agent at your request. It can be applied to an entire account, so that all crawls use it, or to a specific collection, so that only crawls run in that collection use it. If you are interested in using a Custom User Agent when crawling, please get in touch with the Archive-It team by submitting a support ticket.