Archiving news clippings from web searches

Overview

Some Archive-It partners crawl the news section results of specific web searches in order to enhance or replace traditional “clippings” collections. These news searches have URLs that may be used like any other web archiving seed. This guide provides an overview of how to properly format, scope, and crawl news clippings from web searches.

Known Issues

There are no known issues with archiving news clippings from web searches.

You can find a full list of known issues for archiving various platforms on our Status of monitored platforms page.

On this page

How to format your seeds
How to scope news clippings from web search seeds

How to format your seeds

Format your seeds to match these examples precisely, replacing only the search terms (Internet+Archive in the examples below):
- Google: https://www.google.com/search?q=Internet+Archive&tbm=nws&num=100
- Bing: https://www.bing.com/news/search?q=Internet+Archive&qs=n
Assign your seeds the One Page Plus seed type so that the crawler also archives the articles linked from the results page. If you prefer to archive only the results page itself, use the One Page seed type instead.
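
If you maintain many search terms, you can generate seeds in the exact formats above instead of editing URLs by hand. The following is a minimal Python sketch. The function names google_news_seed and bing_news_seed are hypothetical helpers, not part of any Archive-It tool, and the sketch assumes that the only part of each seed that changes is the q parameter.

```python
from urllib.parse import urlencode


def google_news_seed(terms: str) -> str:
    """Build a Google news-search seed in the documented format (hypothetical helper)."""
    # urlencode keeps the parameters in insertion order and encodes spaces as '+',
    # so the result matches the example seed: q, then tbm=nws, then num=100.
    query = urlencode({"q": terms, "tbm": "nws", "num": 100})
    return f"https://www.google.com/search?{query}"


def bing_news_seed(terms: str) -> str:
    """Build a Bing news-search seed in the documented format (hypothetical helper)."""
    query = urlencode({"q": terms, "qs": "n"})
    return f"https://www.bing.com/news/search?{query}"


if __name__ == "__main__":
    for terms in ["Internet Archive", '"web archiving"']:
        print(google_news_seed(terms))
        print(bing_news_seed(terms))
```

Calling google_news_seed("Internet Archive") reproduces the Google example exactly. Characters such as quotation marks are percent-encoded (for example, '"web archiving"' becomes q=%22web+archiving%22), which keeps the URL valid as a seed.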

How to scope news clippings from web search seeds

Add a seed-level scoping rule to each seed to ignore robots.txt.
When crawling Google search results, add a second scoping rule to the seeds to include URLs that contain the text: https://www.google.com/url?q=
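
The effect of these two rules can be illustrated with a small Python sketch. It is not Archive-It's scoping logic. It assumes a hypothetical robots.txt that disallows /search (a robots-obeying crawler would then skip the seed itself), treats the "contains the text" rule as a plain substring match, and assumes the linked article URL is carried in the q parameter of the redirect link, as the rule's text suggests. The helper names and the sample robots.txt are illustrative only.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse
from urllib.robotparser import RobotFileParser

# Text used by the seed-level "include URLs that contain" rule above.
GOOGLE_REDIRECT_PREFIX = "https://www.google.com/url?q="

# Hypothetical robots.txt excerpt for illustration; it is not fetched from any site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def blocked_by_robots(url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if a crawler that obeys the given robots.txt would skip this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a load time: can_fetch() refuses every URL until one is set.
    parser.modified()
    return not parser.can_fetch("*", url)


def matches_include_rule(url: str) -> bool:
    """Mimic an 'include URLs that contain the text' rule as a substring test."""
    return GOOGLE_REDIRECT_PREFIX in url


def redirect_target(url: str) -> Optional[str]:
    """Return the URL carried in the q parameter of a Google redirect link, if any."""
    if not matches_include_rule(url):
        return None
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None


if __name__ == "__main__":
    seed = "https://www.google.com/search?q=Internet+Archive&tbm=nws&num=100"
    link = "https://www.google.com/url?q=https://example.org/story&sa=U"
    print(blocked_by_robots(seed))     # the seed path falls under Disallow: /search
    print(matches_include_rule(link))  # the redirect link is brought into scope
    print(redirect_target(link))       # the article the redirect points to
```

Under these assumptions, obeying robots.txt would stop the crawl at the seed, which is why the seed-level rule to ignore robots.txt matters, and the include rule is what brings Google's redirect links (and through them, the articles) into scope.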
