Scoping & Running Crawls

Scoping Your Crawls

  • How Archive-It crawlers determine scope
  • Modify your collection or seed scope
  • How to add seed-level scoping rules to multiple seeds at once
  • Limit your crawl
  • Expand the scope of your crawl
  • Robots.txt exclusions and how they can impact your web archives
See all 9 articles

Running Crawls

  • How to run, monitor, and save a test crawl
  • How to manually start test and one-time crawls
  • How to crawl new seeds immediately with InstaCrawl
  • How to schedule crawls
  • How to add and use the Archive This! bookmarklet
  • Crawling with a Custom User Agent
See all 7 articles

Crawling Technology

  • Archive-It Crawling Technology
  • What is Brozzler?

Managing Crawls

  • How to select a time limit for your crawl
  • How to monitor currently running crawls
  • How to resume a finished crawl
  • How to find your crawl ID number
  • About data de-duplication

Scoping Recommendations for Specific Sites

  • Scoping guidance for specific types of sites
  • Archiving sites protected by Cloudflare
  • Archiving ArcGIS
  • Archiving Blogspot sites
  • Archiving Facebook
  • Archiving Flickr streams
See all 26 articles

FAQ: Crawling

  • What does HTTP error 61 mean and what can I do about it?
  • What are regular expressions and when should I use them?
  • How do I stop a seed from being crawled?
  • What is data deduplication and how does it work?
  • How do I know how big a crawl will be?
Archive-It Help Center