Overview
You can learn a lot about your collections' archived content from the URLs and messages they generate in our Wayback browsing tool. This article covers how to interpret and query Wayback URLs, and how to interpret Wayback error messages.
On this page:
- What is a Wayback URL?
- Wayback URLs' interpretation
- Wayback URL queries
- Wayback error message interpretation
What is a Wayback URL?
Wayback URLs are permanent links to your archived web pages (unless they're from unsaved test crawls). Each document has a unique Wayback URL for each time it was collected. Every Wayback URL follows a specific format (see below): it starts with a prefix from the host wayback.archive-it.org and ends with the URL of the collected document. For example:
http://wayback.archive-it.org/194/20080414172354/http://www.governor.state.nc.us/
Wayback URLs' interpretation
Wayback URL format
Archived URLs from your collections are always formatted in a specific way. From left to right, they display:
- Archive-It's Wayback host information.
- Collection ID number (or an organization number, written as /org-###/).
- Date of collection (a timestamp formatted as yyyymmddhhmmss and recorded in GMT).
- Address for the archived document URL itself.
Wayback URL format interpretation
Let's examine this Wayback URL with the information above:
http://wayback.archive-it.org/194/20080414172354/http://www.governor.state.nc.us/
This means that governor.state.nc.us was collected in collection 194 as it appeared on April 14, 2008, at 5:23 PM GMT (17:23:54).
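To check a timestamp by hand, it can help to decode it programmatically. The following is a minimal Python sketch, not an Archive-It tool; the function name parse_wayback_timestamp is hypothetical, and it assumes a full 14-digit yyyymmddhhmmss value recorded in GMT, as described above.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(stamp: str) -> datetime:
    """Turn a 14-digit yyyymmddhhmmss string into a GMT-aware datetime."""
    # strptime raises ValueError if the string does not match the format.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


captured = parse_wayback_timestamp("20080414172354")
print(captured.isoformat())  # 2008-04-14T17:23:54+00:00
```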
Sometimes, you will see a Wayback URL with an organization's number in place of the collection ID number. For example:
http://wayback.archive-it.org/org-67/20080414172354/http://www.governor.state.nc.us/
This means that governor.state.nc.us was collected by organization 67 as it appeared on April 14, 2008, at 5:23 PM GMT (17:23:54).
When the date is replaced with an asterisk, the URL points to a Wayback calendar page, which shows in calendar format every date on which the URL was permanently saved in that collection. For example:
http://wayback.archive-it.org/194/*/http://www.governor.state.nc.us/
And when the collection ID has -test after it, that Wayback URL will return all the unsaved test crawls for that URL that haven't expired yet in that collection (if any) in a calendar format. For example:
http://wayback.archive-it.org/194-test/*/http://www.governor.state.nc.us/
On occasion, you may see the directory /all/ in place of the collection ID number. This URL returns, in calendar format, all dates on which the URL was collected across all public Archive-It collections. For example:
http://wayback.archive-it.org/all/*/http://www.governor.state.nc.us/
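The variations above (collection ID, org-### number, -test suffix, /all/, and an asterisk in place of the date) can be told apart mechanically. Here is a hedged Python sketch that splits a Wayback URL into its parts; the names WaybackURL, split_wayback_url, and describe_scope are hypothetical, and the pattern only reflects the URL shapes shown in this article.

```python
import re
from typing import NamedTuple


class WaybackURL(NamedTuple):
    scope: str     # e.g. "194", "org-67", "194-test", or "all"
    date: str      # 14-digit timestamp, or "*" for a calendar page
    original: str  # URL of the collected document


# Assumed layout: host / scope / date / original URL, as described above.
PATTERN = re.compile(
    r"^https?://wayback\.archive-it\.org/"
    r"(?P<scope>[^/]+)/(?P<date>[^/]+)/(?P<original>.+)$"
)


def split_wayback_url(url: str) -> WaybackURL:
    match = PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a Wayback URL: {url}")
    return WaybackURL(match["scope"], match["date"], match["original"])


def describe_scope(scope: str) -> str:
    if scope == "all":
        return "all public collections"
    if scope.startswith("org-"):
        return f"organization {scope[len('org-'):]}"
    if scope.endswith("-test"):
        return f"unsaved test crawls in collection {scope[:-len('-test')]}"
    return f"collection {scope}"


parts = split_wayback_url(
    "http://wayback.archive-it.org/194-test/*/http://www.governor.state.nc.us/"
)
print(describe_scope(parts.scope))  # unsaved test crawls in collection 194
print(parts.date == "*")            # True: this is a calendar page
```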
Wayback URL Queries
You can search for archived URLs in your collections from inside your account. You can also do so from our public website (for example: http://www.archive-it.org/collections/194).
How to query
When you enter a specific URL into the Wayback search bar, your results will display a Wayback calendar page. For example:
- Enter: http://www.governor.state.nc.us/news/pressreleases/
- This Wayback page will display: http://wayback.archive-it.org/194/*/http://www.governor.state.nc.us/news/pressreleases/
URL Prefix Queries
This type of query displays all archived links for a given host in a collection.
How it works
To see all archived links from a given host add an asterisk * (wildcard) to the end of the URL query. For example:
http://wayback.archive-it.org/194/*/http://www.governor.state.nc.us/*
This displays, at the top of the screen, the total number of documents collected from the www.governor.state.nc.us host.
Although this number is the total, the list shows only unique URLs. For example, you could have 1,000 links collected but see only 800 links listed, because some of those links have been collected more than once.
You will see a number next to each link. This tells you the number of versions, or the number of times that specific link was collected.
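The difference between the total and the number of listed links can be illustrated with a short Python sketch. The capture list below is made-up data for illustration only, and prefix_query is a hypothetical helper that simply appends the wildcard as described above; neither comes from Archive-It itself.

```python
from collections import Counter

WAYBACK_HOST = "http://wayback.archive-it.org"


def prefix_query(collection: str, url_prefix: str) -> str:
    """Build a prefix-query URL by adding a trailing * to the URL."""
    if not url_prefix.endswith("*"):
        url_prefix += "*"
    return f"{WAYBACK_HOST}/{collection}/*/{url_prefix}"


print(prefix_query("194", "http://www.governor.state.nc.us/"))
# http://wayback.archive-it.org/194/*/http://www.governor.state.nc.us/*

# Hypothetical captures: one entry per time a link was collected.
captures = [
    "http://www.governor.state.nc.us/",
    "http://www.governor.state.nc.us/",
    "http://www.governor.state.nc.us/news/",
    "http://www.governor.state.nc.us/",
    "http://www.governor.state.nc.us/news/",
]

versions = Counter(captures)              # versions per unique link
total_documents = sum(versions.values())  # total shown at the top: 5
unique_urls = len(versions)               # links actually listed: 2

for url, count in versions.items():
    print(count, url)
```

Here the total is 5 but only 2 links are listed, with version counts of 3 and 2 next to them.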
URL Date Queries
This is a search by specific date or date range. This query relies on the 14-digit timestamp in the middle of each archived URL (yyyymmddhhmmss) to broaden or narrow the results.
How it works
You can use a combination of dates and asterisks * to manipulate which saved dates you see in your results.
We'll start with the example Wayback URL http://wayback.archive-it.org/194/20070913204539/http://www.governor.state.nc.us. It shows us www.governor.state.nc.us as it looked on September 13, 2007 at 20:45:39 GMT.
To see only the saved pages of www.governor.state.nc.us from 2007, replace the 14-digit timestamp with 2007 and an asterisk *:
http://wayback.archive-it.org/194/2007*/http://www.governor.state.nc.us/
You can change that year to any year you want.
To see only the saved pages from December 2007, replace the 14-digit timestamp with 200712 and an asterisk *:
http://wayback.archive-it.org/194/200712*/http://www.governor.state.nc.us/
You can change 200712 to any combination of year and month (yyyymm*) you'd like to see.
You can continue narrowing the date further:
- Add a day (yyyymmdd*).
- Add an hour (yyyymmddhh*).
- Add a minute (yyyymmddhhmm*).
You can switch these queries at any time by changing the web address at the top of your browser window.
URL date queries only show results for the exact URL you are looking up. When you look up www.governor.state.nc.us, you are only seeing captures for that precise page.
If you are viewing a page deeper on the site, you can see the other dates on which that precise page was collected by changing the date code in that page's URL to *, or to a partial date followed by *. For example:
http://wayback.archive-it.org/194/2008*/http://www.governor.state.nc.us/news/pressreleases/
This will show only the captures of http://www.governor.state.nc.us/news/pressreleases/ collected in 2008.
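The date queries in this section all follow the same pattern, so they can be generated rather than typed. Below is a minimal Python sketch; date_query is a hypothetical helper, and the accepted prefix lengths (year, month, day, hour, minute, or a full 14-digit timestamp) are an assumption based on the narrowing steps listed above.

```python
WAYBACK_HOST = "http://wayback.archive-it.org"

# Assumed valid lengths: "" (all dates), yyyy, yyyymm, yyyymmdd,
# yyyymmddhh, yyyymmddhhmm, and the full yyyymmddhhmmss.
VALID_LENGTHS = {0, 4, 6, 8, 10, 12, 14}


def date_query(collection: str, date_prefix: str, url: str) -> str:
    """Build a Wayback URL for a full timestamp or a partial date plus *."""
    if any(ch not in "0123456789" for ch in date_prefix):
        raise ValueError("date prefix must contain digits only")
    if len(date_prefix) not in VALID_LENGTHS:
        raise ValueError("date prefix must be yyyy, yyyymm, yyyymmdd, ...")
    # A full 14-digit timestamp needs no wildcard; anything shorter does.
    stamp = date_prefix if len(date_prefix) == 14 else date_prefix + "*"
    return f"{WAYBACK_HOST}/{collection}/{stamp}/{url}"


site = "http://www.governor.state.nc.us/"
print(date_query("194", "2007", site))
# http://wayback.archive-it.org/194/2007*/http://www.governor.state.nc.us/
print(date_query("194", "200712", site))
# http://wayback.archive-it.org/194/200712*/http://www.governor.state.nc.us/
print(date_query("194", "", site))
# http://wayback.archive-it.org/194/*/http://www.governor.state.nc.us/
```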
Wayback error message interpretation
When browsing or querying your archives in the Wayback, you may see specific kinds of capture and/or replay problems. They may include:
This page has not been archived here: The requested page is not in the archive, possibly because:
- This page might not have been included in this organization’s collecting plan.
- This page might have prevented Archive-It from collecting it.
- This page may have been collected but needs more time to appear in Wayback. If you just collected this page, please wait 24 hours for storage and indexing.
Blocked Site Error: Site owners or copyright holders have requested take down of that site from Archive-It's Wayback.
Robots.txt: A robots.txt file is something that a site owner puts on their site to keep crawlers from accessing it or parts of it. Archive-It's crawlers are polite and obey these files by default, but partners can choose to ignore them if needed so that archived pages display properly.
Redirect Error: If the page redirects more than five times, Wayback will stop following and show an error message. This can happen on sites that have lots of responsive scripts.
Failed Connection: If you see this message please contact us for help.