Impact of bots on Wayback?

July 08, 2025 21:41

Hello all,

I've been experiencing weird Wayback stuff, and wonder if others are having similar experiences. Here's an example. When I click on the URL on the frontend, I'm supposed to be taken to a calendar page. Instead, I am taken to a page like this:

https://wayback.archive-it.org/4726/20250513152316id_/https://library.washu.edu/

This is a particular capture, not a calendar page. I can replace "20250513152316id_" with an asterisk and get to the calendar page. At that point, though, it sometimes says 0 captures. The captures might then show up if I wait and reload the page.
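
For anyone who wants to script that asterisk swap, here is a minimal Python sketch. It assumes the URL layout seen in the example above (collection number, 14-digit timestamp, optional modifier such as "id_", then the original URL). The helper name `to_calendar_url` is hypothetical and not part of any Archive-It tooling.

```python
import re

# Hypothetical helper, not an Archive-It API.
# Assumed layout, based only on the example URL in this post:
#   https://wayback.archive-it.org/<collection>/<14-digit timestamp><optional modifier>/<original URL>
CAPTURE_RE = re.compile(
    r"^(https://wayback\.archive-it\.org/\d+/)\d{14}(?:[a-z]{2}_)?/(.+)$"
)


def to_calendar_url(capture_url: str) -> str:
    """Swap the timestamp segment of a capture URL for an asterisk."""
    match = CAPTURE_RE.match(capture_url)
    if match is None:
        raise ValueError(f"Not a recognized capture URL: {capture_url}")
    prefix, original_url = match.groups()
    return f"{prefix}*/{original_url}"


print(to_calendar_url(
    "https://wayback.archive-it.org/4726/20250513152316id_/https://library.washu.edu/"
))
```

A URL that does not match the assumed layout raises ValueError rather than being silently rewritten.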

Anyone else having similar issues? The only current issue listed on the Archive-It System Status page is Wayback service disruptions due to bots. So, I wonder if this is that...and if there is any more context around that I can learn.

Thanks,
Sarah

Comments

3 comments

Karl Blumenthal July 09, 2025 17:11

Thanks for asking Sarah!

Can you please tell us more about how and/or where you are trying to access the missing calendar page in this example so we can retrace your steps? I don't see this URL as a seed on the public collection page for instance. Is there somewhere else that I can start?

Sarah Weeks July 11, 2025 19:10

Karl, thanks for replying. I see now, that exact seed is not in our collection. (Wustl.edu has changed to washu.edu for many of our sites, and I'm working on how to handle that.) Here's a different example:

I started at this collection:
https://archive-it.org/collections/20310
Some of those items say 0 captures, some get hung up on "Loading Wayback Info," and some show their correct number of captures.

Clicking on the second item, titled Jan Castro, I was taken here:
https://wayback.archive-it.org/4726/20250512205823id_/http://jancastro.com/

Going back to the collection and clicking the item again, I do get to the calendar page. Clicking on a blue dot, I get this:
https://wayback.archive-it.org/4726/20210625214051id_/http://jancastro.com/
...which is a Not in Archive page. Trying again, I get the capture as it's supposed to look.

I have also had the experience with this Jan Castro example of being taken straight to a capture, bypassing the calendar page.

Another, separate example:
In trying to access this item https://wayback.archive-it.org/19943/*/https://sites.wustl.edu/transgenderspectrumconference/
I see a blue dot; I can click it and get to the capture. Clicking around within the capture, I get: https://partner.archive-it.org/missing_url_record?reason=INVALID_UNKNOWN&referrer=https%3A%2F%2Fwustl.app.box.com%2Fs%2Fgn2vbea8q9i2g7ay0sbhcg50zwfein8t&mime=&status=404&size=0&collId=4726&timestamp=20250512205823&url=https%3A%2F%2Fsites.wustl.edu%2Ftransgenderspectrumconference%2F

Now, that one I've never seen before - and I'm not sure how wustl.app.box.com got mixed up in there, but I know Box pages to be uncapturable.
I have also had this missing_url_record thing happen on other seeds I was trying to access.
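
The query string on a missing_url_record link can be unpacked to see what was reported (reason, referrer, status, and so on). This is a minimal sketch using only Python's standard library. The function name `explain_missing_record` is hypothetical, and the meaning of each parameter is not documented here, so treat the output as raw values only.

```python
from urllib.parse import parse_qs, urlsplit


def explain_missing_record(url: str) -> dict:
    """Return the query parameters of a missing_url_record link, percent-decoded."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each key appears once here, so take the first value.
    return {key: values[0] for key, values in query.items()}


link = (
    "https://partner.archive-it.org/missing_url_record?reason=INVALID_UNKNOWN"
    "&referrer=https%3A%2F%2Fwustl.app.box.com%2Fs%2Fgn2vbea8q9i2g7ay0sbhcg50zwfein8t"
    "&mime=&status=404&size=0&collId=4726&timestamp=20250512205823"
    "&url=https%3A%2F%2Fsites.wustl.edu%2Ftransgenderspectrumconference%2F"
)

for key, value in explain_missing_record(link).items():
    print(f"{key}: {value!r}")
```

In this example the decoded referrer is the Box link, which suggests the missing record was requested from a link to Box rather than from the seed page itself.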

Reloading, and backing out and trying again, seem to do a lot towards addressing both problems.

Karl Blumenthal July 14, 2025 13:27

Gotcha! Thank you for that context. May I ask if the issues persist even after you 1) disable Wayback QA, and 2) clear your web browser of any old data from wayback.archive-it.org?

1. To turn off Wayback QA, click the link in the banner.

2. To remove the old data from Archive-It by web browser:
- Chrome: Go to chrome://settings/content/all?searchSubpage=archive-it and click the trash can icon.
- Brave: Go to brave://settings/content/all?searchSubpage=archive-it and click the trash can icon.
- Safari: From the menu, select Safari > Settings... > Privacy > Manage Website Data... and search for archive-it. Press the "Remove" button, followed by "Done."
- Edge: From the menu, select Settings and more > Settings > Cookies and site permissions. Under "Cookies and data stored," select Manage and delete cookies and site data > See all cookies and site data. Search for "archive-it." Click on the "Delete" button with the trash icon.
- Firefox: Go to about:preferences#privacy and click on the "Manage data" button. Search for and select "archive-it." Press the "Remove selected" button, followed by "Save changes."