We've been trying to capture various institutional YouTube channels since March or so, following the instructions for YouTube crawls here. The crawls appear to capture all the content: the data count is high, and when we test the captured videos directly they are as expected. However, when we click on a video from the archived channel page, we get a "Not in Archive" error. The URL is always formatted like https://accounts.google.com/watch?v=vAMYVdjVp4M, where everything after "watch?" matches the original YouTube URL. (Very occasionally, instead of a "Not in Archive" error, the link takes us to a completely blank page.)
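To make the URL pattern concrete, here is a minimal Python sketch of how the broken playback link seems to relate to the original video URL. It assumes the original host is www.youtube.com and that only the host differs; the function name `original_youtube_url` is just for illustration and is not part of any crawler or playback tool.

```python
from urllib.parse import urlsplit, urlunsplit


def original_youtube_url(broken_url: str) -> str:
    """Map an accounts.google.com/watch?... link back to a YouTube watch URL.

    Assumption: the path and query string are unchanged, and the original
    host was www.youtube.com.
    """
    parts = urlsplit(broken_url)
    if parts.netloc != "accounts.google.com" or parts.path != "/watch":
        raise ValueError(f"not the broken pattern described above: {broken_url}")
    # Keep the path and query as-is, swap only the host, drop any fragment.
    return urlunsplit(("https", "www.youtube.com", parts.path, parts.query, ""))


print(original_youtube_url("https://accounts.google.com/watch?v=vAMYVdjVp4M"))
# prints https://www.youtube.com/watch?v=vAMYVdjVp4M
```

This only shows what the links look like on our end; it is not a fix for the playback problem.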
We tried scoping in accounts.google.com, and while this resulted in a sort of wireframe version of the YouTube watch page being captured, the video still does not play back. The seed is also automatically formatted as a YouTube video, so it should already have the recommended settings.
Does anyone know what might be happening here? Our hospital published a lot of COVID-19 information on YouTube, so this is a critical piece of our collection this year.
Arthur Aufses, Jr., MD Archives
Icahn School of Medicine at Mount Sinai