Skip Kendall
- Total activity 122
- Following 0 users
- Followed by 0 users
- Votes 71
- Subscriptions 28
Comments
Recent activity by Skip Kendall
Abigail, They're non-public seeds you run along with a public seed to achieve some desired result. In some cases, the page will only be captured properly if run as a seed. Another use case is with ...
-
Don't know how long you've been waiting, but they do usually show up within a day or so. If there are indexing slowdowns, it can certainly take longer, but I think the recent issue with that has been...
-
It seems most likely to me that there's a subtle difference between the seed and the actual URL. It's possible that a slightly incorrect seed URL gets corrected by the browser, so it works when you lo...
-
Leah, I don't believe there's anything we can do to prevent this; it just happens sometimes. I agree that size doesn't have anything to do with it. I've had very small and very large crawls have t...
-
Ah, yes, I forgot about that. Yeah, that's what I was thinking about with the rule. If you were crawling 4 seeds together, each would need rules to exclude the other 3.
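To make the bookkeeping concrete, here is a rough sketch of the "each seed excludes the others" idea. It only lists which seeds each one would need to exclude; it does not call any Archive-It API, and the seed URLs and the `exclusion_rules` helper are made up for illustration. How the resulting exclusions are actually entered as scoping rules is not shown here.

```python
def exclusion_rules(seeds):
    """Map each seed to the other seeds it would need to exclude (hypothetical helper)."""
    return {seed: [other for other in seeds if other != seed] for seed in seeds}


# Placeholder seed URLs, not taken from any real collection.
seeds = [
    "https://example.org/a/",
    "https://example.org/b/",
    "https://example.org/c/",
    "https://example.org/d/",
]

rules = exclusion_rules(seeds)
for seed, excluded in rules.items():
    # With four seeds, each one ends up with three exclusions.
    print(seed, "-> exclude:", ", ".join(excluded))
```

With four seeds that comes to 4 x 3 = 12 exclusions in total, which is why this gets tedious quickly as the number of seeds grows.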
-
The only way I can think of that might work would be to create a scoping rule in each that excludes the others. That would be a fair amount of work. It may not be a problem, though. De-du...
-
We've been exploring all three. I wouldn't call any of them a high priority yet, but I can see Threads, especially, getting there.
-
Hi Dan, Instagram's a pain. We haven't been able to directly crawl it for more than a year. Just isn't possible. For a while, we used manual tools (Webrecorder, Conifer) and then uploaded the WARCs...
-
Interesting. It's been working fine for me. The only thing I can think of is that, when you're actually crawling the YouTube channel, you need to go to the watch pages in order to view the videos. You...
-
I suspected that was what you were having problems with. That is actually not what Archive-It describes as a repeating directory. A repeating directory would be something like https://www.royalroad...