Trouble seeing test crawls
I tried saving two recent articles about a new Performing Arts Center that is being built in New Brunswick, but every time I crawl the site (with the exact URL in the seed), it says it is "Not in Archive". When I click on the link to check that it is correct, the website is still active, and I have the / at the end of the URL. Any suggestions as to what I am doing incorrectly? The "Not in Archive" page says it may not be in scope, but this is the seed I used: www.njbiz.com/apps/pbcs.dll/article?AID=/20170404/NJBIZ01/170409959/stateoftheart-190m-arts-center-coming-to-new-brunswick/
-
Hi, Jaquelyn! Not to worry -- it looks like you made a good capture of that article here. As the banner atop this capture indicates, though, you will need to permanently save the results of this test crawl before you can access it from the usual shortcuts in the Archive-It web application and/or public website. In the meantime, you can use the "Wayback >" shortcut in the crawl report to view this test capture, and from that same report you can either save the crawl permanently or delete it. There's a link to the report in the banner, but for an illustrated guide to your test crawl and what you can do with it, be sure to check out this Help Center article: Run, monitor, and save a test crawl.
-
That's quite all right, Jaquelyn. It looks to me like the test crawl documentation doesn't make this quite clear, so I will update it (thanks for pointing this out!). I believe that you are not currently seeing the capture by way of the "Wayback" link in your crawl's seeds report because you have now saved this crawl, and saved data, like other regular captures, takes up to 24 hours to index and populate in the Wayback interface. Long story short: if you saved this crawl data today, you should be able to see it at that URL above by tomorrow. If not, definitely submit a ticket so we can take a closer look and squash any bug.
The URLs for test and saved captures are slightly different, as in this case:
The permanently saved URL for your article, as shown above: https://wayback.archive-it.org/9559/*/http://www.njbiz.com/apps/pbcs.dll/article?AID=/20170404/NJBIZ01/170409959/stateoftheart-190m-arts-center-coming-to-new-brunswick/
And the URL for the capture in its test-only form: https://wayback.archive-it.org/9559-test/*/http://www.njbiz.com/apps/pbcs.dll/article?AID=/20170404/NJBIZ01/170409959/stateoftheart-190m-arts-center-coming-to-new-brunswick/
Note the little "-test" text at the end of the collection number in the URL. Before you saved your crawl, the Wayback link would have led you to this latter "test" URL. Once everything has moved from test to permanent storage, though, the permanent URL will be the one that works. Sorry for the confusion! The good news is that your capture is accessible, progressing into permanent storage, and should be available to view from there very soon.
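If it helps to see the pattern spelled out, here is a minimal sketch in Python that builds both forms of the URL. It assumes only the pattern visible in the two URLs in this thread (collection number, an optional "-test" suffix, then "/*/" and the original address). The function name `wayback_url` and its parameters are made up for illustration and are not part of any Archive-It tool or API.

```python
# Base address taken from the two URLs quoted in this thread.
WAYBACK_BASE = "https://wayback.archive-it.org"


def wayback_url(collection_id, target_url, test=False):
    """Build a Wayback calendar URL following the pattern shown in this thread.

    Hypothetical helper: the "-test" suffix on the collection number is the
    only difference between the test-only and permanently saved forms.
    """
    suffix = "-test" if test else ""
    return f"{WAYBACK_BASE}/{collection_id}{suffix}/*/{target_url}"


article = (
    "http://www.njbiz.com/apps/pbcs.dll/article?AID=/20170404/NJBIZ01/"
    "170409959/stateoftheart-190m-arts-center-coming-to-new-brunswick/"
)

print(wayback_url(9559, article))             # permanently saved form
print(wayback_url(9559, article, test=True))  # test-only form
```

Swapping in your own collection number and seed URL should give you both addresses to try while you wait for the saved capture to finish indexing.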