Partners may use Archive-It’s implementation of the Web Archiving Systems API (WASAPI) to request derivative data files, to monitor their creation, and to download them from Archive-It’s storage servers. These files include the formats derived through the Archives Research Compute Hub (ARCH): WAT, WANE, and LGA.
Data are derived from selected WARC files in Archive-It storage. For instructions on specifying these WARC files by collection, crawl time, and/or other attributes, see: Find and download your WARC files with WASAPI. The same parameters may then be used to request a derivation "job."
Request derivative files
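Use the following template at a command line terminal to request data derived from your selected WARC files, replacing <username> and <password> with your Archive-It credentials, <format> with the lowercase name of the desired format (for example, wat), and <parameters> with the query that selects your WARC files: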
$ curl -vvv --user <username>:<password> -H 'Content-Type: application/json' -d '{"function":"build-<format>","query": "<parameters>"}' https://partner.archive-it.org/wasapi/v1/jobs
For instance, a sample Archive-It partner’s command to derive WAT files from all data that they collected between May 10, 2016 and May 12, 2017 would appear as:
$ curl -vvv --user CharlieArchivist:GreatPassword -H 'Content-Type: application/json' -d '{"function":"build-wat","query": "crawl-time-after=2016-05-10&crawl-time-before=2017-05-12"}' https://partner.archive-it.org/wasapi/v1/jobs
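Quoting a JSON body by hand inside a curl command is easy to get wrong, so the request above can also be wrapped in a small script. The following is a minimal sketch rather than an official Archive-It tool: the function names build_job_payload and submit_job and the environment variables WASAPI_USER and WASAPI_PASS are hypothetical names chosen for illustration, and the query is assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: print the JSON body for a derivation request.
# $1 = derivative format as used in the template (for example, wat)
# $2 = WASAPI query string that selects the source WARC files
# Assumes neither argument contains double quotes or backslashes.
build_job_payload() {
  printf '{"function":"build-%s","query":"%s"}' "$1" "$2"
}

# Hypothetical helper: POST the request to the jobs endpoint.
# Reads credentials from WASAPI_USER and WASAPI_PASS (assumed names)
# so that the password is not typed into the command itself.
submit_job() {
  curl --silent --show-error --fail \
    --user "${WASAPI_USER}:${WASAPI_PASS}" \
    -H 'Content-Type: application/json' \
    -d "$(build_job_payload "$1" "$2")" \
    https://partner.archive-it.org/wasapi/v1/jobs
}

# Example (sends a real request, so it is left commented out):
# submit_job wat 'crawl-time-after=2016-05-10&crawl-time-before=2017-05-12'
```

Keeping the payload builder separate from the network call makes it possible to print and inspect the JSON body before anything is submitted.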
Monitor job progress
Each request becomes a data derivation job that can be monitored at: https://partner.archive-it.org/wasapi/v1/jobs
The resulting JSON object for each job includes information about the original request parameters, the times at which the job started and completed, and the current status of the job process.
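Checking a job's status can also be scripted. The sketch below is illustrative only and rests on several assumptions: that a single job can be retrieved at /wasapi/v1/jobs/<jobtoken>, as described in the general WASAPI specification; that its JSON contains a string field named state; and that credentials are stored in the hypothetical environment variables WASAPI_USER and WASAPI_PASS. The function names job_state and wait_for_job are likewise invented for this example. Because this article names only the “complete” state, the loop simply gives up after a fixed number of checks instead of trying to recognize failure states.

```shell
# Print the value of the "state" field from one job's JSON read on stdin.
# Uses only tr and sed, so no JSON tool is required; it assumes the
# input describes a single job with a string-valued "state" field.
job_state() {
  tr -d '\n' |
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll a job until it reports "complete".
# $1 = jobtoken
# $2 = seconds between checks (default 60)
# $3 = maximum number of checks (default 60)
# Returns 0 once the job is complete, 1 if the checks run out.
wait_for_job() {
  attempts=0
  while [ "$attempts" -lt "${3:-60}" ]; do
    state=$(curl --silent --show-error --fail \
      --user "${WASAPI_USER}:${WASAPI_PASS}" \
      "https://partner.archive-it.org/wasapi/v1/jobs/$1" | job_state)
    echo "job $1: ${state:-unknown}"
    [ "$state" = "complete" ] && return 0
    attempts=$((attempts + 1))
    sleep "${2:-60}"
  done
  return 1
}

# Example (sends real requests, so it is left commented out):
# wait_for_job 80 300
```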
Download results
When a job’s state value is “complete,” its results are available at a URL that includes its jobtoken value, in this format: https://partner.archive-it.org/wasapi/v1/jobs/<jobtoken>/result
For a job with the jobtoken 80, for example, the files may be downloaded manually through a web browser at: https://partner.archive-it.org/wasapi/v1/jobs/80/result
To download all or selected result files in batches from the command line, use the format parameter and otherwise follow the instructions in: Find and download your WARC files with WASAPI.
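As a rough illustration of a batch download, the sketch below reads a job's result document and fetches each listed file. It assumes that the result follows the file listing layout of the general WASAPI specification, in which each file entry carries a locations array of download URLs; that layout is not shown in this article, so check it against your own results first. The function names first_locations and download_results and the environment variables WASAPI_USER and WASAPI_PASS are hypothetical, and the sketch does not follow any pagination links that a long result may contain.

```shell
# Print the first download URL of each file in a result document
# read on stdin. Assumes every file entry has a "locations" array
# of quoted URLs, as in the general WASAPI file listing layout.
first_locations() {
  tr -d '\n' |
    grep -o '"locations"[[:space:]]*:[[:space:]]*\[[^]]*]' |
    sed -n 's/^[^[]*\[[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Download every file listed in a completed job's result.
# $1 = jobtoken
# Files are saved in the current directory under their remote names.
download_results() {
  curl --silent --show-error --fail \
    --user "${WASAPI_USER}:${WASAPI_PASS}" \
    "https://partner.archive-it.org/wasapi/v1/jobs/$1/result" |
    first_locations |
    while IFS= read -r url; do
      curl --silent --show-error --fail \
        --user "${WASAPI_USER}:${WASAPI_PASS}" \
        --remote-name "$url"
    done
}

# Example (sends real requests, so it is left commented out):
# download_results 80
```

Only the first URL of each file is used, so that a file offered at several locations is downloaded once.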