Integration of Archive-It with WordPress sites (or with RSS feeds) for metadata?
Has anyone done any kind of integration with a WordPress site in terms of capturing WordPress categories and tags as Archive-It metadata?
Or does anyone have any suggestions for how we might automate the collection of metadata from a WP site?
To give a concrete example, we are archiving our main site, https://www.internetsociety.org/. On that site, we frequently publish blog posts that already have structured metadata: the author name, category, sometimes tags, etc.
In my ideal world, all of that metadata would somehow be captured so that when I go to the Collection page for the archive - https://archive-it.org/collections/10101 - I could see the categories, authors, etc. on the left side and be able to go to archive search results for those terms.
Now I realize it's hard for a crawler to FIND some of these items in the source of a page, but I wondered if there are specific <link> elements or other elements that we could include to help the Archive-It crawler.
Alternatively, we have an RSS feed - https://www.internetsociety.org/feed/ - that also includes structured metadata.
Could any of this be used to help us automate the creation of metadata in our Archive-It collection?
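To illustrate what the feed exposes, here is a minimal Python sketch of pulling per-post metadata out of RSS. It assumes the usual WordPress RSS 2.0 layout, where each `<item>` carries a `<link>`, a `<dc:creator>` for the author, and one `<category>` element per category or tag. The function name `extract_item_metadata` and the sample feed are made up for the example, and nothing here talks to Archive-It.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, which WordPress feeds use for <dc:creator>.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up sample standing in for the real feed content.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Hello World</title>
      <link>https://example.org/hello-world/</link>
      <dc:creator>Jane Doe</dc:creator>
      <category>News</category>
      <category>Policy</category>
    </item>
  </channel>
</rss>"""


def extract_item_metadata(rss_xml):
    """Return one dict per <item> with its URL, title, author, and categories."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "url": (item.findtext("link") or "").strip(),
            "title": (item.findtext("title") or "").strip(),
            "creator": (item.findtext("dc:creator", default="", namespaces=NS) or "").strip(),
            # WordPress emits categories and tags alike as <category> elements.
            "subjects": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return records
```

Calling `extract_item_metadata(SAMPLE_FEED)` gives one record per post, which could then be written to a spreadsheet for review. Note that a feed normally lists only the most recent posts, so this would cover new content rather than the whole site.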
Thanks,
Dan
-
FYI, at the suggestion of our Partner Coordinator, I opened a Feature Request at: https://support.archive-it.org/hc/en-us/community/posts/360021199531-Mechanism-to-make-existing-website-metadata-available-to-crawler-to-automate-metadata-entry-
If you think this would be a useful addition to Archive-It, please leave comments over there (and/or add your vote).
-
Hi Dan,
There isn't currently any mechanism by which metadata can be automatically ingested from WordPress sites into Archive-It metadata fields. However, a quick search of available WordPress plugins turned up one (see: https://wordpress.org/plugins/wp-all-export/) that may allow you to export metadata from WordPress sites as .csv files. You could then rename the columns to map to the appropriate field names, convert the file to .ods format, and bulk upload it as seed and/or document-level metadata into Archive-It.
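The column-renaming step could be scripted along these lines. This is only a sketch: the source column names (`Permalink`, `Author`, `Categories`) are guesses at what an export might contain, and the target names (`URL`, `Creator`, `Subject`) are placeholders that should be checked against the field names Archive-It's bulk upload expects. The conversion to .ods is not covered and would still be done in a spreadsheet application.

```python
import csv
import io

# Hypothetical mapping from exported column names to target field names.
# Both sides are assumptions: adjust the left side to match the actual export,
# and the right side to match the bulk upload template.
COLUMN_MAP = {
    "Permalink": "URL",
    "Title": "Title",
    "Author": "Creator",
    "Categories": "Subject",
}


def remap_columns(csv_text, column_map=COLUMN_MAP):
    """Rename the mapped columns, drop all others, and return new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(column_map.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Columns missing from the export are written as empty cells.
        writer.writerow({new: row.get(old, "") for old, new in column_map.items()})
    return out.getvalue()
```

For example, passing in the text of the exported file returns CSV text containing only the four mapped columns, ready to open in a spreadsheet and save as .ods.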
If you pursue this strategy and successfully develop a workflow, we'd welcome you to write a blog post about it so that the broader community can benefit from your efforts.
Many thanks,
Mary