Rogue Scholar

Published 29 October 2024

Just a placeholder to mark the ongoing impact of the Internet Archive being attacked (see here, here and here for details). The impact of this on the Biodiversity Heritage Library (BHL) has been huge, and reveals the extent to which BHL depends on the Archive.

Computer and Information Sciences, English

Exploring BOLD's DNA barcode data releases: there's a fraction too much friction

https://doi.org/10.59350/6qepn-ge510

Published 18 October 2024

Author Roderic Page

Recently I’ve been exploring data downloaded from BOLD. Part of this was motivated by work done with David Schindel for a recent book. In this blog post I record some struggles I’ve had with the supposedly “Frictionless” data provided by BOLD. I list a series of issues, and make some recommendations as to how these can be fixed. Previous versions disappear from site: The web page Data Packages lists datasets that can be downloaded.

Computer and Information Sciences, English

The Data Citation Corpus revisited

https://doi.org/10.59350/wvwva-v7125

Published 8 October 2024

Author Roderic Page

TL;DR These are some brief notes on the latest version (v. 2) of the Data Citation Corpus, released shortly before the Make Data Count Summit 2024, which also included a discussion on the practical uses of the corpus. I downloaded version 2 from Zenodo doi:10.5281/zenodo.13376773. The data is in JSON format, which I then loaded into CouchDB to play with.

Computer and Information Sciences, English

Why do museum and gallery displays ignore the web?

https://doi.org/10.59350/a83tn-c6t14

Published 13 August 2024

Author Roderic Page

This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exhibition of objects from the British Museum, London. It has all the trappings of a modern exhibition: beautiful lighting, a custom soundtrack, and lots of social media coverage. But I found it immensely frustrating to visit.

Computer and Information Sciences, English

A future for the Biodiversity Heritage Library

https://doi.org/10.59350/n3dkt-6xd05

Published 2 July 2024

Author Roderic Page

Following the 2024 BHL meeting and the departure of Martin Kalfatovic, with the uncertainty that losing such a pivotal person brings, perhaps it’s time to think about the future of BHL. Below I sketch some thoughts, which are hazy at best. I should say at the outset that I think BHL is an extraordinary project. My goal is to think about ways to enhance its utility and impact.

Computer and Information Sciences, English

Visualising big trees: a talk at the Systematics Association 2024

https://doi.org/10.59350/cf6n4-ch767

Published 19 June 2024

Author Roderic Page

This blog post has some notes in support of a talk given to the Systematics Association meeting in Reading on June 20th, 2024. Slides: I will post a link to the slides here once I have given the talk. Page, Roderic (2024). Visualising big trees. figshare. Presentation.

FAIR, Identifiers, Nanopublication, Pensoft, RDF, Computer and Information Sciences, English

Nanopubs, a way to create even more silos

https://doi.org/10.59350/6nj85-7te92

Published 18 June 2024

Author Roderic Page

Pensoft have recently introduced “nanopubs”, small structured publications that can be thought of as containing the minimum possible statement that could be published. Nanopubs are promoted as FAIR, that is findable, accessible, interoperable, and reusable. I like the idea of nanopubs, but the examples I have seen so far are problematic.

Computer and Information Sciences, English

Notes on transforming BHL images

https://doi.org/10.59350/2gpbb-98a53

Published 19 April 2024

Author Roderic Page

How to cite: Page, R. (2024). Notes on transforming BHL images. https://doi.org/10.59350/2gpbb-98a53

I’ve been down this road before, e.g. BHL, DjVu, and reading the f*cking manual and Demo of full-text indexing of BHL using CouchDB hosted by Cloudant, but I’m revisiting converting BHL page scans to black and white images, partly to clean them up, to make them closer to what a modern reader might expect, and partly to reduce the

Computer and Information Sciences, English

Hugging Face Autotrain

https://doi.org/10.59350/7p1n4-wdv84

Published 27 March 2024

Author Roderic Page

How to cite: Page, R. (2024). Hugging Face Autotrain. https://doi.org/10.59350/7p1n4-wdv84

These are notes to myself on using Hugging Face AutoTrain. The first version of this had a very nice interface where you could simply upload a folder of images and train a model. It was limited in the range of tasks and models, but made up for that in ease of use.

Computer and Information Sciences, English

Problems with the DataCite Data Citation Corpus

https://doi.org/10.59350/t80g1-xys37

Published 20 February 2024

Author Roderic Page

How to cite: Page, R. (2024). Problems with the DataCite Data Citation Corpus. https://doi.org/10.59350/t80g1-xys37

DataCite have released the Data Citation Corpus, together with a dashboard that summarises the corpus. This is billed as: The goal is to build a citation database between scholarly articles and data, such as datasets in repositories, sequences in GenBank, protein structures in PDB, etc.

iPhylo

Internet Archive as a single point of failure
