Rogue Scholar Posts

Published in iPhylo

The goal of my BioNames project is to link every taxonomic name to its original description (initially focussing on animal names). The rationale is that taxonomy is based on evidence, and yet most of this evidence is buried in non-digitised and/or hard-to-find literature. Surfacing this information not only makes taxonomic evidence accessible (see Surfacing the deep data of taxonomy), it also surfaces a lot of basic biological information.

Published in iPhylo

This is a quick writeup of an analysis I did to make the case that the list of names held by the Index of Organism Names (ION) (part of Thomson Reuters) would be very useful for GBIF. I must declare a bias, in that I've spent a good chunk of the last 3-4 years exploring the ION database and investigating ways to link the taxonomic names it contains to the primary taxonomic literature, culminating in building BioNames.

Published in iPhylo

Quick notes on an experimental feature I've added to BioNames. It attempts to identify possible taxonomic synonyms by extracting pairs of names with the same species name that appear together on the same page of text. The text could be full text for an open access article, OCR text from BHL, or the title and abstract for an article.
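The pairing step described above can be sketched roughly as follows. This is a minimal illustration and not the BioNames implementation: it assumes the names on a page have already been found and are supplied as simple "Genus epithet" strings, and the function name `candidate_synonyms` and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_synonyms(names_on_page):
    """Return pairs of binomials that share a species epithet but differ in genus.

    `names_on_page` is assumed to be the list of names already extracted
    from a single page of text (name finding itself is out of scope here).
    """
    by_epithet = defaultdict(set)
    for name in names_on_page:
        parts = name.split()
        if len(parts) != 2:
            continue  # skip anything that is not a simple two-part binomial
        genus, epithet = parts
        by_epithet[epithet.lower()].add(genus.capitalize())

    pairs = []
    for epithet, genera in sorted(by_epithet.items()):
        # Every pair of distinct genera sharing this epithet is a candidate.
        for g1, g2 in combinations(sorted(genera), 2):
            pairs.append((f"{g1} {epithet}", f"{g2} {epithet}"))
    return pairs


# Illustrative input: names assumed to co-occur on one page.
page = ["Rana temporaria", "Hyla arborea", "Rana arborea", "Bufo bufo"]
print(candidate_synonyms(page))
# → [('Hyla arborea', 'Rana arborea')]
```

The sketch matches epithets exactly, so it would miss pairs whose endings change with the gender of the genus, and any pair it returns is only a candidate that still needs checking.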

Published in iPhylo

More for my own benefit than anything else I've decided to list some of the things I plan to work on this year. If nothing else, it may make sobering reading this time next year.

A knowledge graph for biodiversity

Google's introduction of the "knowledge graph" gives us a happy phrase to use when talking about linking stuff together. It doesn't come with all the baggage of the "semantic web", or the ambiguity of "knowledge base".

Published in iPhylo

One reason I was able to build BioNames is that a significant fraction of the taxonomic literature for animals is now online, thanks to the efforts of the Biodiversity Heritage Library, digital archives, commercial publishers, and individual institutions and scientific societies. However, there are still big gaps in literature availability.

Published in iPhylo

Quick note to highlight the following publication: This paper outlines the methods used by the BOLD project to cluster sequences into "BINS", and touches on the issue of dark taxa (taxa that are in GenBank but which lack formal scientific names). Might be time to revisit the dark taxa idea, especially now that I've got a better handle on the taxonomic literature (see BioNames) where the names of at least some dark taxa may lurk.