Computer and Information Sciences · English · Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed.
ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Published

Note to self. The challenge of finding specimen citations in papers keeps coming around. It seems that this is basically the same problem as finding citations to papers, and can be approached in much the same way. If you want to build a database of references from scratch, one way is to scrape citations from papers (e.g., from the "literature cited" section), convert those strings into structured data, and add those to your database.
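To make the "strings into structured data" step concrete for specimens, here is a minimal sketch in Python. It assumes specimen codes look like an upper-case institution acronym followed by a catalogue number (e.g., "ZFMK 188762"); the pattern, the `SpecimenCitation` class, and the `find_specimen_citations` function are all hypothetical names made up for illustration, and real specimen citations are far messier than this.

```python
import re
from dataclasses import dataclass

# Assumed pattern (not from any real parser): 2-6 upper-case letters,
# an optional space, colon, or hyphen, then a number of 3+ digits.
SPECIMEN_RE = re.compile(r"\b([A-Z]{2,6})[ :-]?(\d{3,})\b")


@dataclass(frozen=True)
class SpecimenCitation:
    """A specimen code split into its two parts."""
    institution: str
    catalogue_number: str


def find_specimen_citations(text):
    """Return every acronym + number pair found in the text, in order."""
    return [
        SpecimenCitation(m.group(1), m.group(2))
        for m in SPECIMEN_RE.finditer(text)
    ]


# Made-up sentence for illustration only.
sample = "Holotype ZFMK 188762, paratypes USNM 123456 and MCZ-4521."
citations = find_specimen_citations(sample)
for c in citations:
    print(c.institution, c.catalogue_number)
```

The structured records could then be added to a database and matched against other sources, in the same way parsed literature citations are.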

Published

Garnett et al. recently published a paper in PLoS Biology that starts with the sentence "Lists of species matter". This paper (one of a forthcoming series) is pretty much the kind of paper I try to avoid reading.

Published

Given that it's the start of a new year, and I have a short window before teaching kicks off in earnest (and I have to revise my phyloinformatics course), I'm playing with a few GBIF-related ideas. One topic which comes up a lot is annotating and correcting errors. There has been some work in this area [1][2] but it strikes me as somewhat complicated.

Published

I've been banging on about having citable, persistent identifiers for specimens, so was suitably impressed when Derek Sikes posted a comment on iPhylo that Arctos already does this. For example, here is a DOI for a specimen: http://dx.doi.org/10.7299/X7VQ32SJ. So, we're all done, right? Not quite.

Published

Continuing the theme of trying to map specimens cited in the literature to the equivalent GBIF records, consider the GBIF record http://data.gbif.org/occurrences/685591320, which according to GBIF is specimen "ZFMK 188762" (a [sic] holotype of Praomys hartwigi). This is odd, because the original publication of this name (Eisentraut, M. 1968. Beitrag zur Saugetierfauna von Kamerun.

Published

Based on recent discussions my sense is that our community will continue to thrash the issue of identifiers to death, repeating many of the debates that have gone on (and will go on) in other areas. To be trite, it seems to me we have three criteria: cheap, resolvable, and persistent. We get to pick two. Cheap and resolvable means URLs, which everybody is nervous about because they break.

Published

One reason I'm pursuing the theme of specimen identifiers (and identifiers in general) is the central role they play in annotating databases. To give a concrete example, I (among others) have argued for a wiki-style annotation layer on top of GenBank to capture things such as sequencing errors, updated species names, etc. Annotation is a lot easier if we have consistent identifiers for the things being annotated.

Published

Following on from exploring links between GBIF and GenBank, here I'm going to look at links between GBIF and the primary literature, in this case articles scanned by the Biodiversity Heritage Library (BHL). The OCR text in BHL can be mined for a variety of entities. BHL itself has used uBio's tools to identify taxonomic names in the OCR text, and in my BioStor project I've extracted article-level metadata and geographic co-ordinates.
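As a rough illustration of what mining OCR text for geographic co-ordinates can involve, here is a small Python sketch. It is not the code used by BioStor or BHL; it assumes co-ordinates are written in a simple degrees-and-minutes form such as 4°30'N, and the names `COORD_RE`, `to_decimal`, and `find_coordinates` are hypothetical. Real OCR text would need to cope with seconds, misread symbols, and many other notations.

```python
import re

# Assumed notation: degrees, a degree sign, minutes, a prime or
# apostrophe, then a hemisphere letter (N, S, E, or W).
COORD_RE = re.compile(r"(\d{1,3})\s*°\s*(\d{1,2})\s*['′]\s*([NSEW])")


def to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def find_coordinates(text):
    """Return decimal values for every co-ordinate matched, in order."""
    return [to_decimal(*m.groups()) for m in COORD_RE.finditer(text)]


# Made-up sentence for illustration only.
sample = "collected at 4°30'N, 9°15'E near Mount Cameroon"
print(find_coordinates(sample))
```

Pairing adjacent latitude and longitude values into points, and checking them against the localities named in the text, is left out of this sketch.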