Rogue Scholar

Published 18 June 2012 in iPhylo

Anyone who works with taxonomic databases is aware that they contain errors. Some taxonomic databases are restricted in scope to a particular taxon in which one or more people have expertise; these then get aggregated into larger databases, which may in turn be aggregated by databases whose scope is global. One consequence is that errors in one database can propagate through many other databases.

GBIF, Linking, Linkout, NCBI, TreeBASE, Computer and Information Sciences, English

Linking NCBI taxonomy to GBIF

https://doi.org/10.59350/sg04y-k2b09

Published 2 June 2012 in iPhylo

Author: Roderic Page

In response to Rutger Vos's question, I've started to add GBIF taxon ids to the iPhylo Linkout website. If you've not come across iPhylo Linkout, it's a Semantic MediaWiki-based site where I maintain links between the NCBI taxonomy and other resources, such as Wikipedia and the BBC Nature Wildlife finder. For more background see Page, R. D. M. (2011). Linking NCBI to Wikipedia: a wiki-based approach. PLoS Currents, 3, RRN1228.
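One common way to attach GBIF taxon ids to NCBI taxa is to match on scientific name. The post does not say how the Linkout links are made, so this is only a minimal sketch under that assumption: both taxonomies are hypothetical `{taxon_id: name}` dictionaries with made-up ids, and names that map to more than one GBIF taxon (homonyms; Aotus, for example, is both a primate genus and a plant genus) are set aside for manual review rather than guessed.

```python
from collections import defaultdict


def match_by_name(ncbi, gbif):
    """Map NCBI taxon ids to GBIF taxon ids by exact (case-insensitive) name.

    Assumption: a name match is good enough evidence for a link. Returns
    (matched, ambiguous, unmatched).
    """
    by_name = defaultdict(list)
    for gbif_id, name in gbif.items():
        by_name[name.strip().lower()].append(gbif_id)

    matched, ambiguous, unmatched = {}, {}, []
    for ncbi_id, name in ncbi.items():
        candidates = by_name.get(name.strip().lower(), [])
        if len(candidates) == 1:
            matched[ncbi_id] = candidates[0]       # unique match: link it
        elif candidates:
            ambiguous[ncbi_id] = sorted(candidates)  # homonyms: needs a human
        else:
            unmatched.append(ncbi_id)              # no GBIF taxon with this name
    return matched, ambiguous, unmatched


# Hypothetical ids; "Aotus" appears twice in GBIF to mimic a homonym.
NCBI = {"n1": "Homo sapiens", "n2": "Aotus", "n3": "Nomen nudum"}
GBIF = {"g1": "Homo sapiens", "g2": "Aotus", "g3": "Aotus"}

matched, ambiguous, unmatched = match_by_name(NCBI, GBIF)
print(matched, ambiguous, unmatched)
```

Reporting ambiguous names separately keeps the automatic step conservative, which matters when the links are then curated by hand in a wiki.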

BioStor, Classification, Data Cleaning, Error, GBIF, Computer and Information Sciences, English

The GBIF classification is broken — how do we fix it?

https://doi.org/10.59350/5a5re-kp839

Published 30 May 2012 in iPhylo

Author: Roderic Page

This post arose from an ongoing email conversation with Tony Rees about extracting and annotating taxonomic names. In BioStor I use the GBIF classification to display the taxonomic names found in the OCR text in the form of a tree. The idea is to give the reader a sense of "what the paper is about". I also use the classification to help link to GBIF occurrence records.
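BioStor's own code is not shown in the post, but the idea of displaying the names found in an article as a tree can be sketched by merging each name's classification path (root first) into a nested dictionary. The classification paths below are hypothetical stand-ins for what a lookup against the GBIF classification might return.

```python
def build_tree(paths):
    """Merge root-first classification paths into a nested dict."""
    root = {}
    for path in paths:
        node = root
        for name in path:
            # Reuse the existing child if this taxon was already seen.
            node = node.setdefault(name, {})
    return root


def render(tree, depth=0):
    """Return the tree as indented lines, siblings sorted alphabetically."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


# Hypothetical classifications for three names found in an article's OCR text.
PATHS = [
    ["Animalia", "Arthropoda", "Insecta", "Phthiraptera"],
    ["Animalia", "Chordata", "Aves"],
    ["Animalia", "Arthropoda", "Insecta", "Diptera"],
]

print("\n".join(render(build_tree(PATHS))))
```

Because shared ancestors are merged, a paper mentioning many insects and one bird shows up as a large insect subtree with a small bird branch, which is the "what the paper is about" view. If the classification itself is wrong, the tree is wrong too.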

BHL, Biomedical, GBIF, Linking, Mekong River, Schistosomiasis, Computer and Information Sciences, English

BHL and GBIF as biomedical databases

https://doi.org/10.59350/8pp2p-9dh09

Published 27 March 2012 in iPhylo

Author: Roderic Page

When I think of the Biodiversity Heritage Library (BHL) or GBIF I tend to think of taxonomy and biodiversity. Folk wisdom has it that BHL is full of old books, mostly pre-1923. Great for finding old taxonomic names, or nice artwork, but not exactly "modern" biology. GBIF is mainly about displaying organism distributions based on museum specimens, the primary data of taxonomic research.

Annotation, Error, GBIF, Genbank, Identifiers, Computer and Information Sciences, English

Yet more reasons to have specimen identifiers: annotating GenBank sequences

https://doi.org/10.59350/k46hh-dz648

Published 1 March 2012 in iPhylo

Author: Roderic Page

One reason I'm pursuing the theme of specimen identifiers (and identifiers in general) is the central role they play in annotating databases. To give a concrete example, I (among others) have argued for a wiki-style annotation layer on top of GenBank to capture things such as sequencing errors, updated species names, etc. Annotation is a lot easier if we have consistent identifiers for the things being annotated.

BioStor, Digitisation, GBIF, Host, Lice, Computer and Information Sciences, English

GBIF specimens in BioStor: who are the top ten museums with citable specimens?

https://doi.org/10.59350/d97rd-ea309

Published 28 February 2012 in iPhylo

Author: Roderic Page

Brief update on yesterday's post about finding specimens in BioStor. BioStor has some 66,000 articles from BHL, from which I've extracted 143,000 cases of a specimen code being cited in the text. Of these 143,000 occurrences, 81,000 have been matched to an occurrence in GBIF.
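As a rough check on these figures (which are themselves rounded), the match rate and the average number of specimen-code citations per article work out as follows. The second value is only an average, so it says nothing about how the citations are spread across articles.

```latex
\text{match rate} = \frac{81{,}000}{143{,}000} \approx 0.566 \approx 57\%,
\qquad
\text{citations per article} = \frac{143{,}000}{66{,}000} \approx 2.2
```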

BHL, BioStor, GBIF, Identifiers, Linking, Computer and Information Sciences, English

Linking GBIF and the Biodiversity Heritage Library

https://doi.org/10.59350/ehbwx-fjv34

Published 27 February 2012 in iPhylo

Author: Roderic Page

Following on from exploring links between GBIF and GenBank, here I'm going to look at links between GBIF and the primary literature, in this case articles scanned by the Biodiversity Heritage Library (BHL). The OCR text in BHL can be mined for a variety of entities. BHL itself has used uBio's tools to identify taxonomic names in the OCR text, and in my BioStor project I've extracted article-level metadata and geographic co-ordinates.
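The post mentions extracting geographic co-ordinates from BHL's OCR text but does not describe how. A minimal sketch of one approach, assuming co-ordinates are written as degrees and minutes followed by a hemisphere letter, is a regular expression plus conversion to decimal degrees. The pattern and the `extract_coordinates` helper are hypothetical; real OCR output has many more variants (seconds, decimal degrees, degree signs misread as "o" or "0") that this does not handle.

```python
import re

# Degrees and minutes followed by a hemisphere letter, e.g. 12°30'S or 12° 30.5' S.
# Seconds, decimal degrees and OCR misreadings of the degree sign are not handled.
COORD = re.compile(r"(\d{1,3})\s*°\s*(\d{1,2}(?:\.\d+)?)\s*['′]\s*([NSEW])")


def to_decimal(degrees, minutes, hemisphere):
    """Convert degrees/minutes to signed decimal degrees (S and W negative)."""
    value = int(degrees) + float(minutes) / 60
    return -value if hemisphere in "SW" else value


def extract_coordinates(text):
    """Pair each latitude with the next longitude into (lat, lon) tuples."""
    lat = None
    points = []
    for degrees, minutes, hemisphere in COORD.findall(text):
        value = to_decimal(degrees, minutes, hemisphere)
        if hemisphere in "NS":
            lat = value                      # remember latitude until a longitude follows
        elif lat is not None:
            points.append((round(lat, 4), round(value, 4)))
            lat = None
    return points


print(extract_coordinates("collected at 12°30'S, 130°45'E on rocks"))
```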

Darwin Core Triplet, Duplicates, GBIF, Identifiers, Specimen Codes, Computer and Information Sciences, English

How many specimens does GBIF really have?

https://doi.org/10.59350/2d3dv-8q010

Published 23 February 2012 in iPhylo

Author: Roderic Page

Duplicate records are the bane of any project that aggregates data from multiple sources.
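A common first pass at finding duplicate specimen records is to group them on a normalised Darwin Core triplet (institutionCode, collectionCode, catalogNumber). This is a minimal sketch of that idea, not the method used in the post. The record dictionaries and ids are hypothetical, and the normalisation (upper-casing and removing spaces and hyphens) is an assumption that can both miss duplicates and merge distinct specimens.

```python
import re
from collections import defaultdict


def triplet_key(record):
    """Build a normalised institutionCode:collectionCode:catalogNumber key."""
    fields = ("institutionCode", "collectionCode", "catalogNumber")
    # Missing or None fields become "", so a record lacking collectionCode
    # will not match an otherwise identical record that has one.
    parts = (record.get(f, "") or "" for f in fields)
    # Upper-case and drop spaces and hyphens: "A-1234" and "a 1234" -> "A1234".
    return ":".join(re.sub(r"[\s-]", "", p).upper() for p in parts)


def find_duplicates(records):
    """Return {triplet_key: [record ids]} for keys shared by two or more records."""
    groups = defaultdict(list)
    for record in records:
        groups[triplet_key(record)].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records, e.g. the same specimen published via two datasets.
RECORDS = [
    {"id": "occ1", "institutionCode": "XYZ", "collectionCode": "Herp", "catalogNumber": "A-1234"},
    {"id": "occ2", "institutionCode": "xyz", "collectionCode": "HERP", "catalogNumber": "a 1234"},
    {"id": "occ3", "institutionCode": "XYZ", "collectionCode": "Herp", "catalogNumber": "A-1235"},
]

print(find_duplicates(RECORDS))
```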

Frogs, GBIF, Genbank, Geophylogeny, KML, Computer and Information Sciences, English

Linking GBIF and Genbank

https://doi.org/10.59350/hj161-hh554

Published 21 February 2012 in iPhylo

Author: Roderic Page

As part of my mantra that it's not about the data, it's all about the links between the data, I've started exploring matching GenBank sequences to GBIF occurrences using the specimen_voucher codes recorded in GenBank sequences. It's quickly becoming apparent that this is not going to be easy.
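GenBank stores voucher information in the /specimen_voucher qualifier. The INSDC feature-table documentation describes a structured form, "[<institution-code>:[<collection-code>:]]<specimen_id>", but many records use free text instead, which is one reason matching is hard. The sketch below splits colon-delimited values and otherwise guesses an acronym-plus-number pattern; the `parse_voucher` helper and its fallback heuristic are hypothetical. The resulting parts could then be compared with GBIF's institutionCode, collectionCode and catalogNumber fields.

```python
import re

# Heuristic fallback (an assumption, not an INSDC rule): an upper-case
# acronym, a space or hyphen, then the catalogue number, e.g. "XYZ 1234".
ACRONYM_NUMBER = re.compile(r"^([A-Z]{2,})[\s-]+(\S.*)$")


def parse_voucher(voucher):
    """Split a /specimen_voucher value into (institution, collection, specimen_id).

    Missing parts are returned as None.
    """
    value = voucher.strip()
    parts = [p.strip() for p in value.split(":")]
    if len(parts) == 3:                  # institution:collection:specimen_id
        return parts[0], parts[1], parts[2]
    if len(parts) == 2:                  # institution:specimen_id
        return parts[0], None, parts[1]
    match = ACRONYM_NUMBER.match(value)
    if match:                            # free text that looks like "ACRONYM number"
        return match.group(1), None, match.group(2)
    return None, None, value             # give up and keep the raw string


for v in ["XYZ:Herp:1234", "XYZ:1234", "XYZ 1234", "field no. 77"]:
    print(v, "->", parse_voucher(v))
```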

Darwin Core Triplet, DNA Barcoding, DOI, GBIF, Identifiers, Computer and Information Sciences, English

DNA Barcoding, the Darwin Core Triplet, and failing to learn from past mistakes

https://doi.org/10.59350/aq4wb-dt356

Published 11 December 2011 in iPhylo

Author: Roderic Page

Given various discussions about identifiers, dark taxa, and DNA barcoding that have been swirling around over the last few weeks, there's one notion that is starting to bug me more and more.

Rogue Scholar posts

Fictional taxa

Linking NCBI taxonomy to GBIF

The GBIF classification is broken — how do we fix it?

BHL and GBIF as biomedical databases

Yet more reasons to have specimen identifiers: annotating GenBank sequences

GBIF specimens in BioStor: who are the top ten museums with citable specimens?

Linking GBIF and the Biodiversity Heritage Library

How many specimens does GBIF really have?

Linking GBIF and Genbank

DNA Barcoding, the Darwin Core Triplet, and failing to learn from past mistakes