Rogue Scholar

Geocoding genomic databases using GBIF

Published 15 November 2018 in iPhylo

I've put a short note up on bioRxiv about ways to geocode nucleotide sequences in databases such as GenBank. The preprint is "Geocoding genomic databases using GBIF" https://doi.org/10.1101/469650.

GBIF, Genbank, Georeferencing, Note To Self, Computer and information sciences, English

Notes on finding georeferenced sequences in GenBank

https://doi.org/10.59350/k5syz-mpb09

Published 6 October 2017 in iPhylo

Author: Roderic Page

Notes on how many georeferenced DNA sequences there are in GenBank, and how many could potentially be georeferenced.
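To make "georeferenced" concrete: GenBank records carry coordinates in the INSDC `/lat_lon` qualifier, written as decimal degrees plus hemisphere letters (e.g. "47.94 N 28.12 W"). Below is a minimal Python sketch of such a tally, assuming records have already been fetched and flattened into dicts with hypothetical `lat_lon` and `specimen_voucher` keys. Treating a voucher as the marker of a sequence that "could potentially be georeferenced" (via the specimen's GBIF record) is an illustrative assumption, not the method described in the post.

```python
import re
from typing import Optional, Tuple

# INSDC /lat_lon format: "<lat> N|S <lon> E|W" in decimal degrees, e.g. "47.94 N 28.12 W".
LAT_LON = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+([NS])\s+(\d+(?:\.\d+)?)\s+([EW])\s*$")


def parse_lat_lon(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return signed (latitude, longitude), or None if unparseable or out of range."""
    m = LAT_LON.match(value or "")
    if m is None:
        return None
    lat, ns, lon, ew = m.groups()
    lat_f = -float(lat) if ns == "S" else float(lat)
    lon_f = -float(lon) if ew == "W" else float(lon)
    if abs(lat_f) > 90 or abs(lon_f) > 180:
        return None
    return lat_f, lon_f


def tally(records):
    """Classify record dicts (hypothetical keys 'lat_lon', 'specimen_voucher')."""
    counts = {"georeferenced": 0, "candidate": 0, "other": 0}
    for r in records:
        if parse_lat_lon(r.get("lat_lon", "")) is not None:
            counts["georeferenced"] += 1
        elif r.get("specimen_voucher"):
            # No coordinates, but a voucher that might be matched to a GBIF occurrence.
            counts["candidate"] += 1
        else:
            counts["other"] += 1
    return counts
```

For example, `tally([{"lat_lon": "1 N 2 E"}, {"specimen_voucher": "MCZ:Herp:A-1"}, {}])` puts one record in each bucket.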

GBIF, Genbank, Knowledge Graph, Specimen Codes, Computer and information sciences, English

Linking specimen codes to GBIF

https://doi.org/10.59350/g6gq1-crg31

Published 15 April 2015 in iPhylo

Author: Roderic Page

I've put together a working demo of some code I've been working on to discover GBIF records that correspond to museum specimen codes. The live demo is at http://bionames.org/~rpage/material-examined/ and code is on GitHub. To use the demo, simply paste in a specimen code (e.g., "MCZ 24351") and click Find, and it will do its best to parse the code, then go off to GBIF and see what it can find.
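The parse-then-search step might look roughly like the following. This is a sketch, not the code behind the demo: the regular expression is a hypothetical pattern that only covers simple codes such as "MCZ 24351", and the function merely builds a query URL for the GBIF occurrence search API using its `institutionCode` and `catalogNumber` filters. Real specimen codes are far messier, and GBIF catalogue numbers often carry prefixes that an exact match will miss.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical pattern: uppercase institution acronym, a space or colon, then a
# catalogue number with an optional short letter prefix ("MCZ 24351", "USNM A-987").
SPECIMEN = re.compile(r"^\s*([A-Z]{2,8})[\s:]+([A-Z]{0,3}-?\d+)\s*$")

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def parse_specimen_code(code: str) -> Optional[Tuple[str, str]]:
    """Split a specimen code into (institution code, catalogue number), or None."""
    m = SPECIMEN.match(code)
    return (m.group(1), m.group(2)) if m else None


def gbif_query_url(code: str) -> Optional[str]:
    """Build a GBIF occurrence search URL for a parsed specimen code (no request is made)."""
    parsed = parse_specimen_code(code)
    if parsed is None:
        return None
    institution, catalog = parsed
    return GBIF_SEARCH + "?" + urlencode({"institutionCode": institution, "catalogNumber": catalog})
```

Usage: `gbif_query_url("MCZ 24351")` returns `https://api.gbif.org/v1/occurrence/search?institutionCode=MCZ&catalogNumber=24351`, which can then be fetched with any HTTP client.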

Genbank, Museum Collections, Computer and information sciences, English

The Experimenter’s Museum: GenBank, Natural History, and the Moral Economies of Biomedicine

https://doi.org/10.59350/wqdnj-fhz15

Published 8 April 2014 in iPhylo

Author: Roderic Page

An undergraduate student (Aime Rankin) doing a project with me on citation and impact of museum collections came across a paper I hadn't seen before: Unfortunately the paper is behind a paywall, but here's the abstract (you can also get a PDF here): It's well worth a read. It argues that sequence databases such as Genbank are essentially the equivalent of the great natural history museums of the 19th Century. There are several ironies here.

Genbank, NCBI, Type Specimens, Computer and information sciences, English

NCBI taxonomy database now shows type material

https://doi.org/10.59350/217kh-1j345

Published 24 January 2014 in iPhylo

Author: Roderic Page

Scott Federhen told me about a nice new feature in GenBank that he's described in a piece for NCBI News. The NCBI taxonomy database now shows a list of type material (where known), and the GenBank sequence database "knows" about types. Here's the summary: You can query for sequences from type using the query "sequence from type"[filter]. This could lead to some nice automated tools.
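The same filter should also work programmatically through NCBI's E-utilities. Here is a minimal sketch that builds an `esearch` URL against the nucleotide database, optionally restricted to one taxon with the `[Organism]` field. The helper name and the default `retmax` are illustrative choices. It only builds the URL; fetching it (and respecting NCBI's usage guidelines) is left out.

```python
from typing import Optional
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_sequence_search_url(taxon: Optional[str] = None, retmax: int = 20) -> str:
    """Build an esearch URL for nucleotide records flagged as sequence from type."""
    term = '"sequence from type"[filter]'
    if taxon:
        # Optionally narrow the search to a single organism name.
        term = f'"{taxon}"[Organism] AND {term}'
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)
```

Calling `type_sequence_search_url("Rana")` yields a URL whose `term` parameter decodes to `"Rana"[Organism] AND "sequence from type"[filter]`.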

DNA Barcoding, Genbank, GPS, Guest Post, Computer and information sciences, English

Guest post: response to "Putting GenBank Data on the Map"

https://doi.org/10.59350/fdkqx-47m65

Published 12 December 2013 in iPhylo

Author: Roderic Page

The following is a guest blog post by David Schindel and colleagues and is a response to the paper by Antonio Marques et al. in Science doi:10.1126/science.341.6152.1341-a. Marques, Maronna and Collins (1) rightly call on the biodiversity research community to include latitude/longitude data in database and published records of natural history specimens.

Annotation, Error, GBIF, Genbank, Identifiers, Computer and information sciences, English

Yet more reasons to have specimen identifiers: annotating GenBank sequences

https://doi.org/10.59350/k46hh-dz648

Published 1 March 2012 in iPhylo

Author: Roderic Page

One reason I'm pursuing the theme of specimen identifiers (and identifiers in general) is the central role they play in annotating databases. To give a concrete example, I (among others) have argued for a wiki-style annotation layer on top of GenBank to capture things such as sequencing errors, updated species names, etc. Annotation is a lot easier if we have consistent identifiers for the things being annotated.

Frogs, GBIF, Genbank, Geophylogeny, KML, Computer and information sciences, English

Linking GBIF and Genbank

https://doi.org/10.59350/hj161-hh554

Published 21 February 2012 in iPhylo

Author: Roderic Page

As part of my mantra that it's not about the data, it's all about the links between the data, I've started exploring matching GenBank sequences to GBIF occurrences using the specimen_voucher codes recorded in GenBank sequences. It's quickly becoming apparent that this is not going to be easy.
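Part of the difficulty is that `specimen_voucher` values appear both as structured INSDC triples (`institution:collection:id`) and as free text such as "MCZ A-12345". The heuristic matcher below is sketched under the assumption that we already have a candidate GBIF record's `institutionCode` and `catalogNumber`. The normalisation rules (ignore case, spaces, hyphens and leading zeros) are hypothetical and will produce both false positives and false negatives.

```python
import re
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: Optional[str]
    collection: Optional[str]
    specimen_id: str


def parse_voucher(value: str) -> Voucher:
    """Split an INSDC-style '[inst:[coll:]]id' voucher; free text becomes the id."""
    parts = [p.strip() for p in value.split(":", 2)]
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    return Voucher(None, None, parts[0])


def normalise_number(s: str) -> str:
    """Hypothetical normalisation: uppercase, drop spaces/hyphens, strip leading zeros of digit runs."""
    s = re.sub(r"[\s\-]", "", s.upper())
    return re.sub(r"(?<!\d)0+(?=\d)", "", s)


def voucher_matches(voucher: str, gbif_institution: str, gbif_catalog: str) -> bool:
    """Heuristically decide whether a GenBank voucher refers to a given GBIF record."""
    v = parse_voucher(voucher)
    spec = v.specimen_id
    if v.institution is not None:
        if v.institution.upper() != gbif_institution.upper():
            return False
    else:
        # Free-text voucher such as "MCZ A-12345": require and strip a leading acronym.
        m = re.match(rf"{re.escape(gbif_institution)}[\s:]*(.+)$", spec, re.IGNORECASE)
        if m is None:
            return False
        spec = m.group(1)
    return normalise_number(spec) == normalise_number(gbif_catalog)
```

Requiring the institution code to agree before comparing numbers is a conservative choice: catalogue numbers on their own are far from unique across collections.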

Bio2RDF, Crossref, DOI, Genbank, Integration, Computer and information sciences, English

TDWG Challenge - what is RDF good for?

https://doi.org/10.59350/wwpmk-0es54

Published 19 October 2011 in iPhylo

Author: Roderic Page

Last month, feeling particularly grumpy, I fired off an email to the TDWG-TAG mailing list with the subject Lobbing grenades: a challenge. Here's the email: In the context of the TDWG meeting (happening as we speak and which I'm following via Twitter, hashtag #tdwg) Joel Sachs asked me whether I had any specific data in mind that could form the basis of a discussion. So, here goes.

Dark Taxa, DNA Barcoding, Genbank, NCBI, Post-taxonomic, Computer and information sciences, English

Dark taxa: GenBank in a post-taxonomic world

https://doi.org/10.59350/xhvv2-xjt24

Published 12 April 2011 in iPhylo

Author: Roderic Page

In an earlier post (Are names really the key to the big new biology?), I questioned Patterson et al.'s assertion in a recent TREE article (doi:10.1016/j.tree.2010.09.004) that names are key to the new biology. In this post I'm going to revisit this idea with a quick analysis of how many species in GenBank have "proper" scientific names, and whether the number of named species has changed over time.
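One crude way to operationalise "proper" names is a pattern test: a capitalised genus followed by a single lowercase epithet, rejecting placeholders such as "sp." or "cf.". The sketch below assumes (year, organism name) pairs have already been extracted from GenBank. The regular expression and the excluded-epithet list are illustrative assumptions rather than the rules used in the post, and they deliberately count trinomials and BOLD-style interim names as not "proper".

```python
import re
from collections import Counter

# Heuristic: "Genus epithet" and nothing else (no "sp.", codes, or extra words).
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z][a-z-]+$")
# Placeholder epithets that would otherwise pass the pattern (assumed list).
EXCLUDED_EPITHETS = {"sp", "spp", "cf", "aff", "nr"}


def is_proper_name(name: str) -> bool:
    """True if the name looks like a formal Latin binomial."""
    name = name.strip()
    if BINOMIAL.match(name) is None:
        return False
    return name.split()[1] not in EXCLUDED_EPITHETS


def named_fraction_by_year(records):
    """records: iterable of (year, organism_name); returns {year: (named, total)}."""
    named, total = Counter(), Counter()
    for year, name in records:
        total[year] += 1
        if is_proper_name(name):
            named[year] += 1
    return {year: (named[year], total[year]) for year in sorted(total)}
```

For example, `named_fraction_by_year([(2009, "Rana temporaria"), (2009, "Rana sp."), (2010, "Bufo bufo")])` returns `{2009: (1, 2), 2010: (1, 1)}`.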
