Rogue Scholar Posts

Published in rOpenSci - open tools for open science
Author: David Winter

I am happy to say that the latest issue of The R Journal includes a paper describing rentrez, the rOpenSci package for retrieving data from the National Center for Biotechnology Information (NCBI). The NCBI is one of the most important sources of biological data. The centre provides access to information on 28 million scholarly articles through PubMed and 250 million DNA sequences through GenBank.

Published in iPhylo

In a recent Twitter conversation including David Shorthouse and myself (and other poor souls who got dragged in) we discussed how to demonstrate that adopting JSON-LD as a simple linked-data friendly format might help bootstrap the long-awaited "biodiversity knowledge graph" (see below for some suggestions for keeping JSON-LD simple). David suggests partnering with "Three small, early adopting projects". I disagree.

Published in iPhylo

If we view biodiversity data as part of the "biodiversity knowledge graph" then specimens are a fairly central feature of that graph. I'm looking at ways to link specimens to sequences, taxa, publications, etc., and doing this across multiple data providers.

Published in iPhylo

Scott Federhen told me about a nice new feature in GenBank that he's described in a piece for NCBI News. The NCBI taxonomy database now shows a list of type material (where known), and the GenBank sequence database "knows" about types. Here's the summary: You can query for sequences from type using the query "sequence from type"[filter]. This could lead to some nice automated tools.
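As a rough illustration of what such an automated tool might start from, here is a minimal sketch that builds an NCBI E-utilities `esearch` URL using the `"sequence from type"[filter]` query. The function name, the optional organism restriction, and the choice of the `nuccore` database with JSON output are assumptions made for this example, not something described in the post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The filter quoted in the post.
TYPE_FILTER = '"sequence from type"[filter]'


def type_sequence_search_url(taxon=None, retmax=20):
    """Build an esearch URL for sequences derived from type material.

    `taxon` (hypothetical parameter) optionally restricts the search
    to one organism name; `retmax` caps the number of returned IDs.
    """
    term = TYPE_FILTER
    if taxon:
        term = f'"{taxon}"[Organism] AND {TYPE_FILTER}'
    params = {
        "db": "nuccore",      # assumption: search the nucleotide database
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    url = type_sequence_search_url("Aspergillus")
    print(url)
    # Decode the query string again to show the search term that is sent.
    print(parse_qs(urlsplit(url).query)["term"][0])
```

The sketch only constructs the URL; fetching it (and respecting NCBI's usage guidelines on request rates) is left to whatever HTTP client you prefer.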

Published in iPhylo

In response to Rutger Vos's question I've started to add GBIF taxon ids to the iPhylo Linkout website. If you've not come across iPhylo Linkout, it's a Semantic Mediawiki-based site where I maintain links between the NCBI taxonomy and other resources, such as Wikipedia and the BBC Nature Wildlife finder. For more background see Page, R. D. M. (2011). Linking NCBI to Wikipedia: a wiki-based approach. PLoS Currents, 3, RRN1228.

Published in iPhylo

Dark taxa have become even darker. NCBI has pulled the plug on large numbers of DNA barcode sequences that lack scientific names. For example, taxon Cyclopoida sp. BOLD:AAG9771 (tax_id 818059) now has a sparse page that has no associated sequences. From an earlier download of EMBL I know that this taxon is associated with at least 5 sequences, such as GU679674. But if you go to that sequence you get this: So the sequence is hidden.

Published in iPhylo

Last week I was at the NSF "Assembling, Visualising and Analysing the Tree of Life" Ideas Lab, run by KnowInnovation.com/. It was an interesting experience, essentially a structured week of brainstorming ideas. One thing I came away with is the feeling that our notions of the "tree of life" are fuzzy, contradictory, and often probably unobtainable.

Published in iPhylo

In an earlier post (Are names really the key to the big new biology?), I questioned Patterson et al.'s assertion in a recent TREE article (doi:10.1016/j.tree.2010.09.004) that names are key to the new biology. In this post I'm going to revisit this idea by doing a quick analysis of how many species in GenBank have "proper" scientific names, and whether the number of named species has changed over time.
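To make the idea of such an analysis concrete, here is a minimal sketch of one way it could be set up. It is not the method used in the post: the rule that a "proper" name is a plain Latin binomial, the `(name, year)` record format, and both function names are assumptions for illustration only.

```python
import re
from collections import Counter

# Assumption: a "proper" name is a capitalised genus followed by a
# lowercase species epithet and nothing else, e.g. "Homo sapiens".
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+(?:-[a-z]+)?$")


def is_proper_name(name):
    """Return True if `name` looks like a plain Latin binomial."""
    return bool(BINOMIAL.match(name.strip()))


def named_fraction_by_year(records):
    """Fraction of taxa with a proper name, grouped by year.

    `records` is a hypothetical iterable of (name, year) pairs, where
    year is when the taxon was added to the database.
    """
    totals, named = Counter(), Counter()
    for name, year in records:
        totals[year] += 1
        if is_proper_name(name):
            named[year] += 1
    return {year: named[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    sample = [
        ("Homo sapiens", 2000),
        ("Cyclopoida sp. BOLD:AAG9771", 2010),
        ("Danio rerio", 2010),
    ]
    print(named_fraction_by_year(sample))
```

A real analysis would need a more careful definition of "proper" (subspecies, hybrids, and names with qualifiers such as "cf." all complicate the picture), so the regular expression here should be read as a placeholder.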