Darwin Core Triplet, Duplicates, GBIF, Identifiers, Specimen Codes, Computer and Information Sciences, English
Published in iPhylo
Author: Roderic Page
Duplicate records are the bane of any project that aggregates data from multiple sources.
Quick note about a tool I've cobbled together as part of the phyloinformatics course, which addresses a long-standing need I and others have to extract specimen codes from text. I've had this code kicking around for a while (as part of various never-finished data mining projects), but never got around to releasing it, until now.
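To give a rough sense of what extracting specimen codes from text can involve, here is a minimal sketch. It is not the tool described above: the pattern, the function name `extract_specimen_codes`, and the sample sentence are all assumptions made for illustration. It assumes a code looks like a short uppercase acronym followed by a catalogue number, which will miss many real-world formats.

```python
import re

# Hypothetical pattern (an assumption, not the tool's actual rules):
# an uppercase acronym of 2-6 letters, an optional separator
# (space, colon, or hyphen), then a catalogue number of 3+ digits.
SPECIMEN_CODE = re.compile(r"\b([A-Z]{2,6})[ :\-]?(\d{3,})\b")


def extract_specimen_codes(text):
    """Return (acronym, number) pairs found in text, in order of appearance."""
    return [(m.group(1), m.group(2)) for m in SPECIMEN_CODE.finditer(text)]


# Made-up sample sentence, for illustration only.
sample = "Specimens examined: MVZ 12345, USNM 205432 and KU:9876."
print(extract_specimen_codes(sample))
```

A pattern this simple will also pick up things that are not specimen codes (grant numbers, for instance), so a real extractor would likely need a list of known collection acronyms or further context checks.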
Given various discussions about identifiers, dark taxa, and DNA barcoding that have been swirling around the last few weeks, there's one notion that is starting to bug me more and more.