Published April 20, 2009 | https://doi.org/10.59350/hzb77-f8f97

Accessing specimens using TAPIR or, why do we make this so hard?

  • 1. University of Glasgow

OK, second rant of the day. One of my favourite online specimen databases is AntWeb. For a while it hasn't been possible to harvest data from this database using the venerable DiGIR protocol, due to various issues at the California Academy of Sciences. Well, now it's back, and "accessible" using TAPIR (the TDWG Access Protocol for Information Retrieval). Accessible, that is, if you like horrifically over-engineered, poorly documented standards. OK, a lot of work has gone into TAPIR, there's lots of great code on SourceForge, and there's lots of documentation, but I've really struggled to get the most basic tasks done.

For example, let's imagine I want to retrieve the information on the ant specimen CASENT0100367 (note how trivial this is via a web browser: just append the specimen name to http://www.antweb.org/specimen.do?name=). After much clenching of teeth while struggling with the TAPIR documentation and the TAPIR client software, I finally found an email by Markus Döring that gave me the clue. If I'm going to construct a URL to retrieve this specimen record, I need to include the URL of an XML document that serves as a template for the query. Since one doesn't exist, I have to create it and make it accessible to the TAPIR server (i.e., the AntWeb TAPIR server needs to access it, so I have to place this XML document on my web server). The template (shown below) lives at http://bioguid.info/tapir/dwc_catalog_number.xml:


                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                xmlns:xs="http://www.w3.org/2001/XMLSchema" 
                xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                                    http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd
                                    http://www.w3.org/2001/XMLSchema
                                    http://www.w3.org/2001/XMLSchema.xsd">
  Scientific name in query
  Query for a Scientific Name. Based on http://rs.tdwg.org/tapir/cs/dwc/1.4/template/dwc_sci_name_range.xml, found in email by Markus Döring http://lists.tdwg.org/pipermail/tdwg-tapir/2008-April/000493.html

Now I can write my query: http://www.antweb.org/tapirlink/www/tapir.php/antweb
?op=search
&start=0
&limit=1
&template=http://bioguid.info/tapir/dwc_catalog_number.xml
&name=casent0100367
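
To make the contrast concrete, here is a minimal Python sketch that assembles both URLs: the plain web page link and the TAPIR query above. The function names are my own invention for illustration, nothing is actually sent over the network, and percent-encoding the template URL is simply what `urlencode` does by default; I am assuming, not asserting, that the AntWeb server accepts it in that form.

```python
from urllib.parse import urlencode

# URLs taken from the post itself.
ANTWEB_SPECIMEN_PAGE = "http://www.antweb.org/specimen.do"
ANTWEB_TAPIR_ENDPOINT = "http://www.antweb.org/tapirlink/www/tapir.php/antweb"
QUERY_TEMPLATE = "http://bioguid.info/tapir/dwc_catalog_number.xml"


def specimen_page_url(specimen):
    """Browser link: just append the specimen name (hypothetical helper)."""
    return f"{ANTWEB_SPECIMEN_PAGE}?{urlencode({'name': specimen})}"


def tapir_search_url(specimen, template=QUERY_TEMPLATE, start=0, limit=1):
    """TAPIR search link using the same parameters as the query in the post.

    `template` must be the URL of a query template that the TAPIR server
    can fetch for itself. Hypothetical helper, not part of any TAPIR client.
    """
    params = {
        "op": "search",
        "start": start,
        "limit": limit,
        "template": template,
        "name": specimen,
    }
    return f"{ANTWEB_TAPIR_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(specimen_page_url("CASENT0100367"))
    # The query in the post uses the lower-case form of the specimen name.
    print(tapir_search_url("casent0100367"))
```

One parameter versus five, one of which is the address of a file I have to host myself.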

So, the AntWeb server is going to read this query, then call my web server to fetch the query template so it can figure out what I actually want. Am I the only person who thinks that this is insane? Can anybody imagine jumping through these hoops to access a GenBank record, or a PubMed record?

Perhaps it's me, and my obsession with linking individual data records (rather than harvesting lots of records, or federated search). But it strikes me that harvesting is a simple task and not many people will be doing it (at least, not on the scale of GBIF), and federated search is a non-starter as our community can't keep data providers online to save themselves.

In many ways I think TAPIR (and DiGIR before it) missed what for me is the most basic use case, namely I have a specimen identifier and I want to get the record for that specimen. These services make it much harder than it needs to be. It's a symptom of our field's inability to deliver simple tools that do basic tasks well, rather than overly general and highly complex tools that are poorly documented. Of course, retrieving individual records would be easy if we had resolvable GUIDs for specimens, but we've singularly failed to deliver that, so we are stuck with very clunky tools. There's got to be a better way...

Additional details

Identifiers

UUID
2495fe44-81c9-4f59-a512-fd55d8dcfdcf
GUID
tag:blogger.com,1999:blog-16081779.post-2184898035103306794
URL
https://iphylo.blogspot.com/2009/04/accessing-specimens-using-tapir-or-why.html

Dates

Issued
2009-04-20T17:37:00
Updated
2009-04-20T23:19:15