Rogue Scholar

Citation correction methods

Published 1 July 2011 in OpenCitations blog

As previously described, the PubMed Central Open Access subset of journal articles yielded 6,529,815 independent bibliographic records of both citing and cited entities, while our use of the PubMed Entrez API provided a further 2,304,143 bibliographic records for the same cited entities. Before converting these references into RDF to create the Open Citations Corpus, we attempted to remove errors in the data.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Who wrote this paper? Author list problems in PubMed Central references

https://doi.org/10.59350/e4q4p-c2834

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

To illustrate three kinds of problems in obtaining correct author lists for Open Citation data from articles in the PubMed Central Open Access subset (OASS), I take three examples: the first is the result of a publication policy, the second is due to mishandling of an authorship attribution at the time of publication, and the third exemplifies errors introduced when handling non-English personal names.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Garbage in, garbage out – problems with bibliographic references

https://doi.org/10.59350/spqa9-p1864

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

The Open Citations Project has aimed to liberate bibliographic references from biomedical research literature as Open Linked Data, using as its starting corpus the Open Access Subset (OASS) of articles within PubMed Central. The greatest problem faced during this project, naively unanticipated before we started, was the extent of incompleteness, noise and errors of various sorts within the reference information extracted from the OASS articles.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Input data for Open Citations – the PMC Open Access Subset

https://doi.org/10.59350/refgh-34906

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

PubMed, created by the US National Library of Medicine in DATE, holds bibliographic records and abstracts for essentially all journal articles published in the biomedical sciences. It currently records almost a million new entries each year! PubMed Central (PMC), created as an extension of PubMed, is designed to hold full text articles from among the PubMed entries.

Data Publication, Semantic Publishing, Citation, Data, Datacite, Social Science, English

Pensoft Journals policy and author guidelines on data publication and citation

https://doi.org/10.59350/tz55j-2wy62

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

In a recent blog post, Heather Piwowar, in discussing the advantages of citing datasets in the reference list of the article, said “No journals have standardized on this approach so far”. However, Pensoft Journals, a publisher that specializes in publishing biodiversity and biological systematics papers, and that has taken the lead in promoting the publication of datasets with DOIs, has exactly such a policy.

JISC, Ontologies, Open Citations, Semantic Publishing, Citation, Social Science, English

How to cite data

https://doi.org/10.59350/65msb-x7f82

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

As an approach towards developing best practice for data citation, I recently wrote a Data Citation Best Practice Discussion Document that is available on Google Docs, and that I have now slightly revised to Version 2 [1]. In that document, I first compared what is recommended by DataCite [2] and by Altman and King [3] with what is currently practised by the Dryad Data Repository and what presently occurs ‘in the wild’ in a

Data Publication, Ontologies, Semantic Publishing, Annotation Ontology, Citation, Social Science, English

Questions of granularity – Dryad’s use of DataCite DOIs for data citation, and the Annotation Ontology

https://doi.org/10.59350/8vbjm-s3963

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

DataCite is an international organisation, founded in 2009, which promotes the use of DOIs (Digital Object Identifiers) for published datasets, in order to establish easier access to research data, to increase acceptance of research data as legitimate contributions in the scholarly record, and to support data archiving to permit results to be verified and re-purposed for future study. Its founding members were the British Library;

JISC, Ontologies, Open Citations, Semantic Publishing, Citation, Social Science, English

Functional clustering of CiTO properties

https://doi.org/10.59350/arzhh-97b82

Published 29 June 2011 in OpenCitations blog

Author David M. Shotton

CiTO v2.0 contains just two main object properties, cito:cites and its inverse cito:isCitedBy, each of which has thirty-two sub-properties. Intentionally, these properties are not constrained as to domain or range, thereby maximising their applicability in a wide range of citation contexts.

Career Suicide, Citation, Impact Factor, Output, Quora, Computer and Information Sciences, English

What is the best way to measure academic outputs that aren't publications?

https://doi.org/10.59350/wj5qt-yj850

Published 26 May 2011 in iPhylo

Author Roderic Page

My institute is going through various reviews of staff performance and, frankly, I'm feeling somewhat vulnerable given my somewhat unorthodox (at least amongst my colleagues) approach to doing science.

Citation, Data, Dryad, Computer and Information Sciences, English

Data matters but do data sets?

https://doi.org/10.59350/8413a-6t870

Published 1 April 2011 in iPhylo

Author Roderic Page

Interest in archiving data and data publication is growing, as evidenced by projects such as Dryad, and earlier tools such as TreeBASE. But I can't help wondering whether this is a little misguided. I think the issues are granularity and reuse. Taking the second issue first, how much re-use do data sets get? I suspect the answer is "not much". I think there are two clear use cases: repeatability of a study, and benchmarks.

Rogue Scholar Posts

Citation correction methods

Who wrote this paper? Author list problems in PubMed Central references

Garbage in, garbage out – problems with bibliographic references

Input data for Open Citations – the PMC Open Access Subset

Pensoft Journals policy and author guidelines on data publication and citation

How to cite data

Questions of granularity – Dryad’s use of DataCite DOIs for data citation, and the Annotation Ontology

Functional clustering of CiTO properties

What is the best way to measure academic outputs that aren't publications?

Data matters but do data sets?