Rogue Scholar

Published 1 July 2011 in OpenCitations blog

The input PubMed Central Open Access subset XML reference data, our starting corpus, were transformed into Open Citations RDF in multiple stages: The original XML was first transformed into an intermediate form using XSLT.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Citation correction methods

https://doi.org/10.59350/yet6y-e9m21

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

As previously described, the PubMed Central Open Access subset of journal articles yielded 6,529,815 independent bibliographic records of both citing and cited entities, while our use of the PubMed Entrez API provided a further 2,304,143 bibliographic records for the same cited entities. Before converting these references into RDF to create the Open Citations Corpus, we attempted to remove errors in the data.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Who wrote this paper? Author list problems in PubMed Central references

https://doi.org/10.59350/e4q4p-c2834

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

To illustrate three kinds of problems in obtaining correct author lists for Open Citation data from articles in the PubMed Central Open Access subset (OASS), I take three examples: the first results from a publication policy, the second from mishandling of an authorship attribution at the time of publication, and the third exemplifies errors introduced when handling non-English personal names.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Garbage in, garbage out – problems with bibliographic references

https://doi.org/10.59350/spqa9-p1864

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

The Open Citations Project has aimed to liberate bibliographic references from biomedical research literature as Open Linked Data, using as its starting corpus the Open Access Subset (OASS) of articles within PubMed Central. The greatest problem faced during this project, naively unanticipated before we started, was the extent of incompleteness, noise and errors of various sorts within the reference information extracted from the OASS articles.

JISC, Open Citations, Bibliography, Citation, Citation Data, Social Science, English

Input data for Open Citations – the PMC Open Access Subset

https://doi.org/10.59350/refgh-34906

Published 1 July 2011 in OpenCitations blog

Author David M. Shotton

PubMed, created by the US National Library of Medicine in DATE, holds bibliographic records and abstracts for essentially all journal articles published in the biomedical sciences. It currently records almost a million new entries each year! PubMed Central (PMC), created as an extension of PubMed, is designed to hold full text articles from among the PubMed entries.

JISC, Ontologies, Open Citations, Semantic Publishing, Citation, Social Science, English

How to cite data

https://doi.org/10.59350/65msb-x7f82

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

As an approach towards developing best practice for data citation, I recently wrote a Data Citation Best Practice Discussion Document that is available on Google Docs, and that I have now slightly revised to Version 2 [1]. In that document, I first compared what is recommended by DataCite [2] and by Altman and King [3] with what is currently practised by the Dryad Data Repository and what presently occurs ‘in the wild’ in a

Data Publication, JISC, Ontologies, DaPO, Data, Social Science, English

Nomenclature for data publications and citations

https://doi.org/10.59350/6sc8h-5x796

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

The meaning of the word “dataset” is ambiguous, changing with context.

Data Publication, JISC, Ontologies, Data, Datacite, Social Science, English

DataCite2RDF – Mapping DataCite Metadata Scheme Terms to ontologies

https://doi.org/10.59350/8frta-ps113

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

The DataCite Metadata Kernel version 2.0 [1] specifies the minimal metadata, and optional metadata, that should accompany a DataCite DOI for the identification of a published data entity. Within the Metadata Kernel document there is an XML mapping of these metadata terms, using DCMI Metadata Terms, and an example encoded in XML.

Data Publication, JISC, Ontologies, Semantic Publishing, Data, Social Science, English

Using FaBiO to describe data entities

https://doi.org/10.59350/q6wzn-fsf12

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

In addition to CiTO and CiTO4Data, which describe relationships of relevance to data entities as discussed in the previous blog post, FaBiO, the FRBR-aligned Bibliographic Ontology described elsewhere and another member of the suite of SPAR (Semantic Publishing and Referencing) Ontologies, has a number of classes and properties specifically designed for describing data, software, metadata and other non-bibliographic entities.

Data Publication, JISC, Ontologies, Semantic Publishing, Cito, Social Science, English

CiTO4Data – a new data-centric citation typing ontology

https://doi.org/10.59350/fts44-sdh45

Published 30 June 2011 in OpenCitations blog

Author David M. Shotton

This is the first of a series of blog posts on the Open Citations blog that address the problem of citing data entities, for example a data package in a data repository, rather than bibliographic entities such as journal articles. For these purposes, the existence of DataCite to assign DOIs to datasets, and extensions to the SPAR (Semantic Publishing and Referencing) Ontologies to handle data items, are both important.

Rogue Scholar posts

The citation processing pipeline and the Open Citations Corpus

Citation correction methods

Who wrote this paper? Author list problems in PubMed Central references

Garbage in, garbage out – problems with bibliographic references

Input data for Open Citations – the PMC Open Access Subset

How to cite data

Nomenclature for data publications and citations

DataCite2RDF – Mapping DataCite Metadata Scheme Terms to ontologies

Using FaBiO to describe data entities

CiTO4Data – a new data-centric citation typing ontology