Rogue Scholar

Literature, XML, Parsing, Pubchunks, Fulltext, Computer and Information Sciences, English

pubchunks: extract parts of scholarly XML articles

Published 16 October 2018 in rOpenSci - open tools for open science

Author Scott Chamberlain

pubchunks is a package grown out of the fulltext package. fulltext provides a single interface to many sources of full text scholarly articles. As part of the user flow in fulltext there is an extraction step where fulltext::chunks() pulls parts of articles out of XML format article files.
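The extraction step described above can be illustrated outside R. The following is a minimal Python sketch, not the pubchunks or fulltext API: the sample article, the `SECTIONS` mapping, and the `extract_chunks` helper are all hypothetical, and the XML is a heavily simplified JATS-like layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JATS-like article; real publisher XML is far richer.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>A <italic>tiny</italic> example</article-title></title-group>
      <abstract>
        <p>First sentence.</p>
        <p>Second sentence.</p>
      </abstract>
    </article-meta>
  </front>
  <body><p>Body text.</p></body>
</article>"""

# Section name -> ElementPath expression (an assumed layout, not a pubchunks API).
SECTIONS = {
    "title": ".//article-title",
    "abstract": ".//abstract",
    "body": ".//body",
}

def extract_chunks(xml_text, wanted):
    """Return {section name: plain text or None} for the requested sections."""
    root = ET.fromstring(xml_text)
    chunks = {}
    for name in wanted:
        node = root.find(SECTIONS[name])
        if node is None:
            chunks[name] = None
        else:
            # itertext() flattens nested markup such as <italic>;
            # split/join collapses runs of whitespace.
            chunks[name] = " ".join("".join(node.itertext()).split())
    return chunks

chunks = extract_chunks(SAMPLE, ["title", "abstract"])
print(chunks)
```

Keeping the section-to-path mapping in one table is a design choice for the sketch: different publishers mark up the same section differently, so in practice the mapping would vary per source.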

Packages, XML, Xslt, Computer and Information Sciences, English

Using xml schema and xslt in R

https://doi.org/10.59350/he2rz-khr13

Published 10 January 2017 in rOpenSci - open tools for open science

Author Jeroen Ooms

This week an update for xml2 and a new xslt package have appeared on CRAN. A full announcement for xml2 version 1.1 will appear on the rstudio blog. This post explains xml validation (via xsd schema) and xml transformation (via xslt stylesheets) which have been added in this release. XML schemas and stylesheets are not exactly new; both xslt 1.1 (2001) and xsd 1.0 (2004) have been available in browsers for over a decade.

Chemical IT, Digital Object Identifier, XML, Chemistry, English

One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards.

https://doi.org/10.59350/97msk-v5976

Published 8 September 2014 in Henry Rzepa's Blog

Author Henry Rzepa

In the beginning (taken here as prior to ~1980) libraries held five-year printed consolidated indices of molecules, organised by formula or name (Chemical Abstracts). This could occupy about 2 m of shelf space for each five years. There was also an equivalent set of printed volumes from the Beilstein collection.

BioNames, ELife, ELife Lens, XML, ZooKeys, Computer and Information Sciences, English

A new way to view taxonomic publications

https://doi.org/10.59350/dvk74-z0x30

Published 19 June 2013 in iPhylo

Author Roderic Page

One of the goals of BioNames is to be more than simply another taxonomic database. In particular, I'm interested in the idea of having a platform for viewing taxonomic publications. One way to think about this is to consider the experience of viewing Wikipedia. For any given page in Wikipedia there will be links to other, related content in Wikipedia.

IPad, NLM DTD, PLoS, XML, XSLT, Computer and Information Sciences, English

Towards an interactive taxonomic article: displaying an article from ZooKeys

https://doi.org/10.59350/666fj-2wr72

Published 19 December 2011 in iPhylo

Author Roderic Page

One of the things I keep revisiting is the way we display scientific articles. Apart from Nature's excellent iPhone and iPad apps, most efforts to re-imagine how we display articles are little more than glorified PDF viewers (e.g., the PLoS iPad app). Part of the challenge is that if we make the article more interactive we immediately confront the problem of how to link to other content.

DjVu, XML, XSLT, Computer and Information Sciences, English

DjVu XML to HTML

https://doi.org/10.59350/q4ev5-ffd09

Published 24 March 2010 in iPhylo

Author Roderic Page

This post is simply a quick note on some experiments with DjVu that I haven't finished. Much of BHL's content is available as DjVu files, which contain both the scanned images and OCR text, complete with co-ordinates of each piece of text. This means that it would, in principle, be trivial to lay out the bounding boxes of each text element on a web page.
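As a rough illustration of laying out those bounding boxes, here is a minimal Python sketch that turns word coordinates into absolutely positioned HTML boxes. It is not the approach used in the post: the sample fragment and the `words_to_html` helper are hypothetical, and the coordinate order (left, bottom, right, top, measured from the top-left corner of the page) is an assumption that should be checked against real DjVu XML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical fragment in the style of DjVu XML (one page, two words).
SAMPLE = """<OBJECT width="800" height="1200">
  <PARAGRAPH><LINE>
    <WORD coords="100,250,220,200">Hello</WORD>
    <WORD coords="240,250,380,200">world</WORD>
  </LINE></PARAGRAPH>
</OBJECT>"""

def words_to_html(xml_text, scale=1.0):
    """Render each WORD element as an absolutely positioned <div>."""
    page = ET.fromstring(xml_text)
    width = int(page.get("width")) * scale
    height = int(page.get("height")) * scale
    parts = [f'<div style="position:relative;width:{width:g}px;height:{height:g}px">']
    for word in page.iter("WORD"):
        # Assumed order: left, bottom, right, top (pixels, origin at top-left).
        # Any extra values after the first four are ignored.
        left, bottom, right, top = (
            int(v) * scale for v in word.get("coords").split(",")[:4]
        )
        style = (
            f"position:absolute;left:{left:g}px;top:{top:g}px;"
            f"width:{right - left:g}px;height:{bottom - top:g}px"
        )
        parts.append(f'<div style="{style}">{escape(word.text or "")}</div>')
    parts.append("</div>")
    return "\n".join(parts)

page_html = words_to_html(SAMPLE, scale=0.5)
print(page_html)
```

The `scale` parameter is there because scanned pages are usually much larger in pixels than a browser viewport, so the boxes need to shrink in step with the displayed page image.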

ElsevierGrandChallenge, JSON, XML, Computer and Information Sciences, English

Hell is other people's data

https://doi.org/10.59350/qtpee-n9m25

Published 3 September 2008 in iPhylo

Author Roderic Page

Starting to get serious about the Grand Challenge. First step is to parse the XML data Elsevier made available. Sadly this is only for Molecular Phylogenetics and Evolution for 2007; I would have liked the whole journal in XML to avoid hassles with parsing PDF. However, XML is not without its own problems.

Xml, Bioclipse, Cml, Chemistry, English

XML validation on Eclipse with Web Tools Platform

https://doi.org/10.59350/59vwx-e6a02

Published 24 May 2006 in chem-bla-ics

Author Egon Willighagen

Yesterday I installed the Eclipse Web Tools Platform again, this time successfully, using the Eclipse update mechanism on my Kubuntu dapper eclipse install. I wanted it because it has a validating XML editor, the one last thing I still needed jEdit for. (I do miss the vertical selection feature of jEdit, though.) It flags errors and allows autocompletion.

Cml, Bioclipse, Xml, Textmining, Rss, Chemistry, English

Open Text Mining Interface and Bioclipse

https://doi.org/10.59350/wyet7-r6r37

Published 7 May 2006 in chem-bla-ics

Author Egon Willighagen

Timo Hannay blogged in Nature’s Nascent blog about the Open Text Mining Interface (OTMI), which is “a suggestion from Nature about how we might achieve text-mining and indexing purposes”. The idea is that each article has a link pointing to a machine readable file containing raw data about (and from?) the article.
