Rogue Scholar Posts

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

After 2.5 years of development, version 1.0 of the mongolite package has been released to CRAN. The package is now stable, well documented, and will soon be submitted for peer review to be onboarded in the rOpenSci suite.

MongoDB in R and mongolite

I started working on mongolite in September 2014, and it was first announced at the rOpenSci unconf 2015.

Published in rOpenSci - open tools for open science
Author Sean Hughes

As a lab scientist, I do almost all of my experiments in microtiter plates. These tools are an efficient means of organizing many parallel experimental conditions. It’s not always easy, however, to translate between the physical plate and a useful data structure for analysis. My first attempts to solve this problem (nesting one ifelse call inside the next to describe which well was which) were very unsatisfying.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

This week an update for xml2 and a new xslt package have appeared on CRAN. A full announcement for xml2 version 1.1 will appear on the RStudio blog. This post explains XML validation (via XSD schema) and XML transformation (via XSLT stylesheets), which have been added in this release. XML schemas and stylesheets are not exactly new; both XSLT 1.1 (2001) and XSD 1.0 (2004) have been available in browsers for over a decade.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

A new version of the jsonlite package has been released to CRAN. This is a maintenance release with enhancements and bug fixes. A summary of changes in v1.2 from the NEWS file: add read_json and write_json convenience wrappers, #161; update modp_numtoa from upstream, fixes a rounding issue in #148.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

This week we released version 1.0 of the rOpenSci pdftools package to CRAN. Pdftools provides utilities for extracting text, fonts, attachments and other data from PDF files. It also supports rendering PDF files into bitmap images. This release has a few internal enhancements and fixes an annoying bug for landscape PDF pages. The version bump to 1.0 signifies that the package has undergone sufficient testing and the API is stable.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

A few weeks ago we announced the first release of the tesseract package: a high quality OCR engine in R. We have now released an update with extra features.

Installing Training Data

As explained in the first post, the tesseract system is powered by language specific training data. By default only English training data is installed. Version 1.3 adds utilities to make it easier to install additional training data.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

This week the folks at GitHub have open sourced their fork of libcmark (based on the extensive PR by Mathieu Duponchelle), which they use to render markdown text within documents, issues, comments and anything else on the GitHub website.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

Published in rOpenSci - open tools for open science
Author Lincoln Mullen

The R package ecosystem for natural language processing has been flourishing in recent days. R packages for text analysis have usually been based on the classes provided by the NLP or tm packages. Many of them depend on Java. But recently there have been a number of new packages for text analysis in R, most notably text2vec, quanteda, and tidytext.

Published in rOpenSci - open tools for open science
Author David Winter

A new version of rentrez, our package for the NCBI’s EUtils API, is making its way around the CRAN mirrors. This release represents a substantial improvement to rentrez, including a new vignette that documents the whole package. This post describes some of the new things in rentrez, and gives us a chance to thank some of the people that have contributed to this package’s development.