Posts from Rogue Scholar

Author Sasha Goodman

The Apache Tika parser is like the Babel fish in Douglas Adams’s book, “The Hitchhiker’s Guide to the Galaxy” 1. The Babel fish translates any natural language into any other. Although Tika does not yet translate natural language, it is beginning to tame the Tower of Babel of digital document formats. Just as the Babel fish let a person understand Vogon poetry, Tika lets an analyst extract text and embedded objects from Microsoft Word documents.

Author Amanda Dobbyn

library(tidyverse)
library(monkeylearn)

This is a story (mostly) about how I started contributing to the rOpenSci package monkeylearn. I can’t promise any life flipturning upside down, but there will be a small discussion about git best practices, which is almost as good 🤓. The tl;dr here is nothing novel, but it is something I wish I’d experienced firsthand sooner.

Author Rory Nolan

The general struggle

Something that will make life easier in the long run can be the most difficult thing to do today. For coders, prioritising the long term may involve overhauling current practice and learning a new skill.

Author Konstantinos Vantas

That’s a lot like data science, isn’t it? Hydrologic processes evolve in space and time, are extremely complex, and we may never fully comprehend them. For this reason, hydrologists use models whose inputs and outputs are measurable variables: climatic and hydrologic data, land use, vegetation cover, soil type, etc.

Author Daniel Münch

Olfactory Coding

Detecting volatile chemicals and encoding them into neuronal activity is a vital task that all animals perform with their olfactory sensory systems. Although these systems vary vastly between species in their numerical complexity, they are remarkably similar in their general structure.

Published in rOpenSci - open tools for open science
Author Jeroen Ooms

Earlier this month we released a new version of the tesseract package to CRAN. This package provides R bindings to Google’s open source optical character recognition (OCR) engine Tesseract. The two major new features are support for hOCR output and support for the upcoming Tesseract 4.

hOCR output

Support for hOCR output was requested by one of our users on GitHub.

Published in rOpenSci - open tools for open science

The drake R package is a pipeline toolkit. It manages data science workflows, saves time, and increases confidence in reproducibility. I hope it will shape the landscape of reproducible research and high-performance computing, but I originally created it for different reasons. This post is the prequel: the story of drake’s inception. There was a struggle, and drake was the answer.

Dissertation frustration

My dissertation project was intense.

Published in rOpenSci - open tools for open science
Author Sam Albers

One of the best things about learning R is that no matter your skill level, there is always someone who can benefit from your experience. Topics in R, ranging from complicated machine learning approaches to calculating a mean, all find their audiences. This is particularly true when writing R packages.