Rogue Scholar Posts

Published in Donny Winston

A scientific database cannot be everything to everyone. Jim Gray came up with the “20 queries” heuristic. What are the 20 most important questions the researchers want the data system to answer? [1] Five questions are not enough to see a broader pattern, and 100 questions would dilute focus. Also, the relative information in queries ranked by importance is likely to be logarithmic – a “long tail” distribution.

Published in Donny Winston

Organizational capabilities can be divided into three categories: resources, processes, and priorities. Resources are what you use to achieve an outcome, processes are how you achieve it, and priorities are why. Understanding capabilities in this way can aid in strategy not only across a large organization but also within units, and even for individuals. [1] Resources are tangible assets.

Published in Donny Winston

How do you source data relevant for some analysis? Once you “have” the data, how do you feed it to the analytic task? Traditional enterprise data integration joins paths across a handful of silos for a handful of specific analytic tasks. In data science, however, neither the set of relevant silos nor the set of relevant analytic tasks is both small and well-defined.

Published in Konrad Hinsen's blog

Dear software engineers, Many of you were horrified at the sight of the C++ code that Neil Ferguson and his team wrote to simulate the spread of epidemics. I feel for you. The only reason I am less horrified than you is that I have seen a lot of similar-looking code before. It is in fact quite common in scientific computing, in particular in research projects that have been running for many years.

Published in Donny Winston

I was reminded of the importance of approachable, low-barrier-to-entry tools for data management by Monica Granados and Lily Zhao in their presentation of the Frictionless Data toolkit. [1] They showcased use of a browser-based interface [2] for a simple yet valuable task: associating title and description metadata with potentially cryptic column header names in a CSV file, and exporting that metadata together with the raw data
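
The task described above, pairing cryptic CSV headers with titles and descriptions and exporting both together, can be sketched with the Python standard library alone. The sketch below is loosely modeled on the Frictionless Data Package descriptor layout (a `resources` list whose `schema.fields` entries carry `name`, `title`, and `description`). The column names, sample rows, package name, and the `build_descriptor` helper are all hypothetical, and the code does not use the Frictionless toolkit or its browser interface.

```python
import csv
import io
import json

# Hypothetical raw CSV with cryptic column headers (illustrative data only).
RAW_CSV = "smp_id,t_c,dpth_m\nA1,12.5,3\nA2,11.9,7\n"

# Hypothetical human-readable metadata, keyed by column header.
COLUMN_METADATA = {
    "smp_id": {"title": "Sample ID",
               "description": "Identifier assigned to each sample."},
    "t_c": {"title": "Water temperature",
            "description": "Temperature in degrees Celsius."},
    "dpth_m": {"title": "Depth",
               "description": "Sampling depth in metres."},
}


def build_descriptor(csv_text, column_metadata, path="data.csv"):
    """Return a descriptor dict that pairs each CSV column with its metadata."""
    header = next(csv.reader(io.StringIO(csv_text)))
    fields = []
    for name in header:
        field = {"name": name}
        # Columns without metadata keep only their raw header name.
        field.update(column_metadata.get(name, {}))
        fields.append(field)
    return {
        "name": "example-package",  # assumed package name
        "resources": [
            {"name": "data", "path": path, "schema": {"fields": fields}}
        ],
    }


descriptor = build_descriptor(RAW_CSV, COLUMN_METADATA)
# The descriptor would be saved next to the raw CSV, e.g. as datapackage.json.
print(json.dumps(descriptor, indent=2))
```

Keeping the descriptor as a separate JSON document leaves the raw CSV untouched, so the original headers still work for any tool that already reads the file.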

Published in Donny Winston

Laws are rules that a particular community recognizes as regulating the actions of its members. From this definition, Serena Peruzzo detailed how she sought to use tools from Natural Language Processing (NLP) to “find a representation of the rules that makes them more accessible and understandable.” [1] One proposed use case is to identify and highlight ambiguities.

Published in Donny Winston

In an episode of the CoRecursive podcast [1], Sam Ritchie uses the phrase “portal abstraction” to describe how the use of a particular term can open a portal – a gateway – to a world of relevant prior art. He discusses issues in analytics. One issue is distributing summative calculations over data both as batches and in real time, specialized for “big” and “fast” data, respectively.
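
One common way to make a summary calculation work for both batch and real-time data is to give the summary an associative `merge` operation, so partial results can be combined in any grouping. The sketch below illustrates that idea for a running mean. It is an assumption about how the issue might be approached, not a description of what the episode proposes, and the `MeanSummary` class and `summarize` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeanSummary:
    """Partial state for a mean: enough to merge with other partial states."""
    count: int = 0
    total: float = 0.0

    def add(self, value):
        # Real-time path: fold in one new observation.
        return MeanSummary(self.count + 1, self.total + value)

    def merge(self, other):
        # Batch path: combine two partial summaries (associative).
        return MeanSummary(self.count + other.count, self.total + other.total)

    @property
    def mean(self):
        return self.total / self.count if self.count else None


def summarize(values):
    """Summarize a whole batch by folding its values one at a time."""
    summary = MeanSummary()
    for value in values:
        summary = summary.add(value)
    return summary


# A historical batch and a live stream are summarized independently...
batch = summarize([2.0, 4.0, 6.0])
stream = MeanSummary()
for value in [8.0, 10.0]:
    stream = stream.add(value)

# ...and merged into the same state as a single pass over all the data.
combined = batch.merge(stream)
print(combined.mean)
```

Carrying the count and total rather than the mean itself is what makes the merge exact, since two means cannot be combined without knowing how many values each one covers.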

Published in Donny Winston

As part of her introduction to ontology engineering [1], Prof. Maria Keet has a slide depicting ontology as a layer apart from conceptual data models. (Figure: Conceptual data models vs. ontologies. [source]) I like this visualization of various project-specific conceptual models and their associated implementations in databases and codebases.