Rogue Scholar Posts

Published in Donny Winston

One of my favorite features of the PyCharm code editor is go-to-declaration: you can hold the control key and hover your mouse over a usage of a symbol, and you’ll see a tooltip with a preview of the declaration/definition of the symbol. Click it, and you’ll jump to the definition, perhaps in another file. After you’ve reviewed the definition, a keyboard shortcut gets you back to the usage point.

Published in Donny Winston

The RDF data model is quite flexible: Anybody can say Anything about Any topic (aka the “AAA slogan”). However, I recommend – and describe here – a particular modeling strategy when it comes to entering new facts about research activities into a data management system. Once facts are entered this way, workflows may add derived facts to suit the needs of downstream applications.

Published in Donny Winston

Have you ever given or gotten data as CSV? Are the meanings of the columns always clear? How are they made clear? Are the given column labels/names and the given file/sheet names always enough? If additional information beyond the CSV file is needed, how is that facilitated? A separate README file that travels with the CSV as part of a zipped archive file?

Published in Donny Winston

If you provide JSON, either as files or as API responses, you might be one step away from ensuring that anyone encountering that JSON has a portal to what it means. This step is to provide a single extra key-value pair in each JSON document – the key is “@context”, and the value is a URL. JSON-LD is “a JSON-based format to serialize Linked Data.”
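As a minimal sketch of that one extra step, the snippet below adds an “@context” key to an ordinary JSON document. The helper name `with_context`, the sample record, and the URL `https://example.org/context.jsonld` are all hypothetical placeholders, not anything taken from the post.

```python
import json


def with_context(doc: dict, context_url: str) -> dict:
    """Return a copy of doc with an "@context" key placed first.

    If doc already carries its own "@context", that existing value is kept.
    """
    return {"@context": context_url, **doc}


# Hypothetical record and context URL, for illustration only.
record = {"name": "Sample 42", "mass": 1.5}
linked = with_context(record, "https://example.org/context.jsonld")

# Serialize as usual; consumers that ignore "@context" still see plain JSON.
serialized = json.dumps(linked, indent=2)
print(serialized)
```

The original `record` is left untouched, so the same data can still be emitted without the context where that is preferred.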

Published in Donny Winston

You have a variety of entities, each with a variety of attributes, and each involved in a variety of relationships. One approach to manage such data is a collection/spreadsheet/table approach where you partition your entities. Each entity has a primary address in a document/row in one collection/sheet/table.

Published in Donny Winston

If you write a program that references a variable, and that variable points to a value, you likely don’t want that value to change unless you’re doing the changing. This gets tricky when you want to bring in more resources to help you get the job done faster – you might still be in control of the program, but the “you” in action may be multiple cores/threads.
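One way to picture the concern, sketched here under assumptions of my own: share a value that cannot be changed in place, so that it does not matter how many threads read it. The `Settings` class and `transform` function are hypothetical names for illustration; a frozen dataclass is just one of several ways to get this behavior in Python.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical shared value; frozen=True blocks attribute assignment."""
    scale: int
    offset: int


def transform(settings: Settings, x: int) -> int:
    # Workers only read from settings; they cannot rebind its fields.
    return settings.scale * x + settings.offset


settings = Settings(scale=2, offset=1)

# Several threads use the same value at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda x: transform(settings, x), range(5)))

# An in-place change is rejected...
try:
    settings.scale = 10
    mutated = True
except FrozenInstanceError:
    mutated = False

# ...so a "change" means building a new value and leaving the old one alone.
updated = replace(settings, scale=10)
```

Any thread still holding `settings` keeps seeing the original value, while whoever is "doing the changing" works with `updated`.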

Published in Donny Winston

How do you check potential changes to your published data? You might set a rhythm for releases, say monthly or quarterly. You generate a new release candidate, run a set of checks, and then release. You reproduce the whole thing rather than add to it, and you do your checks at the end. You might do incremental rather than batch processing. You apply changes to your last release, run checks, and then release.

Published in Donny Winston

Git is the common tool for version control of code. How does it work? It works by grouping events about lines. Lines are added or removed, and a group of add/remove events is a transaction, i.e. a “commit”. A sequence of these line-delta-group events can be replayed from the log to construct a snapshot of the codebase at any point in the commit history.
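The replay idea can be sketched as a toy model. This is only an illustration of the description above, not Git's actual storage format or API; the names `LineEvent`, `Commit`, and `replay` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LineEvent:
    op: str         # "add" or "remove"
    index: int      # 0-based position at the moment the event is applied
    text: str = ""  # line content for "add"; unused for "remove"


# A commit is a group of line events applied together, in order.
Commit = Tuple[LineEvent, ...]


def replay(log: List[Commit], upto: int) -> List[str]:
    """Rebuild the file's lines after the first `upto` commits in the log."""
    lines: List[str] = []
    for commit in log[:upto]:
        for event in commit:
            if event.op == "add":
                lines.insert(event.index, event.text)
            elif event.op == "remove":
                del lines[event.index]
            else:
                raise ValueError(f"unknown op: {event.op!r}")
    return lines


# Two hypothetical commits against a single file.
log: List[Commit] = [
    (LineEvent("add", 0, "import sys"), LineEvent("add", 1, "print('hi')")),
    (LineEvent("remove", 1), LineEvent("add", 1, "print(sys.argv)")),
]

first_snapshot = replay(log, 1)
second_snapshot = replay(log, 2)
```

Replaying a prefix of the log gives the snapshot at that point in history, so `replay(log, 0)` is the empty file and `replay(log, len(log))` is the latest state.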

Published in Donny Winston

We often think of provenance as a physical thing, tracking the history of a sample and of what we measured. But the provenance of a result started when someone had the idea or the request to measure it. The metadata for a result is not just the parameters on the instrument, or how much sample, or which sample – it’s all those other steps upstream. Conceptual metadata are like tags, meaningful handles.