Rogue Scholar Posts

language
Published in The Ideophone
Author Mark Dingemanse

A lot of our recent work revolves around working with conversational data, and one thing that’s struck me is that there are no easy ways to create compelling visualizations of conversation as it unfolds over time. The most common form seems to be pixelated screenshots of transcription software not made for this purpose.

Published in Lucidarios

In the previous post I presented the work of a mysterious defacer of witness C of the Lucidario. I said there that the commentator annotated passages that differed from, or were missing in, another witness of the Lucidario that he consulted, and that he often introduced emendations in passages where he found errors. His work was not perfect, however, since in some cases he proposed erroneous corrections. This is what happens on fol.

Published in The Ideophone
Author Mark Dingemanse

Will synthetic text generators usher in a new age of creative thinking? The remarkable fluency of large language models may make them interesting tools for rapidly exploring semantic and stylistic spaces, yet the deceptive ease with which they generate output also provides countless new ways of appropriating ideas and erasing authorship.

Published in Martin Paul Eve

I have been thinking, this week, about the observability of AWS Lambda functions in API Gateway contexts. The major challenge is that Prometheus metrics pose a problem as they are pull-only (via a scraping endpoint). Prometheus metrics are stored in a temporary disk cache and then pulled off-site by Grafana etc.

Published in Martin Paul Eve

LocalStack is a great cloud emulation layer. It lets you simulate interaction with AWS, which is useful for writing integration tests. However, I wanted a system that, when run locally, would spin up the LocalStack server and then destroy it when done, but that, when running the tests on GitLab CI, would use the “service” provision of their continuous integration system and connect to that.
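The local-versus-CI decision described above could be sketched roughly as follows. This is only an illustration, not the post's actual code. It relies on GitLab's predefined `GITLAB_CI` variable and LocalStack's default port 4566. The names `plan_localstack`, `LocalStackPlan` and `LOCALSTACK_HOST`, and the service alias `localstack`, are assumptions made for the example. Starting and stopping the server itself is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalStackPlan:
    """Hypothetical description of how a test run should reach LocalStack."""
    endpoint_url: str
    manage_lifecycle: bool  # True: the test run starts and stops LocalStack itself


def plan_localstack(env: Optional[Mapping[str, str]] = None) -> LocalStackPlan:
    """Decide between a self-managed local server and a CI-provided service."""
    env = os.environ if env is None else env
    if env.get("GITLAB_CI") == "true":
        # On GitLab CI the service container is already running; only connect.
        # The alias "localstack" is an assumption about the CI configuration.
        host = env.get("LOCALSTACK_HOST", "localstack")
        return LocalStackPlan(f"http://{host}:4566", manage_lifecycle=False)
    # Locally, the test harness is expected to spin the server up and tear it down.
    return LocalStackPlan("http://localhost:4566", manage_lifecycle=True)
```

A test fixture could call `plan_localstack()` once, start a server only when `manage_lifecycle` is true, and pass `endpoint_url` to whichever AWS client the tests use.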

Published in Lucidarios

Returning from the two-week break, I now want to talk about a topic that caught my attention when I began working with witnesses A, B and C of the Lucidario: the existence of two anonymous (modern) defacers who annotated the manuscripts as they pleased, with no regard for their age or their value.

Published in The Ideophone
Author Mark Dingemanse

It’s easy to forget amidst a rising tide of synthetic text, but language is not actually about strings of words, and language scientists would do well not to chain themselves to models that presume so. For apt and timely commentary we turn to Bronislaw Malinowski who wrote: In follow-up work, Malinowski has critiqued the unexamined use of decontextualised strings of words as a proxy for Meaning: Malinowski did not write this on his substack,

Published in Martin Paul Eve

In my new role at Crossref I work on a series of data pipelines for research and development projects. These are resource-intensive data processing tasks that need to be executed periodically on a schedule, with good observability, but also with parallel processing capacity. Amazon’s Managed Workflows for Apache Airflow (MWAA) seems like an ideal solution for this.

Published in Lucidarios

I pick up the Transkribus thread again to discuss some issues that arise after transcription. In the last post we had a model trained (and retrained) and ready to be used for the automated transcription of witness D of the Lucidario. All that remains is to tell Transkribus: now transcribe it all.

Published in Lucidarios

In the previous post I began discussing substitution errors, and I will finish reviewing them in this one. The sixth type is the substitution of words or phrases, which occurs when the scribe misplaces the syntactic break in what appears in his exemplar.