Rogue Scholar Posts

language
Published in The Ideophone
Author Mark Dingemanse

Language makes us human. But there is an interesting asymmetry in our willingness to ascribe linguistic capacities to non-humans: animals are seen as having none, whereas many hold that computers may well master language. What curious conception of language makes this asymmetry possible? And what do Descartes and Turing have to do with it? Notes from a new essay about language between animals and computers.

Published in The Ideophone
Author Mark Dingemanse

Interjections are, in Felix Ameka’s memorable formulation, “the universal yet neglected part of speech” (1992). They are rarely the subject of historical, typological or comparative research in linguistics, and as Aimée Lahaussois has shown (2016), they are notably underrepresented in descriptive grammars. As grammars are the main source of data for typologists, this is of course a perfect example of a self-reinforcing feedback loop.

Published in The Ideophone
Author Mark Dingemanse

There is a minor industry in speech science and NLP devoted to detecting and removing disfluencies. In some of our recent work we’re showing that treating talk as sanitised text can adversely impact voice user interfaces. However, this is still a minority position. Googlers Dan Walker and Dan Liebling represent the mainstream view well in this blog post: Fair enough, you might say.

Published in The Ideophone
Author Mark Dingemanse

This is the second part in a two-part series of peer commentary on a recent preprint. The first part is here. I ended that post by noting I wasn’t sure all preprint authors were aware of the public nature of the preprint. I am now assured they are, and have heard from the senior author that they are working on a revised version.

Published in The Ideophone
Author Mark Dingemanse

Clark & Fischer propose that people see social robots as interactive depictions and that this explains some aspects of people’s behaviour towards them. We agree with C&F’s conclusion that we don’t need a novel ontological category for these social artefacts and that they can be seen as intersecting with a lineage of depictions from Michelangelo’s David to Mattel’s talking Barbie doll. We have two constructive contributions to make.

Published in The Ideophone
Author Mark Dingemanse

It’s easy to forget amidst a rising tide of synthetic text, but language is not actually about strings of words, and language scientists would do well not to chain themselves to models that presume so. For apt and timely commentary we turn to Bronislaw Malinowski who wrote: In follow-up work, Malinowski has critiqued the unexamined use of decontextualised strings of words as a proxy for Meaning: Malinowski did not write this on his substack,

Published in The Ideophone
Author Mark Dingemanse

A preprint claims that “ideas from theoretical linguistics have played no role in [NLP]”. Outside the confines of Chomskyan linguistics folks have long been working on incorporating storage, retrieval, gating and attention in theories of language, with direct relevance to computational models. The only way to give any content to the claim is by giving the notion “theoretical linguistics” the narrowest conceivable reading.

Published in rOpenSci - open tools for open science

Science craft As a field linguist, I have spent a lot of time working in villages in the Caucasus, collecting audio from speakers of indigenous languages. The processing of such data involves a lot of time-consuming tasks, so during my field trips I created my own pipeline for data collection.

Published in The Ideophone
Author Mark Dingemanse

Large language models make it entirely trivial to generate endless amounts of seemingly plausible text. There’s no need to be cynical to see the virtual inevitability of unending waves of algorithmically tuned AI-generated uninformation: the market forces are in place and they will be relentless.

Published in The Ideophone
Author Mark Dingemanse

Wikidata is an ambitious enterprise, but social ontologies are never language-agnostic — so the project risks perpetuating rather than transcending the worldviews most prevalent in current Wikipedia databases, which means broadly speaking global north, Anglo, western, white cishet male worldviews. I think Wikidata is perhaps promising for brute physical facts like the periodic table and biochemistry.