Rogue Scholar Posts

Published in Stories by Research Graph on Medium
Author Amanda Kau (ORCID: 0009-0004-4949-9284)

Improving the performance and application of Large Language Models

Large language models (LLMs) like GPT-4, the engine behind products like ChatGPT, have taken centre stage in recent years due to their astonishing capabilities. Yet they are far from perfect.

Published in Stories by Research Graph on Medium
Author Aland Astudillo (ORCID: 0009-0008-8672-3168)

The AI Helper Turning Mountains of Data into Bite-Sized Instructions

LLMs have been changing the way the entire world deals with problems and day-to-day tasks. Making them better for specific applications requires huge amounts of data and complex, expensive training approaches.

Published in The Ideophone
Author Mark Dingemanse

There is a minor industry in speech science and NLP devoted to detecting and removing disfluencies. In some of our recent work we’re showing that treating talk as sanitised text can adversely impact voice user interfaces. However, this is still a minority position. Googlers Dan Walker and Dan Liebling represent the mainstream view well in this blog post: […] Fair enough, you might say.

Published in The Ideophone
Author Mark Dingemanse

It’s easy to forget amidst a rising tide of synthetic text, but language is not actually about strings of words, and language scientists would do well not to chain themselves to models that presume so. For apt and timely commentary we turn to Bronislaw Malinowski, who wrote: […] In follow-up work, Malinowski has critiqued the unexamined use of decontextualised strings of words as a proxy for Meaning: […] Malinowski did not write this on his substack […]

Published in rOpenSci - open tools for open science
Author Amanda Dobbyn

library(tidyverse)
library(monkeylearn)

This is a story (mostly) about how I started contributing to the rOpenSci package monkeylearn. I can’t promise any life flipturning upside down, but there will be a small discussion about git best practices, which is almost as good 🤓. The tl;dr here is nothing novel, but it is something I wish I’d experienced firsthand sooner.