Rogue Scholar

Published January 21, 2018

Since I first encountered The Parisian Stage, I’ve been impressed by the completeness of Beaumont Wicks’s life’s work: from 1950 through 1979 he compiled a list of every play performed in the theaters of Paris between 1800 and 1899. I’ve used it as the basis for my Digital Parisian Stage corpus, currently a one percent sample of the first volume (Wicks 1950), available in full text on GitHub.

Conferences, Digital Humanities, French, Language Change, Sampling, Languages and Literature

Testing usage-based theories with a representative corpus of nineteenth-century French

https://doi.org/10.59350/43a02-sg626

Published July 27, 2017

Author Angus Grieve-Smith

Slides

Natural Language Generation, Sampling, Software, Web, Languages and Literature, English

And we mean really every tree!

https://doi.org/10.59350/x33bh-29v33

Published May 31, 2017

Author Angus Grieve-Smith

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in Southern and Eastern Queens.

Sampling, Science, Languages and Literature, English

Is your face red?

https://doi.org/10.59350/n3c78-7yy66

Published December 11, 2016

Author Angus Grieve-Smith

In 1936, Literary Digest magazine made completely wrong predictions about the Presidential election. They did this because they polled based on a bad sample: automobile registrations, telephone directories, and subscriptions to their own magazine. Enough people who didn’t own a car or a telephone or subscribe to Literary Digest voted, and they voted for Roosevelt. The magazine’s editors’ faces were red, and they had the humility to put that on the cover.

Digital Humanities, Sampling, Science, Languages and Literature, English

Sampling is a labor-saving device

https://doi.org/10.59350/1vbdg-vrs34

Published October 28, 2016

Author Angus Grieve-Smith

Last month I wrote those words on a slide I was preparing to show to the American Association for Corpus Linguistics, as a part of a presentation of my Digital Parisian Stage Corpus. I was proud of having a truly representative sample of theatrical texts performed in Paris between 1800 and 1815, and thus finding a difference in the use of negation constructions that was not just large but statistically significant.

Information Technology, Natural Language Generation, Sampling, Science, Web, Languages and Literature, English

@everytreenyc

https://doi.org/10.59350/5cnp4-vv203

Published August 27, 2016

Author Angus Grieve-Smith

At the beginning of June I participated in the Trees Count! Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc Twitter feed.

Digital Humanities, Sampling, Languages and Literature, English

Sampling and the digital humanities

https://doi.org/10.59350/bct79-da506

Published February 16, 2016

Author Angus Grieve-Smith

I was pleased to have the opportunity to announce some progress on my Digital Parisian Stage project in a lightning talk at the kickoff event for New York City Digital Humanities Week on Tuesday. One theme that was expressed by several other digital humanists that day was the sheer volume of interesting stuff being produced daily, and collected in our archives.

Sampling, Science, Languages and Literature, English

Why I probably won’t take your survey

https://doi.org/10.59350/ekh79-1jn98

Published February 7, 2014

Author Angus Grieve-Smith

I wrote recently that if you want to be confident in generalizing observations from a sample to the entire population, your sample needs to be representative. But maybe you’re skeptical. You might have noticed that a lot of people don’t pay much attention to representativeness, and somehow there are hardly any consequences for them. But that doesn’t mean that there are never consequences, for them or other people.

Sampling, Science, Languages and Literature, English

You can’t get significance without a representative sample

https://doi.org/10.59350/6hb0p-pcr26

Published January 28, 2014

Author Angus Grieve-Smith

Recently I’ve talked about the different standards for existential and universal claims, how we can use representative samples to estimate universal claims, and how we know if our representative sample is big enough to be “statistically significant.” But I want to add a word of caution to these tests: you can’t get statistical significance without a representative sample.

Sampling, Science, Languages and Literature, English

How big a sample do you need?

https://doi.org/10.59350/dn0ne-prg88

Published January 19, 2014

Author Angus Grieve-Smith

In my post last week I talked about the importance of representative samples for making universal statements, including averages and percentages. But how big should your sample be? You don’t need to look at everything, but you probably need to look at more than one thing. How big a sample do you need in order to be reasonably sure of your estimates?
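As a rough illustration of the question this post raises, here is a minimal sketch of the standard textbook sample-size calculation for estimating a percentage (Cochran's formula, with an optional finite population correction). It is not taken from the post itself. The function name `sample_size_for_proportion` and its parameters are hypothetical, and the calculation assumes a simple random sample, a 95% confidence level by default (z = 1.96), and the most conservative guess for the true proportion (p = 0.5).

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5, population=None):
    """Estimate how many items to sample to pin down a proportion.

    Hypothetical helper for illustration only.
    margin_of_error: half-width of the desired confidence interval (0.05 = 5 points).
    z: critical value for the confidence level (1.96 is roughly 95%).
    p: assumed true proportion; 0.5 gives the largest (safest) sample size.
    population: total number of items, if known and small enough to matter.
    """
    # Cochran's formula for a very large population.
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2

    # Finite population correction: a small population needs fewer items.
    if population is not None:
        n = n / (1 + (n - 1) / population)

    # Round up, since a fraction of an item cannot be sampled.
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size_for_proportion(0.05))                   # large population
    print(sample_size_for_proportion(0.05, population=1000))  # list of 1,000 items
```

Under these assumptions, a margin of error of five percentage points calls for 385 items from a very large population, or 278 from a list of 1,000. The formula says nothing about whether the sample is representative; it only applies once the items have been drawn at random.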

Technology and language

On this day in Parisian theater
