Rogue Scholar

The Boosting Margin, or Why Boosting Doesn't Overfit

Published September 21, 2015 in Math ∩ Programming

Author: Jeremy Kun

There’s a well-understood phenomenon in machine learning called overfitting. The idea is best shown by a graph.

[Figure: overfitting]

Let me explain. The vertical axis represents the error of a hypothesis, and the horizontal axis represents the complexity of the hypothesis. The blue curve shows the error of a machine learning algorithm’s output on its training data, and the red curve shows how well that hypothesis generalizes to the real world, that is, its error on new data.
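To see the two curves as numbers, here is a minimal sketch. It swaps in k-nearest-neighbor classification as the learner, which is not the boosting setting of the post, and treats smaller k as higher complexity. The synthetic data, the 20% label-noise rate, and the helper names `make_data`, `knn_predict` and `error_rate` are all illustrative assumptions. With k = 1 the classifier memorizes its training set, so its training error falls to zero while its error on fresh data does not.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical data: x uniform in [0, 1), true label is x >= 0.5,
    and each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k odd, so no ties)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    ones = sum(y for _, y in nearest)
    return int(2 * ones > k)

def error_rate(train, test, k):
    """Fraction of `test` that the k-NN hypothesis built on `train` gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)

rng = random.Random(0)
train = make_data(100, 0.2, rng)
test = make_data(200, 0.2, rng)

# Complexity grows as k shrinks: k = 1 memorizes the training set.
for k in (99, 25, 5, 1):
    print(f"k={k:2d}  train error={error_rate(train, train, k):.2f}  "
          f"test error={error_rate(train, test, k):.2f}")
```

At k = 1 the training error is exactly zero, because each training point is its own nearest neighbor. The test error stays well above zero. The exact figures depend on the random seed.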

Mathematics, English

The Welch-Berlekamp Algorithm for Correcting Errors in Data

https://doi.org/10.59350/nbs03-gnh20

Published September 7, 2015 in Math ∩ Programming

Author: Jeremy Kun

In this post we’ll implement Reed-Solomon error-correcting codes and use them to experiment with correcting errors in data. In our last post we defined Reed-Solomon codes rigorously, but in this post we’ll focus on intuition and code. As usual, the code and data used in this post are available on this blog’s Github page.
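Below is a minimal, self-contained sketch of encoding and Welch-Berlekamp decoding. It is not the post's code. It assumes that a message is a list of k coefficients over the prime field of size P = 97 and that codewords are evaluations at x = 0, 1, ..., n-1. The field size, the evaluation points, and every function name are illustrative assumptions. Decoding solves for a monic error-locator polynomial E of degree e = (n - k) // 2 and a polynomial Q of degree below k + e such that Q(x_i) = y_i E(x_i) at every point. When at most e values are corrupted, Q / E is the message polynomial.

```python
P = 97  # assumed prime field size; all arithmetic is mod P

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) at x, mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def encode(message, n):
    """Reed-Solomon encoding: evaluate the message polynomial at x = 0, ..., n-1."""
    return [poly_eval(message, x) for x in range(n)]

def solve_mod_p(rows, rhs):
    """Solve rows * v = rhs mod P by Gauss-Jordan elimination.
    Free variables are set to 0; returns None if the system is inconsistent."""
    m = [row[:] + [b] for row, b in zip(rows, rhs)]
    num_vars = len(rows[0])
    pivots, r = [], 0
    for c in range(num_vars):
        pivot = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][c], P - 2, P)  # inverse via Fermat's little theorem
        m[r] = [v * inv % P for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % P for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    if any(m[i][-1] for i in range(r, len(m))):
        return None
    solution = [0] * num_vars
    for i, c in enumerate(pivots):
        solution[c] = m[i][-1]
    return solution

def poly_divmod(num, den):
    """Divide polynomials mod P (lowest degree first); den's leading
    coefficient must be nonzero. Returns (quotient, remainder)."""
    num = num[:]
    inv_lead = pow(den[-1], P - 2, P)
    quotient = [0] * max(len(num) - len(den) + 1, 1)
    for shift in range(len(num) - len(den), -1, -1):
        coef = num[shift + len(den) - 1] * inv_lead % P
        quotient[shift] = coef
        for j, d in enumerate(den):
            num[shift + j] = (num[shift + j] - coef * d) % P
    return quotient, num[: len(den) - 1]

def welch_berlekamp(received, k):
    """Recover k message coefficients, correcting up to e = (n - k) // 2 errors.
    Returns None if decoding fails."""
    n = len(received)
    e = (n - k) // 2
    rows, rhs = [], []
    for x, y in enumerate(received):
        # Unknowns: Q's k + e coefficients, then E's e non-leading coefficients.
        # Equation Q(x) - y * (E(x) - x^e) = y * x^e, i.e. Q(x) = y * E(x).
        q_part = [pow(x, j, P) for j in range(k + e)]
        e_part = [(-y * pow(x, j, P)) % P for j in range(e)]
        rows.append(q_part + e_part)
        rhs.append(y * pow(x, e, P) % P)
    solution = solve_mod_p(rows, rhs)
    if solution is None:
        return None
    Q = solution[: k + e]
    E = solution[k + e:] + [1]  # monic error-locator polynomial of degree e
    quotient, remainder = poly_divmod(Q, E)
    if any(remainder):
        return None
    return (quotient + [0] * k)[:k]

message = [5, 3, 1]            # the polynomial 5 + 3x + x^2, so k = 3
codeword = encode(message, 7)  # n = 7, so up to e = 2 errors can be corrected
received = codeword[:]
received[1] = (received[1] + 10) % P  # corrupt two positions
received[4] = (received[4] + 42) % P
print(welch_berlekamp(received, len(message)))
```

Any solution of the linear system gives the same ratio Q / E when at most e errors occur, because deg(Q E') is below n. That is why the solver may set free variables to zero.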

Mathematics, English

The Čech Complex and the Vietoris-Rips Complex

https://doi.org/10.59350/x6g7z-v8j53

Published August 6, 2015 in Math ∩ Programming

Author: Jeremy Kun

It’s about time we got back to computational topology. Previously in this series we endured a lightning tour of the fundamental group and homology, and then we saw how to compute the homology of a simplicial complex using linear algebra. What we really want to do is talk about the inherent shape of data.
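As a rough illustration of one of the complexes in the title, here is a sketch that builds a Vietoris-Rips complex from a point cloud. It assumes the convention that a set of points spans a simplex exactly when every pair of those points lies within distance eps. Some texts use 2·eps instead. The function name and the cutoff at triangles (`max_dim=2`) are also assumptions made for this example.

```python
import math
from itertools import combinations

def vietoris_rips(points, eps, max_dim=2):
    """Return all simplices of dimension <= max_dim, as sorted tuples of point
    indices, whose vertices are pairwise within distance eps."""
    n = len(points)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= eps
    }
    simplices = [(i,) for i in range(n)]  # every point is a 0-simplex
    for size in range(2, max_dim + 2):    # edges, then triangles, ...
        for combo in combinations(range(n), size):
            if all(pair in close for pair in combinations(combo, 2)):
                simplices.append(combo)
    return simplices

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for eps in (1.0, 1.5):
    complex_ = vietoris_rips(square, eps)
    # Count simplices by dimension: vertices, edges, triangles.
    by_dim = [sum(len(s) == d + 1 for s in complex_) for d in range(3)]
    print(eps, by_dim)
```

At eps = 1.0 the four corners of the unit square yield only the four sides and no triangles, so the complex is a loop with a hole in it. At eps = 1.5 the diagonals join in, and triangles fill in that loop.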

Mathematics, English

What does it mean for an algorithm to be fair?

https://doi.org/10.59350/hd748-25k50

Published July 13, 2015 in Math ∩ Programming

Author: Jeremy Kun

In 2014 the White House commissioned a 90-day study that culminated in a report on the state of “big data” and related technologies. The authors give many recommendations, including this central warning: “Warning: algorithms can facilitate illegal discrimination!” Here’s a not-so-imaginary example of the problem. A bank wants people to take loans with high interest rates, and it also serves ads for these loans.

Mathematics, English

Methods of Proof — Diagonalization

https://doi.org/10.59350/6r5gy-d0n31

Published June 8, 2015 in Math ∩ Programming

Author: Jeremy Kun

A while back we featured a post about why learning mathematics can be hard for programmers, and I claimed a major issue was not understanding the basic methods of proof (the lingua franca between intuition and rigorous mathematics). I boiled these down to the “basic four”: direct implication, contrapositive, contradiction, and induction. But in mathematics there is an ever-growing supply of proof methods.
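The method in the title can be sketched in a few lines. Take a finite list of n binary strings, each of length at least n. Flipping the i-th character of the i-th string produces a string that differs from every string on the list. The function name is hypothetical. The same construction applied to an infinite list is the core of Cantor's argument that the infinite binary sequences are uncountable.

```python
def diagonal_escape(rows):
    """Return a binary string whose i-th character differs from rows[i][i]."""
    return "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))

rows = ["0101", "1100", "0011", "1111"]
escaped = diagonal_escape(rows)
print(escaped)  # 1000
```

The new string disagrees with row i at position i, so it cannot equal any row.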

Mathematics, English

Weak Learning, Boosting, and the AdaBoost algorithm

https://doi.org/10.59350/c8nrk-jr359

Published May 18, 2015 in Math ∩ Programming

Author: Jeremy Kun

When addressing the question of what it means for an algorithm to learn, one can imagine many different models, and there are quite a few. This invariably raises the question of which models are “the same” and which are “different,” along with a precise description of how we’re comparing models.
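To make the algorithm in the title concrete, here is a compact AdaBoost sketch. It assumes one-dimensional data with labels ±1 and uses decision stumps (threshold classifiers) as the weak learner. The stump search, the toy data set, and the function names are illustrative assumptions, not the post's code. In each round the examples are reweighted so that the next stump concentrates on the points the ensemble currently gets wrong. The final hypothesis is a weighted vote of the stumps.

```python
import math

def stump(x, t, s):
    """Decision stump: predict s when x >= t, otherwise -s."""
    return s if x >= t else -s

def best_stump(xs, ys, weights):
    """Weak learner: the stump with the smallest weighted training error.
    Returns (error, threshold, sign)."""
    best = None
    for t in sorted(set(xs)) + [max(xs) + 1]:  # last threshold gives a constant stump
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump(x, t, s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, sign) triples forming the weighted vote."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, weights)
        if err >= 0.5:  # no weak learner beats random guessing; stop
            break
        err = max(err, 1e-10)  # avoid dividing by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Misclassified points gain weight; correctly classified ones lose weight.
        weights = [w * math.exp(-alpha * y * stump(x, t, s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump labels all of these correctly
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data, three rounds are enough for the weighted vote to label every training point correctly, even though every individual stump misclassifies at least a third of them.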

Mathematics, English

The Many Faces of Set Cover

https://doi.org/10.59350/e539d-7gf86

Published May 4, 2015 in Math ∩ Programming

Author: Jeremy Kun

A while back Peter Norvig posted a wonderful pair of articles about regex golf. The idea behind regex golf is to come up with the shortest possible regular expression that matches every string in one given list but none of the strings in another.

[Comic: “Regex Golf,” by Randall Munroe.]

In the first article, Norvig runs a basic algorithm to recreate and improve the results from the comic, and in the second he beefs it up with some improved search heuristics.
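Here is a sketch of how regex golf becomes a set cover problem. Each candidate regex fragment is treated as the set of positive strings it matches, and a fragment is eligible only if it matches none of the negative strings. The sketch uses the usual greedy approximation for set cover, which repeatedly takes the set covering the most still-uncovered elements. The word lists and candidate fragments are invented for this example and are not Norvig's data or code.

```python
import re

def greedy_set_cover(universe, sets):
    """Greedy approximation: repeatedly choose the set that covers the most
    uncovered elements. `sets` maps a name to the set of elements it covers."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name = max(sets, key=lambda s: len(sets[s] & uncovered))
        if not sets[name] & uncovered:
            raise ValueError("the sets do not cover the universe")
        chosen.append(name)
        uncovered -= sets[name]
    return chosen

def regex_golf(positives, negatives, candidates):
    """Join a greedy cover of the positives using only 'safe' fragments."""
    # A fragment is safe if it matches none of the negative strings.
    sets = {
        c: {w for w in positives if re.search(c, w)}
        for c in candidates
        if not any(re.search(c, w) for w in negatives)
    }
    return "|".join(greedy_set_cover(positives, sets))

print(regex_golf(["cat", "car", "dog"], ["cow", "dot"], ["ca", "c", "do", "g", "o"]))  # ca|g
```

The fragments "c", "do" and "o" are excluded because each one matches a negative string. Greedy selection then takes "ca", which covers two positives, followed by "g".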

Mathematics, English

Markov Chain Monte Carlo Without all the Bullshit

https://doi.org/10.59350/7b0tq-90902

Published April 6, 2015 in Math ∩ Programming

Author: Jeremy Kun

I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract of the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics:

“Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model.”
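As a small illustration of "estimating by simulation the expectation of a statistic", here is a Metropolis-Hastings sketch. Metropolis-Hastings is one standard MCMC method, and this toy example does not come from the post. The target distribution lives on states 0, 1, 2, 3 and is known only up to normalization, through the hypothetical weights [1, 2, 3, 4]. Each step proposes a move to a neighbor on a cycle. Over a long run, the fraction of time the walk spends in each state approaches the normalized weights, so averaging a statistic along the walk estimates its expectation.

```python
import random

def metropolis_hastings(weights, steps, rng):
    """Random walk on states 0..m-1 whose stationary distribution is
    proportional to `weights` (the weights need not sum to 1)."""
    m = len(weights)
    state = 0
    samples = []
    for _ in range(steps):
        # Symmetric proposal: step to the left or right neighbor on a cycle.
        proposal = (state + rng.choice((-1, 1))) % m
        # Accept with probability min(1, target(proposal) / target(state)).
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        samples.append(state)
    return samples

rng = random.Random(1)
weights = [1, 2, 3, 4]
samples = metropolis_hastings(weights, 200_000, rng)

# Estimate E[X] for X distributed proportionally to weights (exact value: 2.0).
estimate = sum(samples) / len(samples)
frequencies = [samples.count(s) / len(samples) for s in range(len(weights))]
print(estimate, frequencies)
```

Because the proposal is symmetric, the proposal probabilities cancel from the acceptance ratio. Only the ratio of unnormalized weights is needed, so the normalizing constant never has to be computed.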

Mathematics, English

The Codes of Solomon, Reed, and Muller

https://doi.org/10.59350/ffrag-zsz61

Published March 23, 2015 in Math ∩ Programming

Author: Jeremy Kun

Last time we defined the Hamming code. We also saw that it meets the Hamming bound, which is a measure of how densely a code can be packed inside an ambient space and still maintain a given distance. This time we’ll define the Reed-Solomon code, which optimizes a different bound called the Singleton bound, and then generalize it to a larger class of codes called Reed-Muller codes.
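A brute-force check makes the Singleton bound concrete for tiny Reed-Solomon codes. The sketch assumes the prime field GF(p) and evaluation points 0, ..., n-1 with n ≤ p. The Singleton bound states that a code with message length k and block length n has minimum distance d ≤ n - k + 1. Two distinct polynomials of degree less than k agree on at most k - 1 points, so Reed-Solomon codes meet this bound with equality. The parameters and function names are illustrative.

```python
from itertools import combinations, product

def reed_solomon_codewords(p, n, k):
    """All codewords: every polynomial of degree < k over GF(p), evaluated at 0..n-1."""
    return [
        tuple(sum(c * x**i for i, c in enumerate(coeffs)) % p for x in range(n))
        for coeffs in product(range(p), repeat=k)
    ]

def minimum_distance(codewords):
    """Smallest Hamming distance between two distinct codewords."""
    return min(
        sum(a != b for a, b in zip(u, v))
        for u, v in combinations(codewords, 2)
    )

for p, n, k in [(5, 5, 2), (7, 6, 3)]:
    d = minimum_distance(reed_solomon_codewords(p, n, k))
    print(f"p={p} n={n} k={k}: d={d}, Singleton bound n-k+1={n - k + 1}")
```

Both cases print d equal to n - k + 1 (here 4). The brute-force search grows as p^(2k), so it only works for toy parameters.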

Mathematics, English

Finding the majority element of a stream

https://doi.org/10.59350/pmzmt-2v706

Published March 9, 2015 in Math ∩ Programming

Author: Jeremy Kun

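Problem: Given a massive data stream of $n$ values in $\{1, 2, \dots, m\}$ and the guarantee that one value occurs more than $n/2$ times in the stream, determine exactly which value does so.

Solution: (in Python)

```python
def majority(stream):
    stream = iter(stream)  # accept any iterable, not only an iterator
    held = next(stream)    # current candidate for the majority value
    counter = 1            # net support for the held candidate
    for item in stream:
        if item == held:
            counter += 1
        elif counter == 0:
            # The previous candidate was cancelled out; adopt this item instead.
            held = item
            counter = 1
        else:
            counter -= 1
    return held
```

Discussion: Let’s prove correctness.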
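The one-pass algorithm depends on the guarantee that a majority exists. On a nonempty stream without a majority it still returns some value. If the guarantee might not hold, one common remedy is a second pass that counts the candidate. That extension is sketched here as an assumption and is not part of the post. The sketch also assumes the stream can be replayed, which it models with a hypothetical zero-argument function `make_stream` that returns a fresh iterator on each call.

```python
def majority_or_none(make_stream):
    """Return the value occurring more than half the time, or None if none does.
    `make_stream()` must return a fresh iterator over the same data each call."""
    # Pass 1: the same cancellation vote as the one-pass algorithm.
    held, counter = None, 0
    for item in make_stream():
        if counter == 0:
            held, counter = item, 1
        elif item == held:
            counter += 1
        else:
            counter -= 1
    # Pass 2: confirm that the candidate really is a majority.
    total = occurrences = 0
    for item in make_stream():
        total += 1
        occurrences += item == held
    return held if 2 * occurrences > total else None

data = [3, 1, 3, 2, 3]
print(majority_or_none(lambda: iter(data)))       # 3
print(majority_or_none(lambda: iter([1, 2, 3])))  # None
```

The second pass still uses only constant memory. Its cost is that the data must be read twice, which a true one-shot stream does not permit.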
