Messaggi di Rogue Scholar

language
MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

There’s a well-understood phenomenon in machine learning called overfitting. The idea is best shown by a graph: overfitting Let me explain. The vertical axis represents the error of a hypothesis. The horizontal axis represents the complexity of the hypothesis. The blue curve represents the error of a machine learning algorithm’s output on its training data, and the red curve represents the generalization of that hypothesis to the real world.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

In this post we’ll implement Reed-Solomon error-correcting codes and use them to play with codes. In our last post we defined Reed-Solomon codes rigorously, but in this post we’ll focus on intuition and code. As usual the code and data used in this post is available on this blog’s Github page.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

In 2014 the White House commissioned a 90-day study that culminated in a report (pdf) on the state of “big data” and related technologies. The authors give many recommendations, including this central warning. Warning: algorithms can facilitate illegal discrimination! Here’s a not-so-imaginary example of the problem. A bank wants people to take loans with high interest rates, and it also serves ads for these loans.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

A while back we featured a post about why learning mathematics can be hard for programmers, and I claimed a major issue was not understanding the basic methods of proof (the lingua franca between intuition and rigorous mathematics). I boiled these down to the “basic four,” direct implication, contrapositive, contradiction, and induction. But in mathematics there is an ever growing supply of proof methods.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

A while back Peter Norvig posted a wonderful pair of articles about regex golf. The idea behind regex golf is to come up with the shortest possible regular expression that matches one given list of strings, but not the other. “Regex Golf,” by Randall Munroe. In the first article, Norvig runs a basic algorithm to recreate and improve the results from the comic, and in the second he beefs it up with some improved search heuristics.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics. Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

Last time we defined the Hamming code. We also saw that it meets the Hamming bound, which is a measure of how densely a code can be packed inside an ambient space and still maintain a given distance. This time we’ll define the Reed-Solomon code which optimizes a different bound called the Singleton bound, and then generalize them to a larger class of codes called Reed-Muller codes.

MatematicaInglese
Pubblicato in Math ∩ Programming
Autore Jeremy Kun

Problem: Given a massive data stream of $ n$ values in $ \ 1, 2, \dots, m \$ and the guarantee that one value occurs more than $ n/2$ times in the stream, determine exactly which value does so. Solution: (in Python) def majority(stream): held = next(stream) counter = 1 for item in stream: if item == held: counter += 1 elif counter == 0: held = item counter = 1 else: counter -= 1 return held Discussion: Let’s prove correctness.