Rogue Scholar

Published November 8, 2013 in Math ∩ Programming

Author Jeremy Kun

In the last twenty years there has been a lot of research in a subfield of machine learning called bandit learning. The name comes from the problem of being faced with a long row of slot machines (once called one-armed bandits), each with a potentially different payout scheme.

Adversarial Bandits and the Exp3 Algorithm