Online Learning 1: Introduction
1 Assumption
2 Probability
Probability triplet
Expectation and variance
Independence
Conditioning, conditional expectation
3 Concentration
Review
Markov’s inequality (non-negative random variable)
Chebyshev’s inequality (arbitrary random variable)
Chernoff bound
Gaussian and sub-gaussian
4 Bandit Framework and Regret
Notation
Stochastic bandit
Unstructured, structured environment
- Unstructured environment: Each arm must be played a reasonable number of times to estimate how good that arm is, because playing one arm reveals nothing about the others.
- Structured environment: Possibly infinitely many actions. Different actions (arms) leak information about each other, so the learner only needs on the order of d plays to collect enough samples to identify the unknown parameter theta. In this sense the problem is easier in terms of the number of samples needed.
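The contrast above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not stated in the notes: rewards are Gaussian with unit variance, the structured case is a linear model (reward = <theta, action> + noise) with d = 2, and the function names are hypothetical.

```python
import random


def estimate_unstructured(means, pulls_per_arm, rng):
    """Unstructured case: estimate every arm separately by its sample mean.

    Assumption: rewards are Gaussian with unit variance around each mean.
    """
    estimates = []
    for mu in means:
        rewards = [rng.gauss(mu, 1.0) for _ in range(pulls_per_arm)]
        estimates.append(sum(rewards) / pulls_per_arm)
    return estimates


def estimate_theta_2d(actions, rewards):
    """Structured case (assumed linear model, d = 2): least-squares estimate
    of theta obtained by solving the 2x2 normal equations directly."""
    a11 = sum(x[0] * x[0] for x in actions)
    a12 = sum(x[0] * x[1] for x in actions)
    a22 = sum(x[1] * x[1] for x in actions)
    b1 = sum(x[0] * r for x, r in zip(actions, rewards))
    b2 = sum(x[1] * r for x, r in zip(actions, rewards))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("actions do not span both dimensions")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def predicted_reward(theta_hat, action):
    """Expected reward of any action, including ones never played."""
    return theta_hat[0] * action[0] + theta_hat[1] * action[1]


rng = random.Random(0)

# Unstructured: every one of the K arms needs its own samples.
arm_estimates = estimate_unstructured([0.2, 0.8], pulls_per_arm=2000, rng=rng)

# Structured: playing two directions is enough to estimate theta, which
# then predicts the reward of every other action in the plane.
theta = (1.0, -0.5)  # hypothetical unknown parameter
played = [(1.0, 0.0), (0.0, 1.0)] * 500
observed = [predicted_reward(theta, a) + rng.gauss(0.0, 1.0) for a in played]
theta_hat = estimate_theta_2d(played, observed)
unseen_action_value = predicted_reward(theta_hat, (0.6, 0.8))
```

In the unstructured case the sampling cost grows with the number of arms, while in this linear sketch it grows with the dimension d, which is why an infinite action set can still be learnable.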
Regret
Suboptimality
Suboptimality quantifies how far any particular arm is, in an expected sense, from the best arm.
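Written as a formula, the statement above is usually expressed as a gap between mean rewards. The symbols here are assumptions, since the notes do not fix a notation: \mu_a is the mean reward of arm a, \mathcal{A} is the set of arms, and \mu^* is the largest mean.

```latex
\Delta_a = \mu^* - \mu_a,
\qquad
\mu^* = \max_{a' \in \mathcal{A}} \mu_{a'}
```

For example, with two arms of mean reward 0.9 and 0.5, the first arm has gap 0 and the second has gap 0.4. The gap is always non-negative and is zero exactly for an optimal arm.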
Regret decomposition
Bayesian regret