Online Learning
xiwang_chn
Online Learning 1: Introduction
Contents: 1 Assumption; 2 Probability; Probability triplet; Expectation and variance; Independence; Conditioning, conditional expectation; 3 Concentration; Review; Markov's inequality (non-negative random variable); Chebyshev's inequality (arbitrary ran…
Original · 2021-05-17 03:42:01 · 696 reads · 1 comment
Online Learning 2: Explore-then-Commit ETC and Doubling Trick
Contents: 1 Setting (suboptimality gap, time horizon); 2 Explore-then-commit (ETC) algorithm; Algorithm: input $m$; Choose $m$; Regret; Proof; Discussion; Questions; 3 Doubling Trick; Doubling trick; Regret; Proof; Discussion; What…
Original · 2021-05-17 03:41:48 · 676 reads · 0 comments
Online Learning 3: e-greedy and Elimination algorithm
Contents: 1 Setting; 2 e-greedy; Algorithm; Choose e; Regret; Proof; Discussion
Original · 2021-05-17 03:41:30 · 263 reads · 0 comments
Online Learning 4: Upper Confidence Bound (UCB) algorithm
Contents: 1 Introduction; 1.1 Key features; 1.2 Anytime concentration; 1.3 UCB intuition; 2 UCB algorithm; The Upper Confidence Bound Algorithm (book); 2.1 Settings; 2.2 Algorithm; 2.3 Regret; 3 KL-UCB Bernoulli; 3.1 KL-div…
Original · 2021-05-17 03:41:16 · 493 reads · 0 comments
Online Learning 5: Exp3 algorithm (adversarial setting)
Contents: 1 Adversarial setting; 1.1 Stochastic setting; 1.2 Adversarial setting; 1.3 Regret; 2 Exp3 (exponential weighting algorithm for exploration and exploitation) algorithm; 2.1 Intuition; 2.2 Algorithm; 2.3 Regret
Original · 2021-05-17 03:40:59 · 495 reads · 0 comments
Online Learning 6: Experts, Least squares
Contents: 1 Setting; 1.1 Player, adversary, experts; 1.2 Adversarial setting; 1.3 Regret; 2 Exp4 (exponential weighting algorithm for exploration and exploitation with experts) algorithm; 2.1 Algorithm; 2.2 Regret
Original · 2021-05-17 03:40:14 · 186 reads · 0 comments
Online Learning 7: Linear Bandits
Contents: 1 Setting (linear contextual bandits); 1.1 Setting; 1.2 Model; Feature map; Model; 1.3 Regret; 2 Linear UCB; 2.1 Algorithm; 2.2 Regret
Feature map: Map the context and arm to so…
Original · 2021-05-17 03:39:59 · 636 reads · 0 comments
Online Learning 8: Pure Explore Algorithms
Contents: 1 Settings; 1.1 Some scenarios; 1.2 Settings; 1.3 How to choose algorithms; 1.4 Simple Regret; 2 Uniform explore; 2.1 Algorithm; 2.2 Regret; Fixed 1-subgaussian k-arm environment $\varepsilon_{SG}^k(1)$; Worst case boun…
Original · 2021-05-17 03:39:43 · 89 reads · 0 comments
Online Learning 9: Bayesian, Thompson Sampling
Contents: 1 Setting; 2 Bayesian algorithms; 1.1 Bayesian bandit environment; 1.2 Posterior; 1.3 Bayesian regret; Frequentist / environment-dependent regret; Bayesian regret, average over environments w.r.t. prior $Q$; 3 Thompson Samplin…
Original · 2021-05-17 03:39:28 · 259 reads · 0 comments