Machine Learning Theory - L1 Concentration Inequalities
This is the first lecture of the course Machine Learning Theory (AI603). It introduces some basic but useful concentration inequalities.
What are concentration inequalities?
Concentration inequalities provide bounds on how much a random variable deviates from some value (typically its expected value); in other words, they quantify how tightly the random variable is concentrated.
Why do we need them?
A classification algorithm has an error probability $\epsilon$, i.e., it misclassifies a randomly sampled data point with probability $\epsilon$. What is the probability that the algorithm misclassifies more than $200\epsilon$ data points among $100$ data points?
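A quick way to get intuition for this question is to simulate it. Below is a minimal Monte Carlo sketch (Python/NumPy; the value $\epsilon=0.05$ and all names are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Monte Carlo estimate of the motivating question: with n = 100 data points,
# each misclassified independently with probability eps, how often does the
# number of misclassified points exceed 200*eps (twice its expectation)?
rng = np.random.default_rng(0)
n, eps, trials = 100, 0.05, 100_000   # illustrative values

errors = rng.binomial(n=n, p=eps, size=trials)   # misclassification counts
estimate = np.mean(errors > 200 * eps)           # empirical tail probability
print(f"P(errors > {200 * eps:.0f}) ~ {estimate:.4f}")
```

The concentration inequalities below answer such questions analytically, without simulation.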
1 Markov Inequality
Theorem: If $X$ is a non-negative random variable,
$$\mathbb P(X\ge a)\le\frac{\mathbb E[X]}{a},\qquad\forall a>0.$$
Proof:
Let $f$ be the probability density function (PDF) of $X$.
$$\begin{aligned}\mathbb E[X]&=\int^\infty_0 xf(x)\,dx\\&\ge\int^\infty_a xf(x)\,dx\\&\ge a\int^\infty_a f(x)\,dx\\&=a\,\mathbb P(X\ge a)\end{aligned}$$
By rearranging the terms,
$$\mathbb P(X\ge a)\le\frac{\mathbb E[X]}{a}.$$
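For instance, applied to the motivating question from the introduction: let $N$ be the number of misclassified points among the $100$ samples, so that $\mathbb E[N]=100\epsilon$. Markov's inequality gives
$$\mathbb P(N\ge 200\epsilon)\le\frac{\mathbb E[N]}{200\epsilon}=\frac{100\epsilon}{200\epsilon}=\frac12,$$
which is rather loose; the Chernoff-type bounds below are exponentially sharper.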
A simple trick to turn an arbitrary random variable into a non-negative one: apply a non-negative, strictly increasing function such as $\phi(x)=e^{\theta x}$ with $\theta>0$.
Then, applying Markov's inequality yields the Chernoff bound:
$$\mathbb P(X\ge a)=\mathbb P(e^{X}\ge e^{a})=\mathbb P(e^{\theta X}\ge e^{\theta a})\le\frac{\mathbb E[e^{\theta X}]}{e^{\theta a}}\qquad\forall a,\ \theta>0.$$
When $X$ is the sum of independent random variables $X_1,\dots,X_n$,
$$\mathbb P(X\ge a)\le\inf_{\theta>0}\frac{\mathbb E[e^{\theta X}]}{e^{\theta a}}=\inf_{\theta>0}\frac{\prod^n_{i=1}\mathbb E[e^{\theta X_i}]}{e^{\theta a}}\qquad\forall a>0.$$
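As a small illustration, the infimum over $\theta$ can be approximated numerically. The sketch below (the helper `chernoff_bound`, the grid search, and the parameter values are illustrative assumptions, not part of the lecture) evaluates the bound for a sum of i.i.d. Bernoulli$(p)$ variables:

```python
import numpy as np

def chernoff_bound(n, p, a, thetas=np.linspace(1e-4, 10, 10_000)):
    """Grid-search approximation of inf_{theta>0} E[e^{theta X}] / e^{theta a}
    for X = X_1 + ... + X_n with X_i i.i.d. Bernoulli(p)."""
    # Work in log space: log(bound) = -theta*a + n*log(p*e^theta + 1 - p).
    log_bound = -thetas * a + n * np.log(p * np.exp(thetas) + (1 - p))
    return np.exp(log_bound.min())

# Illustrative example: n = 100 fair coins (p = 0.5), tail threshold a = 70.
print(chernoff_bound(n=100, p=0.5, a=70))
```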
2 Chernoff-Hoeffding
Theorem: Let $X_1,\dots,X_n$ be i.i.d. Bernoulli random variables with parameter $p$, i.e.,
$$f_X(x)=p^x(1-p)^{1-x}=\begin{cases}p&\text{if }x=1,\\1-p&\text{if }x=0.\end{cases}$$
When $0<p\le q$,
$$\mathbb P\Big(\sum^n_{i=1}X_i\ge nq\Big)\le e^{-nD(q\|p)},$$
where $D(q\|p)=q\log\frac{q}{p}+(1-q)\log\frac{1-q}{1-p}$ is the KL divergence between Bernoulli distributions with parameters $q$ and $p$. When $0<q\le p$,
$$\mathbb P\Big(\sum^n_{i=1}X_i\le nq\Big)\le e^{-nD(q\|p)}.$$
Proof:
From the Markov inequality, for all $\theta>0$,
$$\begin{aligned}\mathbb P\Big(\sum^n_{i=1}X_i\ge nq\Big)&\le\frac{\mathbb E[e^{\theta\sum^n_{i=1}X_i}]}{e^{\theta nq}}\\&=\frac{\prod^n_{i=1}\mathbb E[e^{\theta X_i}]}{e^{\theta nq}}\\&=\frac{\prod^n_{i=1}(pe^\theta+1-p)}{e^{\theta nq}}\\&=\exp(-\theta nq)(pe^{\theta}+1-p)^n.\end{aligned}$$
To minimize this bound over $\theta$, let $\phi(\theta)=\ln\big(\exp(-\theta nq)(pe^{\theta}+1-p)^n\big)$.
$$\begin{aligned}\phi(\theta)&=-\theta nq+n\ln(pe^\theta+1-p),\\\phi'(\theta)&=-nq+\frac{npe^\theta}{pe^\theta+1-p}=0\\\Longrightarrow\quad e^\theta&=\frac{q(1-p)}{p(1-q)}.\end{aligned}$$
Thus, with $\theta=\log\frac{q(1-p)}{p(1-q)}$ (which is positive since $q\ge p$), we have $pe^\theta+1-p=\frac{1-p}{1-q}$ and $e^{-\theta nq}=\big(\frac{p(1-q)}{q(1-p)}\big)^{nq}$, so
$$\mathbb P\Big(\sum^n_{i=1}X_i\ge nq\Big)\le\Big(\frac{p}{q}\Big)^{nq}\Big(\frac{1-p}{1-q}\Big)^{n(1-q)}=e^{-nD(q\|p)}.$$
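A hedged numerical check of this closed-form bound against the exact binomial tail (the helper `kl_bernoulli`, the use of `scipy.stats.binom`, and the parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import binom

def kl_bernoulli(q, p):
    """KL divergence D(q || p) between Bernoulli(q) and Bernoulli(p)."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

# Illustrative numbers: n = 100 samples with mean p = 0.3, threshold q = 0.5.
n, p, q = 100, 0.3, 0.5
bound = np.exp(-n * kl_bernoulli(q, p))
exact = binom.sf(np.ceil(n * q) - 1, n, p)   # exact P(sum >= n*q)
print(f"Chernoff-Hoeffding bound: {bound:.3e},  exact tail: {exact:.3e}")
```

The bound always lies above the exact tail probability, and the exponent $-nD(q\|p)$ captures the right decay rate up to factors polynomial in $n$.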
3 Hoeffding’s inequality
Lemma: Let $X\in[a,b]$ be a random variable. Then, for all $\theta>0$,
$$\mathbb E[e^{\theta(X-\mathbb E[X])}]\le\exp\Big(\frac{\theta^2(b-a)^2}{8}\Big).$$
Theorem: Let $X_1,\dots,X_n$ be independent random variables, where $X_i$ is bounded in $[a_i,b_i]$ for all $i$. Then
$$\mathbb P\Big(\frac1n\sum^n_{i=1}(X_i-\mathbb E[X_i])\ge t\Big)\le\exp\Big(-\frac{2n^2t^2}{\sum^n_{i=1}(b_i-a_i)^2}\Big).$$
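Before the proof, a minimal empirical sanity check of the theorem (assuming Uniform$[0,1]$ samples, so $a_i=0$, $b_i=1$ and the bound reduces to $\exp(-2nt^2)$; all values are illustrative):

```python
import numpy as np

# Empirical check of Hoeffding's inequality for i.i.d. Uniform[0, 1] samples.
rng = np.random.default_rng(0)
n, t, trials = 50, 0.1, 100_000   # illustrative values

samples = rng.uniform(0.0, 1.0, size=(trials, n))
deviations = samples.mean(axis=1) - 0.5      # (1/n) * sum_i (X_i - E[X_i])
empirical = np.mean(deviations >= t)         # empirical tail probability
bound = np.exp(-2 * n * t ** 2)              # Hoeffding bound with b_i - a_i = 1
print(f"empirical tail: {empirical:.4f},  Hoeffding bound: {bound:.4f}")
```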
Proof:
Let $Y=X-\mathbb E[X]$.