Probability Theory Problems Encountered in Machine Learning
1. Mean and variance of the uniform distribution
The uniform distribution of a continuous variable $x$ is defined as
$$U(x \mid a, b) = \begin{cases} \dfrac{1}{b-a}, & x \in [a, b], \\[2ex] 0, & x \notin [a, b]. \end{cases}$$
Find its mean and variance.
1.1 Solution
$$E[x] = \int_{-\infty}^{\infty} x\,U(x \mid a, b)\,\mathrm{d}x = \frac{1}{b-a}\int_{a}^{b} x\,\mathrm{d}x = \frac{a+b}{2}$$
$$E[x^{2}] = \int_{-\infty}^{\infty} x^{2}\,U(x \mid a, b)\,\mathrm{d}x = \frac{1}{b-a}\int_{a}^{b} x^{2}\,\mathrm{d}x = \frac{a^{2}+ab+b^{2}}{3}$$
$$\mathrm{var}[x] = E[x^{2}] - E[x]^{2} = \frac{a^{2}+ab+b^{2}}{3} - \left(\frac{a+b}{2}\right)^{2} = \frac{(b-a)^{2}}{12}$$
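The closed forms above can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper name `uniform_moments`, the midpoint-rule integration, and the test values `a = 2`, `b = 5` are all choices made for illustration.

```python
def uniform_moments(a: float, b: float, steps: int = 100_000):
    """Approximate E[x] and var[x] of U(x | a, b) with the midpoint rule.

    Hypothetical helper for illustration; assumes a < b.
    """
    width = (b - a) / steps          # width of each integration cell
    density = 1.0 / (b - a)          # U(x | a, b) on [a, b]
    first = 0.0                      # accumulates E[x]
    second = 0.0                     # accumulates E[x^2]
    for i in range(steps):
        x = a + (i + 0.5) * width    # midpoint of the i-th cell
        first += x * density * width
        second += x * x * density * width
    return first, second - first ** 2


a, b = 2.0, 5.0
mean, var = uniform_moments(a, b)
print(mean, (a + b) / 2)             # numerical vs. closed-form mean
print(var, (b - a) ** 2 / 12)        # numerical vs. closed-form variance
```

The two printed pairs should agree to several decimal places; the remaining difference comes from the integration step size and floating-point rounding.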
2. Beta distribution: proving normalization and finding the mean and variance
The beta distribution is a family of continuous distributions on the interval $(0, 1)$ with two parameters. The beta probability density function with parameters $(\alpha, \beta)$ is
$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}.$$
Show that
(a) Normalization
$$\int_{0}^{1} f(x \mid \alpha, \beta)\,\mathrm{d}x = 1$$
(b) Mean
$$E[x] = \frac{\alpha}{\alpha+\beta}$$
(c) Variance
$$\mathrm{var}[x] = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
2.1 Solution
(a) The beta function $B(\alpha, \beta) = \int_{0}^{1} x^{\alpha-1}(1-x)^{\beta-1}\,\mathrm{d}x$ and the gamma function $\Gamma(\alpha) = \int_{0}^{\infty} e^{-x}x^{\alpha-1}\,\mathrm{d}x$ are related by
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
Normalization follows directly, since $\int_{0}^{1} f(x \mid \alpha, \beta)\,\mathrm{d}x = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,B(\alpha, \beta) = 1$. The beta probability density function can also be written as
$$f(x \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$$
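The relation between the beta and gamma functions can be sanity-checked numerically. This is a rough sketch under stated assumptions: `beta_integral` and `beta_from_gamma` are hypothetical helper names, the integral uses a plain midpoint rule, and the parameters $\alpha = 2.5$, $\beta = 3.5$ are arbitrary values chosen so that the integrand stays bounded on $[0, 1]$.

```python
import math


def beta_integral(alpha: float, beta: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of B(alpha, beta) = int_0^1 x^(a-1) (1-x)^(b-1) dx.

    Illustrative helper; intended for alpha, beta >= 1 so the integrand is bounded.
    """
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width                       # midpoint of the i-th cell
        total += x ** (alpha - 1) * (1 - x) ** (beta - 1)
    return total * width


def beta_from_gamma(alpha: float, beta: float) -> float:
    """B(alpha, beta) computed from the gamma-function identity."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)


alpha, beta = 2.5, 3.5
print(beta_integral(alpha, beta), beta_from_gamma(alpha, beta))
```

If the two printed values agree, dividing the integrand by $B(\alpha, \beta)$ gives a density that integrates to 1, which is exactly the normalization claim in (a).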
(b) The $n$-th moment of the beta distribution is
$$\begin{aligned} E[x^{n}] &= \frac{1}{B(\alpha, \beta)}\int_{0}^{1} x^{n}x^{\alpha-1}(1-x)^{\beta-1}\,\mathrm{d}x = \frac{B(n+\alpha, \beta)}{B(\alpha, \beta)} \\ &= \frac{\Gamma(n+\alpha)\Gamma(\beta)}{\Gamma(n+\alpha+\beta)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\Gamma(n+\alpha)}{\Gamma(n+\alpha+\beta)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)} \end{aligned}$$
Setting $n = 1, 2$ and using the gamma function property $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ gives
$$E[x] = \frac{\Gamma(1+\alpha)}{\Gamma(1+\alpha+\beta)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)} = \frac{\alpha}{\alpha+\beta}$$
$$E[x^{2}] = \frac{\Gamma(2+\alpha)}{\Gamma(2+\alpha+\beta)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)} = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$$
(c) The variance of the beta distribution is
$$\mathrm{var}[x] = E[x^{2}] - E[x]^{2} = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^{2}}{(\alpha+\beta)^{2}} = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
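As a quick check of (b) and (c), the gamma-ratio expression for the $n$-th moment can be evaluated directly and compared with the closed forms. This is a minimal sketch: the function name `beta_moment` and the example parameters $\alpha = 2$, $\beta = 3$ are assumptions made for illustration only.

```python
import math


def beta_moment(n: int, alpha: float, beta: float) -> float:
    """n-th moment E[x^n] of the beta distribution via the gamma-ratio formula."""
    return (math.gamma(n + alpha) / math.gamma(n + alpha + beta)) * (
        math.gamma(alpha + beta) / math.gamma(alpha)
    )


alpha, beta = 2.0, 3.0
mean = beta_moment(1, alpha, beta)
var = beta_moment(2, alpha, beta) - mean ** 2

# Closed forms derived in the text, for comparison.
mean_closed = alpha / (alpha + beta)
var_closed = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print(mean, mean_closed)
print(var, var_closed)
```

For very large parameters `math.gamma` overflows, so a more robust variant would work with `math.lgamma` and exponentiate the difference of log-gammas.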
3. Multivariate normal distribution: the maximum likelihood solution for the covariance and its expectation
Let $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\mu} \in \mathbb{R}^{D}$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ (the multivariate normal distribution), that is,
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{\frac{D}{2}}\left|\boldsymbol{\Sigma}\right|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$$
Suppose the data set $\mathcal{D} = (\mathbf{x}_{1}, \ldots, \mathbf{x}_{N})$ is drawn independently from $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Show that
(a) The maximum likelihood estimate of the mean is
$$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_{n}$$
(b) The maximum likelihood solution for the covariance is