The proof that the chi-square statistic follows a chi-square distribution

The chi-square test (the principle used in C4.5's CVP pruning),
also called the chi-square statistic,
also called the chi-square goodness-of-fit test.

here is the contingency table:
[Figure: an $r\times s$ contingency table with cell counts $X_{ij}$, row totals $N_{i\cdot}$, column totals $N_{\cdot j}$, and grand total $n$]

The target is to prove:
$$\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left[X_{ij}-N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)\right]^2}{N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)}\sim\chi^2\left[(r-1)(s-1)\right]\qquad ①$$

Note:
the left-hand side above is discrete (a statistic built from counts),
the right-hand side above is continuous, so the claim is asymptotic: the statistic converges in distribution to $\chi^2$ as $n\to\infty$.
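To make ① concrete, here is a minimal Python sketch (the 2×3 table is made up for illustration) that computes the left-hand side directly from the observed counts and compares it with `scipy.stats.chi2_contingency` (with Yates' continuity correction disabled):

```python
import numpy as np
from scipy.stats import chi2_contingency

# A made-up r x s contingency table of observed counts X_ij.
X = np.array([[12, 7, 21],
              [9, 14, 17]], dtype=float)

n = X.sum()                            # grand total n
N_i = X.sum(axis=1, keepdims=True)     # row totals N_i.
N_j = X.sum(axis=0, keepdims=True)     # column totals N_.j
E = N_i * (N_j / n)                    # expected counts N_i. * (N_.j / n)

# Left-hand side of ①: sum over all cells of (X - E)^2 / E
chi2_stat = ((X - E) ** 2 / E).sum()

# The same statistic from scipy; correction=False disables Yates' correction.
chi2_ref, p_value, dof, expected = chi2_contingency(X, correction=False)

print(chi2_stat, chi2_ref)                          # the two values agree
print(dof, (X.shape[0] - 1) * (X.shape[1] - 1))     # dof = (r-1)(s-1)
```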
----------------------------------------------
Let's review the concept of the multivariate normal distribution,
according to [1]:

$$X\sim N(\mu,\Sigma)$$
$$\mu=\left[E[X_1],E[X_2],\cdots,E[X_s]\right]^T$$
$$\Sigma:=\left[\operatorname{Cov}[X_i,X_j]\right]_{1\le i,j\le s}$$
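As a quick numerical reminder of these definitions (the values of $\mu$ and $\Sigma$ below are arbitrary), a small sketch: sample from the multivariate normal and check that the sample mean and sample covariance approach $\mu$ and $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])          # mean vector mu = E[X]
Sigma = np.array([[2.0, 0.3, 0.0],       # covariance matrix,
                  [0.3, 1.0, -0.2],      # Sigma[i, j] = Cov[X_i, X_j]
                  [0.0, -0.2, 0.5]])

samples = rng.multivariate_normal(mu, Sigma, size=200_000)

print(samples.mean(axis=0))              # close to mu
print(np.cov(samples, rowvar=False))     # close to Sigma
```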

-----------------------------------------------------------------------------------------------

$$\sum_{j=1}^{s}\frac{\left[X_{ij}-N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)\right]^2}{N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)}$$

$$=N_{i\cdot}\sum_{j=1}^{s}\frac{\left[\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right]^2}{\frac{N_{\cdot j}}{n}}$$

$$=N_{i\cdot}\left\{\left[\sum_{j=1}^{s-1}\frac{\left[\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right]^2}{\frac{N_{\cdot j}}{n}}\right]+\frac{\left[\frac{X_{is}}{N_{i\cdot}}-\frac{N_{\cdot s}}{n}\right]^2}{\frac{N_{\cdot s}}{n}}\right\}$$

Since $\sum_{j=1}^{s}X_{ij}=N_{i\cdot}$ and $\sum_{j=1}^{s}\frac{N_{\cdot j}}{n}=1$, the last term satisfies $\frac{X_{is}}{N_{i\cdot}}-\frac{N_{\cdot s}}{n}=-\sum_{j=1}^{s-1}\left(\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right)$, so

$$=N_{i\cdot}\left\{\left[\sum_{j=1}^{s-1}\frac{\left[\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right]^2}{\frac{N_{\cdot j}}{n}}\right]+\frac{\left[\sum_{j=1}^{s-1}\left(\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right)\right]^2}{\frac{N_{\cdot s}}{n}}\right\}$$
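Here is a small numerical check of that last step (the table is arbitrary): for every row, the term $\frac{[\frac{X_{is}}{N_{i\cdot}}-\frac{N_{\cdot s}}{n}]^2}{\frac{N_{\cdot s}}{n}}$ equals $\frac{[\sum_{j=1}^{s-1}(\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n})]^2}{\frac{N_{\cdot s}}{n}}$, because the deviations $\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}$ sum to zero over $j$.

```python
import numpy as np

# Arbitrary 3 x 4 table of counts.
X = np.array([[12, 7, 21, 5],
              [9, 14, 17, 10],
              [6, 11, 8, 13]], dtype=float)

n = X.sum()
p = X.sum(axis=0) / n                  # p_j = N_.j / n, sums to 1

for i in range(X.shape[0]):
    N_i = X[i].sum()                   # row total N_i.
    d = X[i] / N_i - p                 # deviations; they sum to 0
    full = (d ** 2 / p).sum()          # sum over j = 1..s
    rewritten = (d[:-1] ** 2 / p[:-1]).sum() + d[:-1].sum() ** 2 / p[-1]
    assert np.isclose(full, rewritten)

print("identity holds for every row")
```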

Let's set
$$p^*=\left(\frac{N_{\cdot 1}}{n},\dots,\frac{N_{\cdot(s-1)}}{n}\right)^T$$

$$\overline{X}^*=\left(\frac{X_{i1}}{N_{i\cdot}},\cdots,\frac{X_{i(s-1)}}{N_{i\cdot}}\right)^T$$

So,
$$N_{i\cdot}\sum_{j=1}^{s}\frac{\left[\frac{X_{ij}}{N_{i\cdot}}-\frac{N_{\cdot j}}{n}\right]^2}{\frac{N_{\cdot j}}{n}}=N_{i\cdot}\left(\overline{X}^*-p^*\right)^T\left(\Sigma^*\right)^{-1}\left(\overline{X}^*-p^*\right)$$

where, writing $p_j=\frac{N_{\cdot j}}{n}$,

$$\Sigma^*=\begin{bmatrix}p_1 & 0 & \cdots & 0\\0 & p_2 & \cdots & 0\\\vdots & \vdots & \ddots & \vdots\\0 & 0 & \cdots & p_{s-1}\end{bmatrix}-\begin{bmatrix}p_1\\p_2\\\vdots\\p_{s-1}\end{bmatrix}\begin{bmatrix}p_1\\p_2\\\vdots\\p_{s-1}\end{bmatrix}^T$$

According to the Sherman-Morrison formula:

$$\left(\Sigma^*\right)^{-1}=\begin{bmatrix}\frac{1}{p_1} & 0 & \cdots & 0\\0 & \frac{1}{p_2} & \cdots & 0\\\vdots & \vdots & \ddots & \vdots\\0 & 0 & \cdots & \frac{1}{p_{s-1}}\end{bmatrix}+\frac{1}{p_s}\begin{bmatrix}1 & 1 & \cdots & 1\\1 & 1 & \cdots & 1\\\vdots & \vdots & \ddots & \vdots\\1 & 1 & \cdots & 1\end{bmatrix}$$
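A quick numerical check (for an arbitrary probability vector) that $\operatorname{diag}(\frac{1}{p_1},\dots,\frac{1}{p_{s-1}})+\frac{1}{p_s}\mathbf{1}\mathbf{1}^T$ really is the inverse of $\Sigma^*=\operatorname{diag}(p_1,\dots,p_{s-1})-p^*{p^*}^T$:

```python
import numpy as np

p = np.array([0.1, 0.25, 0.3, 0.2, 0.15])   # p_1..p_s, sums to 1
p_star, p_s = p[:-1], p[-1]

# Sigma* = diag(p*) - p* p*^T
Sigma_star = np.diag(p_star) - np.outer(p_star, p_star)

# Closed-form inverse from the Sherman-Morrison formula
Sigma_star_inv = np.diag(1.0 / p_star) + np.ones((len(p_star), len(p_star))) / p_s

print(np.allclose(Sigma_star @ Sigma_star_inv, np.eye(len(p_star))))  # True
```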

Let's set
$$Y_i=\sqrt{N_{i\cdot}}\,\left(\Sigma^*\right)^{-1/2}\left(\overline{X}^*-p^*\right)\qquad ②$$
According to [3]:
------------------------the following are from wikipedia-------------------------------
$$\begin{bmatrix}X_{1(1)}\\\vdots\\X_{1(k)}\end{bmatrix}+\begin{bmatrix}X_{2(1)}\\\vdots\\X_{2(k)}\end{bmatrix}+\cdots+\begin{bmatrix}X_{n(1)}\\\vdots\\X_{n(k)}\end{bmatrix}=\begin{bmatrix}\sum_{i=1}^{n}X_{i(1)}\\\vdots\\\sum_{i=1}^{n}X_{i(k)}\end{bmatrix}=\sum_{i=1}^{n}\mathbf{X}_i$$

and the average is

$$\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i=\frac{1}{n}\begin{bmatrix}\sum_{i=1}^{n}X_{i(1)}\\\vdots\\\sum_{i=1}^{n}X_{i(k)}\end{bmatrix}=\begin{bmatrix}\bar{X}_{(1)}\\\vdots\\\bar{X}_{(k)}\end{bmatrix}=\bar{\mathbf{X}}_n$$
and therefore

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\mathbf{X}_i-\operatorname{E}(\mathbf{X}_i)\right]=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\mathbf{X}_i-\boldsymbol{\mu}\right)=\sqrt{n}\left(\overline{\mathbf{X}}_n-\boldsymbol{\mu}\right).$$
The multivariate central limit theorem states that
$$\sqrt{n}\left(\overline{\mathbf{X}}_n-\boldsymbol{\mu}\right)\ \xrightarrow{D}\ N_k(\mathbf{0},\boldsymbol{\Sigma})$$

------------------------the above are from wikipedia-------------------------------

So, for ②, we get (asymptotically, as the row total $N_{i\cdot}\to\infty$)
$$Y_i\sim N_{s-1}(\mathbf{0},\,I_{s-1})\qquad ③$$
where
$$\mathbf{0}=[0,0,\dots,0]^T$$
and $I_{s-1}$ is the $(s-1)\times(s-1)$ identity matrix.
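A Monte Carlo sketch of ③ (the column probabilities and the row total below are arbitrary): for a single multinomial row, the quadratic form $Y_i^{T}Y_i=N_{i\cdot}(\overline{X}^*-p^*)^T(\Sigma^*)^{-1}(\overline{X}^*-p^*)$ should be close in distribution to $\chi^2(s-1)$ when $N_{i\cdot}$ is large.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

p = np.array([0.2, 0.3, 0.15, 0.35])   # column probabilities, s = 4
N_row = 500                            # row total N_i.
reps = 20_000

p_star = p[:-1]
Sigma_star_inv = np.diag(1.0 / p_star) + np.ones((len(p_star), len(p_star))) / p[-1]

counts = rng.multinomial(N_row, p, size=reps)   # simulated rows (X_i1, ..., X_is)
d = counts[:, :-1] / N_row - p_star             # Xbar* - p* for each replicate
quad = N_row * np.einsum('nj,jk,nk->n', d, Sigma_star_inv, d)

# Compare the simulated quadratic forms with chi^2(s-1) = chi^2(3)
print(quad.mean(), len(p) - 1)                                    # empirical mean ~ s-1
print(np.quantile(quad, 0.95), stats.chi2(df=len(p) - 1).ppf(0.95))  # matching 95% quantiles
```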
Then for ①,

$$\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left[X_{ij}-N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)\right]^2}{N_{i\cdot}\left(\frac{N_{\cdot j}}{n}\right)}=\sum_{i=1}^{r}Y_i^{T}Y_i$$

Because of ③,
$$\sum_{i=1}^{r}Y_i^{T}Y_i\sim\chi^2\left[(s-1)(r-1)\right]$$
Each $Y_i^{T}Y_i$ contributes $s-1$ degrees of freedom, but the column proportions $\frac{N_{\cdot j}}{n}$ that appear in $p^*$ are estimated from the table itself, which imposes $s-1$ linear constraints across the rows and reduces the total degrees of freedom from $r(s-1)$ to $(r-1)(s-1)$.
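Finally, a Monte Carlo sketch of the full claim ① (row and column probabilities chosen arbitrarily): generate independent tables under the null hypothesis of independence and compare the empirical distribution of the statistic with $\chi^2[(r-1)(s-1)]$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

row_p = np.array([0.3, 0.45, 0.25])          # r = 3 row probabilities
col_p = np.array([0.2, 0.3, 0.15, 0.35])     # s = 4 column probabilities
n = 2000                                     # grand total per table
reps = 10_000

cell_p = np.outer(row_p, col_p).ravel()      # independence: P(i, j) = P(i) * P(j)
vals = np.empty(reps)

for k in range(reps):
    X = rng.multinomial(n, cell_p).reshape(len(row_p), len(col_p)).astype(float)
    E = X.sum(axis=1, keepdims=True) * X.sum(axis=0, keepdims=True) / n
    vals[k] = ((X - E) ** 2 / E).sum()

dof = (len(row_p) - 1) * (len(col_p) - 1)    # (r-1)(s-1) = 6
print(vals.mean(), dof)                                        # empirical mean ~ dof
print(np.quantile(vals, 0.95), stats.chi2(df=dof).ppf(0.95))   # matching 95% quantiles
```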

The chi-square statistic was introduced by Pearson [8].
Reference:
[1]https://en.wikipedia.org/wiki/Multivariate_normal_distribution
[2]"Seven different proofs for the Pearson independence test"
[3]https://en.wikipedia.org/wiki/Central_limit_theorem
[4]https://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2003/lecture-notes/lec23.pdf
[5]https://arxiv.org/pdf/1808.09171.pdf
[6]https://www.math.utah.edu/~davar/ps-pdf-files/Chisquared.pdf
[7]http://personal.psu.edu/drh20/asymp/fall2006/lectures/ANGELchpt07.pdf
[8]https://download.csdn.net/download/appleyuchi/10834144
