Applications of Chi-Square Tests

November 3, 2015
By Arthur Charpentier

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

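This morning, in our mathematical statistics class, we saw the use of the chi-square test. The first application was a goodness-of-fit test for a multinomial distribution. Assume that $\boldsymbol{N}=(N_1,\dots,N_k)\sim\mathcal{M}(n,\boldsymbol{p})$. In order to test $H_0:\boldsymbol{p}=\boldsymbol{p}_0$ against $H_1:\boldsymbol{p}\neq\boldsymbol{p}_0$, use the statistic

$$Q=\sum_{i=1}^k\frac{(N_i-np_{0,i})^2}{np_{0,i}}$$

Under $H_0$, $Q\sim\chi^2(k-1)$ (asymptotically). For instance, we have the number of weddings, in a large city, per season,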

> n=c(301,356,413,262)

We want to test if weddings are celebrated uniformly over the year, i.e. $H_0:\boldsymbol{p}=(1/4,1/4,1/4,1/4)$.

> np=rep(sum(n)/4,4)
> cbind(n,np)
       n  np
[1,] 301 333
[2,] 356 333
[3,] 413 333
[4,] 262 333
> Q=sum( (n-np)^2/np  )
> Q
[1] 39.02102

This quantity should be compared with the 95% quantile of the chi-square distribution with 4-1=3 degrees of freedom

> qchisq(.95,df=4-1)
[1] 7.814728

but it is also possible to compute the p-value,

> 1-pchisq(Q,df=4-1)
[1] 1.717959e-08

Here, we reject the assumption that weddings are celebrated uniformly over the year.
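As a cross-check, base R's `chisq.test()` carries out the same goodness-of-fit computation; when no `p` argument is supplied, it tests against equal probabilities. A minimal sketch reusing the wedding counts above (the object name `fit` is introduced here only for illustration):

```r
# Number of weddings per season, as above
n <- c(301, 356, 413, 262)

# With a single vector and no `p`, chisq.test() tests against
# equal probabilities, i.e. p = rep(1/4, 4) here
fit <- chisq.test(n)

fit$statistic   # Pearson statistic, same Q as computed by hand
fit$parameter   # degrees of freedom, 4 - 1
fit$p.value     # p-value
```

The three components should agree with the values of `Q`, the degrees of freedom and the p-value obtained by hand.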

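A second application is a goodness-of-fit test for some parametric distribution. Assume that $X$ takes values in a discrete set, say $\{x_1,\dots,x_k\}$. Here $N_j=\sum_{i=1}^n\mathbf{1}(X_i=x_j)$ counts the observations equal to $x_j$. In order to test $H_0:X\sim F_{\theta}$ (for some $\theta$) against $H_1:X\not\sim F_{\theta}$, use

$$Q=\sum_{j=1}^k\frac{(N_j-np_j(\hat\theta))^2}{np_j(\hat\theta)}$$

where $p_j(\theta)=\mathbb{P}_\theta(X=x_j)$ and $\hat\theta$ is an estimate of $\theta$. One can prove that $Q\sim\chi^2(k-1-d)$ under $H_0$, where $d$ is the dimension of $\theta$. For instance, consider the popular example of von Bortkiewicz’s horse-kick data.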

> n=c(109,65,22,3,1)
> sum(n*0:4)/sum(n) 
[1] 0.61

The first thing to do is to merge classes 3 and 4+, in order to have enough observations in each cell of the table

> n_correc=c(109,65,22,4)

Now we can try a Poisson distribution $\mathcal{P}(\lambda)$, here with $\lambda=0.6$ (the sample mean computed above is 0.61)

> np=200*c(dpois(0:2,lambda=.6),
+    1-ppois(2,lambda=.6))
> n_correc=c(109,65,22,4)
> cbind(n_correc,np)
     n_correc         np
[1,]      109 109.762327
[2,]       65  65.857396
[3,]       22  19.757219
[4,]        4   4.623058
> Q=sum( (n_correc-np)^2/np  )
> Q
[1] 0.3550214

The quantile of the chi-square distribution is

> qchisq(.95,df=4-1-1)
[1] 5.991465

and the p-value is

> 1-pchisq(Q,df=4-1-1)
[1] 0.837352
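The same statistic can be obtained from `chisq.test()` by passing the Poisson cell probabilities through its `p` argument. A minimal sketch, with one caveat: `chisq.test()` always uses $k-1$ degrees of freedom for a given-probabilities test and does not know that $\lambda$ was estimated, so the p-value is recomputed by hand with one degree of freedom less. The names `p0`, `fit`, `Q_builtin` and `p_value` are introduced here for illustration, and the warning about small expected counts (one cell is below 5) is silenced explicitly.

```r
# Regrouped horse-kick counts: classes 0, 1, 2 and 3+
n_correc <- c(109, 65, 22, 4)

# Cell probabilities under a Poisson distribution with lambda = 0.6
p0 <- c(dpois(0:2, lambda = 0.6), 1 - ppois(2, lambda = 0.6))

# chisq.test() warns here because one expected count is below 5
fit <- suppressWarnings(chisq.test(n_correc, p = p0))
Q_builtin <- unname(fit$statistic)

# fit$p.value is based on df = 4 - 1 = 3; remove one more degree of
# freedom for the estimated parameter lambda
p_value <- pchisq(Q_builtin, df = length(n_correc) - 1 - 1, lower.tail = FALSE)

Q_builtin
p_value
```

With a p-value this large, the Poisson assumption is not rejected at the 5% level.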

Finally, it is possible to use the chi-square test to test for independence. Consider here two categorical variables $X$ and $Y$, e.g. hair color and eye color, and summarize the information in a contingency table

> n=HairEyeColor[,,1]+HairEyeColor[,,2]
> n
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16

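In that case, use

$$Q=\sum_{i,j}\frac{(N_{i,j}-N_{i,j}^{\perp})^2}{N_{i,j}^{\perp}}$$

with

$$N_{i,j}^{\perp}=\frac{N_{i,\cdot}N_{\cdot,j}}{n}$$

where $N_{i,\cdot}$ and $N_{\cdot,j}$ denote respectively the number of observations per row and per column, and $n$ is the total number of observations.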

> ni=apply(n,1,sum)         # sum per row [hair]
> nj=apply(n,2,sum)         # sum per column [eye]
> n_ind= ni %*% t(nj)/sum(n)
> rownames(n_ind)=rownames(n)
> n_ind
          Brown      Blue    Hazel     Green
Black  40.13514  39.22297 16.96622 11.675676
Brown 106.28378 103.86824 44.92905 30.918919
Red    26.38514  25.78547 11.15372  7.675676
Blond  47.19595  46.12331 19.95101 13.729730

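Under $H_0$ (independence of the two variables), $Q\sim\chi^2((I-1)(J-1))$, where $I$ and $J$ are the numbers of rows and columns,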

> Q= sum( (n-n_ind)^2/n_ind )
> Q
[1] 138.2898

The quantile is here

> qchisq(.95,df=(4-1)*(4-1))
[1] 16.91898

and the p-value is far below 5% (so small that it is rounded to 0 in floating-point arithmetic),

> 1-pchisq(Q,df=(4-1)*(4-1))
[1] 0
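For a contingency table, `chisq.test()` applied to the matrix of counts performs Pearson's test of independence and also returns the expected counts under independence. A minimal sketch (the names `indep` and `p_tail` are introduced here for illustration); it also uses `lower.tail = FALSE` in `pchisq()`, which computes the upper tail directly and avoids the rounding to 0 that occurs with `1-pchisq(...)`:

```r
# Hair x Eye contingency table, summed over the two sexes
n <- HairEyeColor[, , 1] + HairEyeColor[, , 2]

# Pearson's chi-square test of independence
# (the continuity correction only applies to 2 x 2 tables)
indep <- chisq.test(n)

indep$statistic            # Pearson statistic Q
indep$parameter            # degrees of freedom, (4 - 1) * (4 - 1)
round(indep$expected, 2)   # expected counts under independence

# Upper-tail probability computed directly, without 1 - pchisq(...)
p_tail <- pchisq(unname(indep$statistic),
                 df = unname(indep$parameter),
                 lower.tail = FALSE)
p_tail
```

The expected counts should match the matrix `n_ind` built by hand, and `p_tail` is a tiny but strictly positive number.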