On the χ² Distribution and Statistic

Recently I was helping my wife review some research papers in her field, physiotherapy. Some of them involve a certain amount of statistical analysis, in particular the χ² statistic, which I later found is not a trivial topic. I realized I had forgotten most of what I learned about probability and statistics at university, so now I have to pick some of it up again. Fortunately, Wikipedia is always handy for such needs.

First of all, what is the χ² distribution? (It is what the χ² statistic is based on.)

In short, the χ² distribution of order k, or the χ² distribution with k degrees of freedom (k is a positive integer), is the distribution of the sum of squares of k independent standard normal random variables (random variables with the standard normal (Gaussian) distribution). When k is 1, it reduces to the distribution of the square of a single standard normal random variable.
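The definition above can be checked with a small simulation. This is only a sketch: the function names, the seed and the number of draws are my own choices, and it relies on the known fact that a χ² distribution with k degrees of freedom has mean k.

```python
import random

def chi2_sample(k, rng):
    """Draw one value: the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def sample_mean(k, n_draws=20000, seed=0):
    """Average many simulated draws; this should land close to k."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(chi2_sample(k, rng) for _ in range(n_draws)) / n_draws

for k in (1, 2, 5):
    print(k, round(sample_mean(k), 2))
```

Each printed average should sit near its k, which is a quick sanity check that the construction behaves as described.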

Its probability density function and cumulative distribution function are both given in the Wikipedia article about it.

What is interesting, however, is its use as a mathematical tool in statistical tests.

Consider the following scenario.

Suppose the hypothesis is that, in a certain area, the ratio of the number of men to that of women is 1.1:1. We can use a test built on the χ² statistic to judge whether observed data are consistent with this 'theory' or give grounds to reject it.

To answer this question, the key is to build a formula of the same form as the χ² sum of squares, in which the differences between observed and expected counts play the role of the individual random variables.

At the same time, we draw a sample of people from that area, with a size that makes the expected frequencies easy to work with, such as 105: the ideal match of the theoretical frequency would then be 55 men and 50 women.

The formula mentioned above is defined as follows (note that this test statistic is also called χ², as this is a χ² test):

χ² = (Number of Men from the Sample - 55)^2 / 55 + (Number of Women from the Sample - 50)^2 / 50, provided the size of the sample is 105.
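As a rough sketch, the formula can be written as a small function that takes observed and expected counts. The name `pearson_chi2` and the second sample (60 men, 45 women) are hypothetical, used only for illustration.

```python
def pearson_chi2(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [55, 50]  # men, women, for a sample of 105 under the 1.1:1 hypothesis

print(pearson_chi2([55, 50], expected))  # perfect match with the hypothesis
print(pearson_chi2([60, 45], expected))  # hypothetical sample that deviates from it
```

The further the observed counts drift from 55 and 50, the larger the statistic becomes.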

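If the statement is true, each of the two terms in the sum above is a squared, scaled deviation that is approximately normal for a sample of this size. However, the two terms are completely dependent rather than independent: once one count is known, the other is determined. Because (Number of Women - 50) = -(Number of Men - 55), the sum collapses to (Number of Men - 55)^2 × (1/55 + 1/50), which is the square of a single approximately standard normal variable. This is why the statistic behaves like a χ² variable with 1 degree of freedom rather than 2.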
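The dependence between the two terms can be checked numerically. The sketch below assumes that, under the hypothesis, the number of men in the sample is binomial with n = 105 and p = 55/105 (an assumption of mine, not something stated in the Wikipedia articles I read); the function names are my own. It compares the two-term sum with the square of the standardized count of men.

```python
import math

n = 105
p = 55 / 105  # hypothesised proportion of men (ratio 1.1:1)

def two_term_statistic(men):
    """The chi-squared formula with both the men and the women terms."""
    women = n - men  # the women count is fixed once the men count is known
    return (men - 55) ** 2 / 55 + (women - 50) ** 2 / 50

def squared_z(men):
    """Square of the standardized men count under the binomial assumption."""
    variance = n * p * (1 - p)
    return (men - n * p) ** 2 / variance

for men in (50, 55, 59, 70):
    a, b = two_term_statistic(men), squared_z(men)
    # abs_tol guards the comparison at men = 55, where both values are ~0
    print(men, math.isclose(a, b, abs_tol=1e-9))
```

Both expressions agree for every count tried, so the two terms carry the information of a single squared variable.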

So if we end up with 59 men and 46 women in the sample, we get χ² ≈ 0.61. Looking this up in the CDF of the χ² distribution with 1 degree of freedom, we find that the probability of χ² exceeding 0.61 is around 0.4, which is far above the conventional thresholds for statistical significance (0.05 is the most common; 0.01 and 0.001 are stricter). This probability is often denoted by p in the literature. So we would not reject the null hypothesis.
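Instead of a table lookup, the tail probability for 1 degree of freedom can be computed directly. This is a sketch that relies on the identity P(Z² > x) = erfc(sqrt(x/2)) for a standard normal Z; the function name `chi2_sf_1df` is my own, and it only covers the 1-degree-of-freedom case.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for X following a chi-squared distribution with 1 degree of freedom."""
    if x < 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Sample from the text: 59 men and 46 women, expected 55 and 50
stat = (59 - 55) ** 2 / 55 + (46 - 50) ** 2 / 50
p_value = chi2_sf_1df(stat)

print(round(stat, 2), round(p_value, 2))
print("reject at 0.05:", p_value < 0.05)
```

For more than 1 degree of freedom this shortcut no longer applies, and a statistics library or a table is needed.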

Hmm, the interpretation above may not seem to make much sense at first (especially treating the degrees of freedom as 1 when two terms are involved), but that is what I understand from the Wikipedia articles: for this kind of test, the degrees of freedom are the number of categories minus one. I will review and correct this after further study of the subject.

References:

1. Chi-squared distribution, Wikipedia

2. Pearson's chi-squared test, Wikipedia
