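The Pearson correlation coefficient, also called the simple correlation coefficient, describes how closely two interval-scale variables are related (that is, the strength of their linear relationship). The sample simple correlation coefficient is usually denoted R and is computed as:

$$R = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}}$$

Here $x_i$ and $y_i$ are the paired observations, and $\bar{x}$ and $\bar{y}$ are their sample means.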
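In this formula, N is the sample size. R measures the strength of the linear correlation between two variables and takes values between -1 and +1. If R > 0, the two variables are positively correlated: larger values of one variable tend to go with larger values of the other. If R < 0, the two variables are negatively correlated: larger values of one variable tend to go with smaller values of the other. The larger the absolute value of R, the stronger the correlation. Note that correlation does not imply a causal relationship. If R = 0, the two variables are not linearly correlated, but they may still be related in some other way (for example, along a curve).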
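The sample correlation coefficient can be computed directly from its definition: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The following is a minimal sketch in plain Python; the function name `pearson_r` and the small data sets are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data, chosen only to show the three typical cases
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfectly increasing line
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfectly decreasing line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x^2: curved, not linear
```

The third call illustrates the last point above: the values follow y = x² exactly, yet R comes out as zero because the relationship is not linear.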
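To use the sample correlation coefficient to infer whether the two variables are correlated in the population, a t statistic can be used to test the null hypothesis that the population correlation coefficient is 0. If the t-test is significant, the null hypothesis is rejected and the two variables are concluded to be linearly correlated. If the t-test is not significant, the null hypothesis cannot be rejected: the sample does not provide enough evidence of a linear correlation, which is not the same as showing that none exists.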
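The test can be sketched as follows, assuming the standard form of the statistic, t = R·sqrt(N − 2) / sqrt(1 − R²) with N − 2 degrees of freedom (the notes do not write the formula out). To stay within the standard library, the two-sided p-value is approximated by numerically integrating the Student's t density; the helper names and the example values R = 0.5, N = 10 are illustrative assumptions.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (df = n - 2).

    Undefined for |r| = 1, where the denominator is zero.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

# Illustrative values only: R = 0.5 observed in a sample of N = 10
r, n = 0.5, 10
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.3f}")
```

With a significance level of 0.05, the null hypothesis is rejected when the printed p-value is below 0.05. In practice a statistics library would normally supply the t distribution; the integration here is only meant to keep the sketch self-contained.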
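With the Pearson correlation coefficient, both the R value and the P value need to be considered. The R value is the correlation coefficient between the variables in the sample and indicates the strength of the correlation. The P value is the result of the significance test: it indicates whether the correlation seen in the sample can be taken to hold in the population from which the sample was drawn, that is, how likely a correlation at least this strong would be if the population correlation were 0.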
Exercise
At the beginning of an introductory engineering course, 10 students were given a pre-test to determine their initial mathematical ability. The following table lists each student's pre-test score and final grade in the class:
Student Number   Pre-Test   Course Grade
1                45         92
2                23         86
3                50         97
4                46         95
5                33         87
6                21         76
7                13         72
8                30         84
9                34         85
10               50         98
1. Calculate Pearson's Correlation Coefficient (r) on this data.
r =
2. What statistical test is used to determine if this value of r is statistically significant?
3. Is the correlation seen in this data statistically significant? Why?
4. Display a scatterplot of the data. Does the data appear linearly correlated? Do there seem to be any outlier values?
5. Suppose an 11th student were added to the data, with a pre-test score of 40 and a Course Grade of 70. How would this affect r?