Cramér’s V: Formula, Examples, and SPSS

Cramér’s V – What and Why?

https://www.spss-tutorials.com/cramers-v-what-and-why/#ref20

Cramér’s V is a number between 0 and 1 that indicates how strongly two categorical variables are associated.

If we’d like to know if 2 categorical variables are associated, our first option is the chi-square independence test. A p-value close to zero means that our variables are very unlikely to be completely unassociated in some population. However, this does not mean the variables are strongly associated; a weak association in a large sample may also result in p = 0.000.

Cramér’s V - Formula

A measure that does indicate the strength of the association is Cramér’s V, defined as

$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

where

  • $\phi_c$ denotes Cramér’s V; $\phi$ is the Greek letter “phi” and refers to the “phi coefficient”, a special case of Cramér’s V which we’ll discuss later;

  • $\chi^2$ is the Pearson chi-square statistic from the aforementioned test;

  • $N$ is the sample size involved in the test and

  • $k$ is the number of categories of the variable that has the fewer categories, that is, the smaller of the number of rows and the number of columns.
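The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names `chi_square` and `cramers_v` and the small 2 by 2 table are assumptions made for this example, and the code assumes that no row or column of the table is empty.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (N * (k - 1))), with k = min(#rows, #columns)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))


# Hypothetical 2 by 2 table, made up for illustration only.
example = [[10, 20],
           [20, 10]]
print(round(cramers_v(example), 3))
```

For a 2 by 2 table, k - 1 = 1, so the result is simply the square root of chi-square divided by N.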

Cramér’s V - Examples

A scientist wants to know if music preference is related to study major. He asks 200 students, resulting in the contingency table shown below.

[Table: Cramér’s V crosstab, raw counts]

These raw frequencies are just what we need for all sorts of computations, but they don’t show much of a pattern. The association (if any) between the variables is easier to see if we inspect row percentages instead of raw frequencies. Things become even clearer if we visualize our percentages in stacked bar charts.

Cramér’s V - Independence

In our first example, the variables are perfectly independent: $\chi^2 = 0$. According to our formula, chi-square = 0 implies that Cramér’s V = 0. This means that music preference “does not say anything” about study major. The associated table and chart make this clear.

[Table: Cramér’s V crosstab, unassociated variables, row percentages]

[Figure: Cramér’s V unassociated variables chart]

Note that the frequency distribution of study major is identical in each music preference group. If we’d like to predict somebody’s study major, knowing his music preference does not help us in the slightest. Our best guess is always law or “other”.
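To make the independence case concrete, here is a small sketch. The counts are hypothetical (the article’s actual table is not reproduced here); they are chosen only so that every music preference group has the same distribution over study majors. The helper `chi_square` is an illustrative name, not part of any library.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Hypothetical counts: 5 music preference groups (rows) by 4 study majors (columns).
# Every row has the same distribution, so the variables are independent.
independent = [[12, 8, 8, 12] for _ in range(5)]

n = sum(sum(row) for row in independent)             # 200 students
k = min(len(independent), len(independent[0]))       # 4
chi2 = chi_square(independent)
v = sqrt(chi2 / (n * (k - 1)))
print(chi2, v)  # both are 0.0: observed counts equal expected counts in every cell
```

Because every observed count equals its expected count, each term of the chi-square sum is zero, and so is Cramér’s V.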

Cramér’s V - Moderate Association

A second sample of 200 students shows a different pattern. The row percentages are shown below.

[Table: Cramér’s V crosstab, medium association, row percentages]

This table shows a fair amount of association between music preference and study major: the frequency distributions of study major differ between the music preference groups. For instance, 60% of all students who prefer pop music study psychology. Those who prefer classical music mostly study law. The chart below visualizes our table.

[Figure: Cramér’s V medium association chart]

Note that music preference says quite a bit about study major: knowing the former helps a lot in predicting the latter. For these data

  • $\chi^2 \approx 113$; for calculating this chi-square value, see either Chi-Square Independence Test - Quick Introduction or SPSS Chi-Square Independence Test;

  • our sample size $N = 200$ and

  • our variables have 4 and 5 categories, so $k = 4$ and $k - 1 = 3$.

It follows that

$$\phi_c = \sqrt{\frac{113}{200(3)}} \approx 0.43,$$

which is substantial but not very high, since Cramér’s V has a maximum value of 1.
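The arithmetic for this example can be checked with a few lines of Python. The variable names are illustrative; the numbers (chi-square of roughly 113, N = 200, variables with 4 and 5 categories) come from the text above.

```python
from math import sqrt

chi2 = 113       # approximate Pearson chi-square reported for the second sample
n = 200          # sample size
k = min(4, 5)    # the smaller number of categories, so k - 1 = 3

v = sqrt(chi2 / (n * (k - 1)))
print(round(v, 2))  # 0.43
```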

Cramér’s V - Perfect Association

In a third (and last) sample of students, music preference and study major are perfectly associated. The table and chart below show the row percentages.

[Table: Cramér’s V crosstab, perfect association, row percentages]

[Figure: Cramér’s V perfect association chart]

If we know a student’s music preference, we know his study major with certainty. This implies that our variables are perfectly associated. Do notice, however, that it doesn’t work the other way around: we can’t tell someone’s music preference with certainty from his study major. This is not necessary for perfect association, though: $\chi^2 = 600$, so

$$\phi_c = \sqrt{\frac{600}{200(3)}} = 1,$$

which is the highest possible value for Cramér’s V.
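The following sketch reproduces this result with hypothetical counts. The table is an assumption made for illustration (the article’s own table is not shown here): 5 music preference groups in rows, 4 study majors in columns, and every row concentrated in a single column. Two rows share the same column, so study major does not determine music preference, yet chi-square still reaches N(k - 1) = 600.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Hypothetical counts: each music preference (row) maps to exactly one study major.
perfect = [
    [40, 0, 0, 0],
    [0, 40, 0, 0],
    [0, 0, 40, 0],
    [0, 0, 0, 40],
    [40, 0, 0, 0],   # shares a major with the first row
]

n = sum(sum(row) for row in perfect)          # 200
k = min(len(perfect), len(perfect[0]))        # 4
chi2 = chi_square(perfect)
v = sqrt(chi2 / (n * (k - 1)))
print(chi2, v)  # 600.0 1.0
```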

Alternative Measures

  • An alternative association measure for two nominal variables is the contingency coefficient. However, it’s better avoided since its maximum value depends on the dimensions of the contingency table involved.

  • For two ordinal variables, a Spearman correlation or Kendall’s tau is preferable to Cramér’s V.

  • For two metric variables, a Pearson correlation is the preferred measure.

  • If both variables are dichotomous (resulting in a 2 by 2 table) use a phi coefficient, which is simply a Pearson correlation computed on dichotomous variables.
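Two of the points in the list above can be illustrated numerically. The sketch below assumes the standard definition of the contingency coefficient, C = sqrt(χ² / (χ² + N)), which the text does not spell out; since χ² is at most N(k - 1), its maximum is sqrt((k - 1) / k). It also checks, on a hypothetical 2 by 2 table, that phi equals the absolute Pearson correlation between the two 0/1 coded variables. All function names and counts are made up for this example.

```python
from math import sqrt


def max_contingency_coefficient(k):
    """Upper bound of C = sqrt(chi2 / (chi2 + N)) when chi2 is at most N * (k - 1)."""
    return sqrt((k - 1) / k)


def phi_from_table(table):
    """Phi for a 2 by 2 table [[a, b], [c, d]], computed as sqrt(chi2 / N)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return sqrt(chi2 / n)


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# The maximum of the contingency coefficient changes with the table size.
for k in (2, 3, 4):
    print(k, round(max_contingency_coefficient(k), 3))

# Hypothetical 2 by 2 table, expanded into one 0/1 pair per respondent.
a, b, c, d = 30, 10, 10, 50
xs = [0] * (a + b) + [1] * (c + d)          # row variable
ys = [0] * a + [1] * b + [0] * c + [1] * d  # column variable
print(round(phi_from_table([[a, b], [c, d]]), 4), round(pearson_r(xs, ys), 4))
```

Note that phi computed from chi-square is never negative, whereas a Pearson correlation carries a sign; the two agree in absolute value.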

Cramér’s V - SPSS

In SPSS, Cramér’s V is available from Analyze → Descriptive Statistics → Crosstabs. Next, fill out the dialog as shown below.

[Figure: Cramér’s V from the SPSS Crosstabs dialog]

Warning: for tables larger than 2 by 2, SPSS returns nonsensical values for phi without throwing any warning or error. These values are often greater than 1, which isn’t even possible for Pearson correlations. Oddly, you can’t request Cramér’s V without getting these crazy phi values.
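One plausible explanation for such values, offered here as an assumption rather than a statement about SPSS internals, is that phi is computed as sqrt(χ² / N) regardless of table size. Under that assumption, phi equals Cramér’s V multiplied by sqrt(k - 1), so it can exceed 1 whenever k is greater than 2. The sketch below applies this to the two chi-square values from the examples above.

```python
from math import sqrt

n = 200   # sample size from the examples
k = 4     # smaller number of categories in the examples

for label, chi2 in (("moderate association", 113), ("perfect association", 600)):
    phi = sqrt(chi2 / n)                # assumed formula for phi on any table size
    v = sqrt(chi2 / (n * (k - 1)))      # Cramér's V
    print(label, round(phi, 3), round(v, 3))
```

For the perfect association example this gives a phi of about 1.732, which is outside the range of any correlation, while Cramér’s V stays at 1.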

Final Notes

Cramér’s V is also known as Cramér’s phi (coefficient). It is an extension of the aforementioned phi coefficient to tables larger than 2 by 2, hence its notation as $\phi_c$. It’s been suggested that the symbol was replaced by “V” because old computers couldn’t print the letter $\phi$.

Thank you for reading.
