写这个主题是因为自己需要对多个表型性状进行相关性分析,计算性状之间的相关性系数并可视化。
R语言中cor函数,只能计算相关系数,如果想要计算显著性,需要两两用cor.test进行,如果是多列数据,操作比较麻烦。这里介绍两个包,非常方便的进行多列数据的相关系数及其显著性的检验,并且给出可视化。
1.数据
利用模拟数据进行分析
> set.seed(123)
> dd = as.data.frame(matrix(rnorm(1000),100,10))
> head(dd)
V1 V2 V3 V4 V5 V6 V7
1 -0.56047565 -0.71040656 2.1988103 -0.7152422 -0.07355602 -0.60189285 1.07401226
2 -0.23017749 0.25688371 1.3124130 -0.7526890 -1.16865142 -0.99369859 -0.02734697
3 1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826 1.02678506 -0.03333034
4 0.07050839 -0.34754260 0.5431941 -1.0525133 -0.02884155 0.75106130 -1.51606762
5 0.12928774 -0.95161857 -0.4143399 -0.4371595 0.67069597 -1.50916654 0.79038534
6 1.71506499 -0.04502772 -0.4762469 0.3311792 -1.65054654 -0.09514745 -0.21073418
V8 V9 V10
1 -0.7282191 0.3562833 -1.0141142
2 -1.5404424 -0.6580102 -0.7913139
3 -0.6930946 0.8552022 0.2995937
4 0.1188494 1.1529362 1.6390519
5 -1.3647095 0.2762746 1.0846170
6 0.5899827 0.1441047 -0.6245675
2.计算性状间的相关系数及显著性
首先需要加载Hmisc这个R包,然后利用该包中的rcorr函数,利用as.matrix()函数将dd数据框转换为矩阵
> library("Hmisc")
载入需要的程辑包:lattice
载入需要的程辑包:survival
载入需要的程辑包:Formula
载入需要的程辑包:ggplot2
载入程辑包:‘Hmisc’
The following objects are masked from ‘package:base’:
format.pval, units
Warning messages:
1: 程辑包‘Hmisc’是用R版本4.0.5 来建造的
2: 程辑包‘ggplot2’是用R版本4.0.5 来建造的
>test1 <- rcorr(as.matrix(dd))
> head(test1)
$r
V1 V2 V3 V4 V5 V6
V1 1.00000000 -0.04953215 -0.129176009 -0.04407900 -0.192711098 -0.05648439
V2 -0.04953215 1.00000000 0.030579031 0.04383271 -0.130622187 0.11448792
V3 -0.12917601 0.03057903 1.000000000 -0.04486571 -0.024848379 0.01821008
V4 -0.04407900 0.04383271 -0.044865707 1.00000000 -0.019259855 -0.08991276
V5 -0.19271110 -0.13062219 -0.024848379 -0.01925986 1.000000000 0.20661771
V6 -0.05648439 0.11448792 0.018210080 -0.08991276 0.206617706 1.00000000
V7 -0.03436677 0.07811288 0.008685177 -0.06378733 -0.006502516 -0.06469260
V8 0.18157297 -0.03312350 -0.115029463 0.16789328 -0.140653506 0.09415014
V9 -0.02260251 -0.04532832 -0.053281912 -0.16506751 -0.039583467 0.07436348
V10 0.01100538 -0.09149035 -0.014442042 0.24521164 -0.016043315 -0.02505532
V7 V8 V9 V10
V1 -0.0343667704 0.1815729746 -0.0226025134 0.01100538
V2 0.0781128798 -0.0331234961 -0.0453283184 -0.09149035
V3 0.0086851768 -0.1150294630 -0.0532819116 -0.01444204
V4 -0.0637873268 0.1678932800 -0.1650675124 0.24521164
V5 -0.0065025161 -0.1406535056 -0.0395834667 -0.01604332
V6 -0.0646926020 0.0941501419 0.0743634830 -0.02505532
$n
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 100 100 100 100 100 100 100 100 100 100
V2 100 100 100 100 100 100 100 100 100 100
V3 100 100 100 100 100 100 100 100 100 100
V4 100 100 100 100 100 100 100 100 100 100
V5 100 100 100 100 100 100 100 100 100 100
V6 100 100 100 100 100 100 100 100 100 100
V7 100 100 100 100 100 100 100 100 100 100
V8 100 100 100 100 100 100 100 100 100 100
V9 100 100 100 100 100 100 100 100 100 100
V10 100 100 100 100 100 100 100 100 100 100
$P
V1 V2 V3 V4 V5 V6 V7 V8
V1 NA 0.6245623 0.2002311 0.66322862 0.05474062 0.57671579 0.7342728 0.07061249
V2 0.62456234 NA 0.7626396 0.66499697 0.19519706 0.25670258 0.4398246 0.74354664
V3 0.20023108 0.7626396 NA 0.65759240 0.80614857 0.85728899 0.9316561 0.25444461
V4 0.66322862 0.6649970 0.6575924 NA 0.84915653 0.37366621 0.5283700 0.09497893
V5 0.05474062 0.1951971 0.8061486 0.84915653 NA 0.03915902 0.9488044 0.16277448
V6 0.57671579 0.2567026 0.8572890 0.37366621 0.03915902 NA 0.5225218 0.35147288
V7 0.73427283 0.4398246 0.9316561 0.52837000 0.94880437 0.52252183 NA 0.99790971
V8 0.07061249 0.7435466 0.2544446 0.09497893 0.16277448 0.35147288 0.9979097 NA
V9 0.82337056 0.6542870 0.5985447 0.10075567 0.69578885 0.46215410 0.2011669 0.99360266
V10 0.91346152 0.3653057 0.8865965 0.01393431 0.87412090 0.80456621 0.8398046 0.81071005
V9 V10
V1 0.8233706 0.91346152
V2 0.6542870 0.36530567
V3 0.5985447 0.88659653
V4 0.1007557 0.01393431
V5 0.6957888 0.87412090
V6 0.4621541 0.80456621
V7 0.2011669 0.83980464
V8 0.9936027 0.81071005
V9 NA 0.82246247
V10 0.8224625 NA
3. 可视化
利用函数rcorr函数计算出了性状间的相关系数和显著性,接下来利用PerformanceAnalytics包进行可视化
library("PerformanceAnalytics")
chart.Correlation(dd, histogram=TRUE, pch=19)
上述此图,左下角显示相关性趋势,右上角是相关性的显著性指数