一般微生物测序实验会包含三个生物学重复,然后获得有重复的OTU table和环境因子的数据。许多文献在对OTUs进行冗余分析时都包含了重复,包含重复与不包含重复会有何种不同?是否会影响我们对分析结果的解读呢?下面我们以基于距离的冗余分析(db-RDA)作为例子,尝试分析一下,看看两者的区别。
1. 含重复的冗余分析
Load library
library(vegan)
Load test OTU table and evironmental data with replicates
nifhotu_rep <- read.delim("~/Desktop/nifhotu_with_reps.txt", row.names=1)
env_rep <- read.delim("~/Desktop/env_with_reps.txt", row.names=1)
RDA with data keeping replicates
#distance calculation
bray_dist_rep <- vegdist(nifhotu_rep,method = "bray")
#rda analysis
otu_dbrda_rep <- dbrda(sqrt(bray_dist_rep)~.,data=env_rep,add = TRUE)
anova(otu_dbrda_rep,permutations = how(nperm = 999))
## Permutation test for dbrda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: dbrda(formula = sqrt(bray_dist_rep) ~ pH + tn + tp + tk + ep + ek + emo + cec + den + nl + fl + sl + wc, data = env_rep, add = TRUE)
## Df SumOfSqs F Pr(>F)
## Model 12 8.9011 1.9271 0.001 ***
## Residual 63 24.2492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#forward selection
step_forward_rep <- ordistep(dbrda(sqrt(bray_dist_rep)~1,data=env_rep,add = TRUE),scope = formula(otu_dbrda_rep),direction="forward")
##
## Start: sqrt(bray_dist_rep) ~ 1
##
## Df AIC F Pr(>F)
## + pH 1 264.42 4.6703 0.005 **
## + tp 1 266.44 2.6099 0.005 **
## + tn 1 267.02 2.0293 0.005 **
## + nl 1 267.09 1.9555 0.005 **
## + emo 1 267.23 1.8201 0.005 **
## + fl 1 267.33 1.7211 0.005 **
## + den 1 267.46 1.5851 0.005 **
## + tk 1 267.47 1.5788 0.005 **
## + cec 1 267.51 1.5349 0.005 **
## + wc 1 267.53 1.5162 0.005 **
## + sl 1 267.58 1.4695 0.005 **
## + ep 1 267.65 1.3958 0.025 *
## + ek 1 267.88 1.1758 0.085 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH
##
## Df AIC F Pr(>F)
## + tn 1 264.30 2.0697 0.005 **
## + emo 1 264.64 1.7356 0.005 **
## + den 1 264.68 1.6936 0.005 **
## + tp 1 264.72 1.6565 0.005 **
## + wc 1 264.76 1.6177 0.005 **
## + fl 1 264.76 1.6168 0.005 **
## + tk 1 264.76 1.6140 0.005 **
## + cec 1 264.80 1.5721 0.005 **
## + sl 1 264.82 1.5542 0.005 **
## + nl 1 264.84 1.5320 0.005 **
## + ep 1 264.99 1.3925 0.005 **
## + ek 1 265.18 1.2016 0.020 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn
##
## Df AIC F Pr(>F)
## + emo 1 264.46 1.7612 0.005 **
## + den 1 264.52 1.7025 0.005 **
## + fl 1 264.63 1.5938 0.005 **
## + tk 1 264.64 1.5894 0.005 **
## + sl 1 264.64 1.5886 0.005 **
## + cec 1 264.65 1.5806 0.005 **
## + wc 1 264.65 1.5798 0.005 **
## + tp 1 264.69 1.5388 0.005 **
## + nl 1 264.79 1.4428 0.005 **
## + ep 1 264.81 1.4273 0.005 **
## + ek 1 265.01 1.2264 0.020 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo
##
## Df AIC F Pr(>F)
## + den 1 264.67 1.6899 0.005 **
## + tk 1 264.75 1.6178 0.005 **
## + fl 1 264.80 1.5673 0.005 **
## + cec 1 264.80 1.5666 0.005 **
## + sl 1 264.81 1.5581 0.005 **
## + tp 1 264.81 1.5545 0.005 **
## + wc 1 264.82 1.5467 0.005 **
## + nl 1 264.92 1.4515 0.005 **
## + ep 1 264.93 1.4471 0.005 **
## + ek 1 265.17 1.2181 0.025 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den
##
## Df AIC F Pr(>F)
## + sl 1 264.92 1.6384 0.005 **
## + fl 1 264.92 1.6339 0.005 **
## + tp 1 264.98 1.5733 0.005 **
## + cec 1 265.01 1.5518 0.005 **
## + tk 1 265.01 1.5512 0.005 **
## + nl 1 265.09 1.4754 0.005 **
## + ep 1 265.10 1.4636 0.005 **
## + wc 1 265.11 1.4558 0.005 **
## + ek 1 265.34 1.2361 0.010 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl
##
## Df AIC F Pr(>F)
## + cec 1 265.17 1.6037 0.005 **
## + tp 1 265.19 1.5856 0.005 **
## + tk 1 265.23 1.5514 0.005 **
## + nl 1 265.29 1.4937 0.005 **
## + fl 1 265.29 1.4937 0.005 **
## + ep 1 265.31 1.4732 0.005 **
## + wc 1 265.31 1.4687 0.005 **
## + ek 1 265.56 1.2453 0.015 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec
##
## Df AIC F Pr(>F)
## + tk 1 265.43 1.5708 0.005 **
## + nl 1 265.50 1.5069 0.005 **
## + fl 1 265.50 1.5069 0.005 **
## + wc 1 265.55 1.4680 0.005 **
## + tp 1 265.55 1.4634 0.005 **
## + ep 1 265.56 1.4582 0.005 **
## + ek 1 265.61 1.4074 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk
##
## Df AIC F Pr(>F)
## + nl 1 265.73 1.5160 0.005 **
## + fl 1 265.73 1.5160 0.005 **
## + ep 1 265.78 1.4734 0.005 **
## + wc 1 265.79 1.4672 0.005 **
## + tp 1 265.82 1.4388 0.005 **
## + ek 1 265.84 1.4185 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk + nl
##
## Df AIC F Pr(>F)
## + tp 1 266.09 1.4401 0.005 **
## + ek 1 266.11 1.4222 0.005 **
## + ep 1 266.20 1.3411 0.010 **
## + wc 1 266.28 1.2770 0.020 *
## + fl 0 265.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk + nl + tp
##
## Df AIC F Pr(>F)
## + ek 1 266.46 1.4102 0.005 **
## + ep 1 266.52 1.3535 0.010 **
## + wc 1 266.60 1.2848 0.010 **
## + fl 0 266.09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk + nl + tp + ek
##
## Df AIC F Pr(>F)
## + ep 1 266.84 1.3825 0.005 **
## + wc 1 266.97 1.2697 0.015 *
## + fl 0 266.46
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk + nl + tp + ek + ep
##
## Df AIC F Pr(>F)
## + wc 1 267.31 1.2775 0.025 *
## + fl 0 266.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_rep) ~ pH + tn + emo + den + sl + cec + tk + nl + tp + ek + ep + wc
##
## Df AIC F Pr(>F)
## + fl 0 267.31
2. 去除重复的冗余分析
Load test OTU table and evironmental data without replicates
nifhotu_norep <- read.delim("~/Desktop/nifhotu_no_reps.txt", row.names=1)
env_norep <- read.delim("~/Desktop/env_no_reps.txt", row.names=1)
RDA with data without replicates
#distance calculation
bray_dist_norep <- vegdist(nifhotu_norep,method = "bray")
#rda analysis
otu_dbrda_norep <- dbrda(sqrt(bray_dist_norep)~.,data=env_norep,add = TRUE)
anova(otu_dbrda_norep,permutations = how(nperm = 999))
## Permutation test for dbrda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: dbrda(formula = sqrt(bray_dist_norep) ~ pH + tn + tp + tk + ep + ek + emo + cec + den + nl + fl + sl + wc, data = env_norep, add = TRUE)
## Df SumOfSqs F Pr(>F)
## Model 12 5.5331 1.185 0.001 ***
## Residual 15 5.8365
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#forward selection
step_forward_norep <- ordistep(dbrda(sqrt(bray_dist_norep)~1,data=env_norep,add = TRUE),scope = formula(otu_dbrda_norep),direction="forward")
##
## Start: sqrt(bray_dist_norep) ~ 1
##
## Df AIC F Pr(>F)
## + pH 1 68.259 2.7227 0.005 **
## + tp 1 69.368 1.6078 0.005 **
## + nl 1 69.635 1.3461 0.015 *
## + tn 1 69.723 1.2602 0.020 *
## + fl 1 69.810 1.1756 0.075 .
## + emo 1 69.891 1.0965 0.180
## + den 1 69.927 1.0618 0.195
## + tk 1 69.954 1.0362 0.295
## + wc 1 69.954 1.0363 0.300
## + cec 1 69.967 1.0233 0.350
## + sl 1 70.006 0.9860 0.545
## + ep 1 70.137 0.8600 0.955
## + ek 1 70.188 0.8112 1.000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_norep) ~ pH
##
## Df AIC F Pr(>F)
## + tn 1 68.817 1.3213 0.005 **
## + tp 1 68.984 1.1648 0.050 *
## + den 1 69.007 1.1437 0.065 .
## + fl 1 69.045 1.1083 0.115
## + emo 1 69.037 1.1160 0.125
## + tk 1 69.057 1.0968 0.155
## + wc 1 69.059 1.0947 0.185
## + nl 1 69.073 1.0820 0.195
## + cec 1 69.073 1.0824 0.210
## + sl 1 69.105 1.0522 0.265
## + ep 1 69.271 0.8986 0.895
## + ek 1 69.309 0.8633 0.930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: sqrt(bray_dist_norep) ~ pH + tn
##
## Df AIC F Pr(>F)
## + emo 1 69.529 1.1304 0.085 .
## + tp 1 69.520 1.1382 0.090 .
## + den 1 69.530 1.1296 0.100 .
## + cec 1 69.573 1.0903 0.180
## + wc 1 69.585 1.0799 0.205
## + fl 1 69.582 1.0827 0.220
## + tk 1 69.602 1.0648 0.235
## + sl 1 69.595 1.0708 0.250
## + nl 1 69.693 0.9834 0.585
## + ep 1 69.768 0.9164 0.765
## + ek 1 69.817 0.8734 0.885
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
我们可以看到,如果不去除重复,几乎所有环境因子的影响都是显著的。但去除重复后,只有pH和全氮的影响显著。
测试数据参考如下资源:
生物学重复对RDA分析的影响
更多R语言分析微生物生态学的资源可参考如下链接:
https://mianbaoduo.com/o/bread/mbd-YpmTlZpr