GMSB Article 8: Microbial Mediation Analysis

Follow the 生信学习者 series on all platforms:

  • WeChat official account: 生信学习者
  • Xiaohongshu: 生信学习者
  • Zhihu: 生信学习者
  • CSDN: 生信学习者2

Introduction

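Mediation analysis is a statistical method for studying how an independent variable (also called the exposure or predictor) affects a dependent variable (the response or outcome) through one or more mediators, that is, intermediate variables on the pathway between the two. Its purpose is to uncover the mechanism linking the variables, in particular the indirect effect of the independent variable on the dependent variable and how that effect is transmitted through the mediators.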
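To make the decomposition concrete, here is a minimal sketch on simulated data. All variable names and effect sizes are made up for illustration and have nothing to do with the GMSB data. With linear models, the total effect of X on Y splits exactly into a direct part and an indirect part that passes through M.

```r
# Toy linear mediation example on simulated data (hypothetical values)
set.seed(1)
n <- 500
x <- rnorm(n)                       # exposure X
m <- 0.5 * x + rnorm(n)             # mediator M depends on X
y <- 0.3 * x + 0.8 * m + rnorm(n)   # outcome Y depends on X and M

fit_total <- lm(y ~ x)        # total effect of X on Y
fit_med   <- lm(m ~ x)        # effect of X on M (path a)
fit_out   <- lm(y ~ x + m)    # direct effect of X (path c') and effect of M (path b)

total    <- unname(coef(fit_total)["x"])
direct   <- unname(coef(fit_out)["x"])
indirect <- unname(coef(fit_med)["x"] * coef(fit_out)["m"])

# For ordinary least squares, total = direct + indirect holds exactly in the sample
print(c(total = total, direct = direct, indirect = indirect))
```

This product-of-coefficients identity is specific to linear models. For a binary outcome such as seroconversion status it no longer holds in general, which is one reason the rest of this article relies on natural effect models.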

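The goal is to assess whether the markers identified as significantly associated with the outcome, such as inflammatory cytokines, gut microbiota, and short-chain fatty acids (SCFA), mediate the relationship between the number of partners and HIV-1 seroconversion.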

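A natural effect model is a regression model for nested counterfactual outcomes: it parameterizes the natural direct and indirect effects of an exposure directly, on the scale of the outcome model (here, log odds). It is particularly useful in epidemiological and clinical research, where it helps quantify how much of an exposure's effect on a health outcome runs through a proposed mediator. The variables in this mediation analysis are: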

  • exposure variable (X): sexual exposure groups

  • mediators (M): biomarkers (cytokines, gut microbiota, SCFA)

  • outcome variable (Y): HIV-1 seroconversion status
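In counterfactual notation (a standard formulation, written here as a sketch rather than quoted from the original analysis), let $Y(x, M(x^{*}))$ be the outcome that would be observed under exposure $x$ with the mediators at the level they would take under exposure $x^{*}$, and let $C$ denote the adjustment variables. For an exposure level $x$ compared with a reference level $x_0$, the effects on the log-odds scale are:

```latex
\begin{aligned}
\text{NDE} &= \operatorname{logit} P\{Y(x, M(x_0)) = 1 \mid C\} - \operatorname{logit} P\{Y(x_0, M(x_0)) = 1 \mid C\},\\
\text{NIE} &= \operatorname{logit} P\{Y(x, M(x)) = 1 \mid C\} - \operatorname{logit} P\{Y(x, M(x_0)) = 1 \mid C\},\\
\text{TE}  &= \text{NDE} + \text{NIE}.
\end{aligned}
```

The direct effect changes the exposure while freezing the mediators at their reference-level values; the indirect effect freezes the exposure and changes only the mediators. The two differences telescope, so they add up to the total effect.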

Load R packages

library(readr)
library(openxlsx)
library(tidyverse) 
library(microbiome)
library(mia)
library(compositions)
library(medflex)
library(ggsci)
library(ggpubr)

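Import data

Download the data from the link below:

  • Baidu Netdisk link: https://pan.baidu.com/s/1fz5tWy4mpJ7nd6C260abtg
  • Extraction code: follow the WeChat official account 生信学习者 and send the message 复现gmsb to get the code
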
df_v1 <- read_csv("./data/GMSB-data/df_v1.csv", show_col_types = FALSE)

bias_corr_species <- read_csv("./data/GMSB-data/results/outputs/bias_corr_species.csv")

sig_species_raw1 <- read.xlsx("./data/GMSB-data/results/outputs/res_ancombc2.xlsx", sheet = 1) 
sig_species_raw2 <- read.xlsx("./data/GMSB-data/results/outputs/res_ancombc2.xlsx", sheet = 2) 

# Trend test results (precomputed)
ne_trend_test <- readRDS("./data/GMSB-data/rds/ne_trend_test.rds")

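Data preprocessing

  • Extract the abundance table of the differentially abundant species

  • Merge the grouping variables with the differential species abundance table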

df_v1 <- df_v1 %>%
  dplyr::filter(
         group1 != "missing",
         druguse != "missing")

# Microbiome data
bias_corr_species <- bias_corr_species %>%
  dplyr::rowwise() %>%
  dplyr::filter(grepl("Species:", species)|grepl("Genus:", species)) %>%
  dplyr::mutate(species = ifelse(grepl("Genus:", species), 
                        paste(strsplit(species, ":")[[1]][2], "spp."),
                        strsplit(species, ":")[[1]][2])) %>%
  dplyr::ungroup() 

# Significant taxa by group
sig_species1 <- sig_species_raw1 %>%
  dplyr::filter(p_val < 0.05) %>%
  .$taxon

# Significant taxa by status
sig_species2 <- sig_species_raw2 %>%
  dplyr::filter(p_statussc < 0.05) %>%
  .$taxon

sig_species <- sort(base::intersect(sig_species1, sig_species2))

# Subset significant taxa
df_da_species <- bias_corr_species %>%
  dplyr::filter(species %in% sig_species)
df_da_species <- t(df_da_species)
colnames(df_da_species) <- df_da_species[1, ]
df_da_species <- data.frame(df_da_species[-1, , drop = FALSE], check.names = FALSE) %>%
  rownames_to_column("sampleid") %>%
  dplyr::mutate(across(-1, as.numeric))

# Exposure, outcome, confounders, and potential mediators
# cytokines overlap: sCD14 and sCD163
# SCFA overlap: none
df_causal <- df_v1 %>%
  dplyr::select(sampleid, recept_anal, group1, status, druguse, cd14, cd163) %>%
  dplyr::left_join(df_da_species, by = "sampleid")
df_causal$status <- factor(df_causal$status)
df_causal$group1 <- factor(df_causal$group1)
df_causal$druguse <- factor(df_causal$druguse)
df_causal <- data.frame(df_causal)

head(df_causal)
  sampleid recept_anal group1 status druguse     cd14    cd163 Dehalobacterium.spp. Bacteroides.spp.
1      F-1           5     g3     nc     yes 1681.160 665.6528                   NA        0.1151280
2      F-2           6     g4     nc     yes 1178.440 336.1164           -0.1870557       -1.0903494
3      F-3           3     g3     nc     yes 1717.935 495.9060                   NA       -0.3093994
4      F-4           7     g4     nc     yes 1271.675 536.5375                   NA       -3.3221487
5      F-5           4     g3     nc     yes  929.645 472.6636           -1.1128170       -1.2803083
6      F-6           4     g3     nc      no 1103.670 382.0072                   NA        1.8015313
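The species-label cleanup in the preprocessing step can be checked on a few toy labels. The sketch below uses base R only, and the three input labels are hypothetical examples written in the "Rank:name" style that the grepl() filters imply. It also shows why a name such as "Bacteroides spp." appears as Bacteroides.spp. in the printed columns: presumably the final data.frame() call applies make.names() to the column names.

```r
# Hypothetical labels in the "Rank:name" style assumed by the filters above
labels <- c("Species:A.muciniphila", "Genus:Bacteroides", "Family:Lachnospiraceae")

# Keep only species-level and genus-level entries
keep <- grepl("Species:", labels) | grepl("Genus:", labels)
kept <- labels[keep]

# Take the part after the colon; genus-level entries get the "spp." suffix
taxon <- vapply(strsplit(kept, ":"), function(parts) parts[2], character(1))
clean <- ifelse(grepl("Genus:", kept), paste(taxon, "spp."), taxon)

# Syntactically valid column names, as produced by data.frame(check.names = TRUE)
col_names <- make.names(clean)

print(clean)
print(col_names)
```

Unlike the rowwise() version in the tutorial, this variant is vectorized, which is only a stylistic alternative and not a change in logic.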

Functions

  • constrain_est: estimates the beta coefficients under linear inequality constraints

  • l_infty: computes the l_infty norm

  • trend_test: trend test
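Read off from the function bodies that follow, the three helpers implement the quantities below. The symbols are introduced here for illustration: $\hat\beta$ is beta_hat, $\hat\Sigma$ is vcov_hat, $A$ is contrast, $k$ is node, $p$ is the length of the coefficient vector, and $T^{(b)}$ is the statistic recomputed on the $b$-th draw from $N(0, \hat\Sigma)$.

```latex
\begin{aligned}
\hat\beta^{\text{opt}} &= \arg\min_{\beta}\; (\beta - \hat\beta)^{\top} \hat\Sigma^{-1} (\beta - \hat\beta)
\quad \text{subject to} \quad A\beta \ge 0,\\
T &= \max\bigl(\lvert \hat\beta^{\text{opt}}_{k} \rvert,\; \lvert \hat\beta^{\text{opt}}_{k} - \hat\beta^{\text{opt}}_{p} \rvert\bigr),\\
p_{\text{trend}} &= \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\{T^{(b)} > T\}.
\end{aligned}
```

The first line is the quadratic form that CVXR::matrix_frac() expresses, the second is l_infty(), and the third is the simulation-based p-value returned by trend_test().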

# Estimate coefficients under constraints
constrain_est <- function(
  beta_hat, 
  vcov_hat, 
  contrast, 
  solver = "ECOS") {
  
  beta_opt <- CVXR::Variable(rows = length(beta_hat), cols = 1, name = "beta")
  obj <- CVXR::Minimize(CVXR::matrix_frac(beta_opt - beta_hat, vcov_hat))
  cons <- suppressMessages(contrast %*% beta_opt >= 0)
  problem <- CVXR::Problem(objective = obj, constraints = list(cons))

  suppressMessages(result <- try(CVXR::solve(problem, solver = solver), silent = TRUE))

  if (inherits(result, "try-error")) {
    beta_opt <- rep(0, length(beta_hat))
  } else {
    beta_opt <- as.numeric(result$getValue(beta_opt))
  }
  
  return(beta_opt)
}

# Compute the l_infty norm for a pattern
l_infty <- function(
  beta_opt, 
  node) {
  
  l <- max(abs(beta_opt[node]),
           abs(beta_opt[node] - beta_opt[length(beta_opt)]),
           na.rm = TRUE)
  
  return(l)
}

# Trend test
trend_test <- function(
  beta_hat, 
  vcov_hat, 
  contrast, 
  solver = "ECOS",
  node, 
  B = 1000) {
  
  beta_opt <- constrain_est(
    beta_hat = beta_hat,
    vcov_hat = vcov_hat,
    contrast = contrast,
    solver = solver)
  
  l_opt <- l_infty(beta_opt = beta_opt, node = node)
  beta_null <- MASS::mvrnorm(
    n = B, 
    mu = rep(0, length(beta_hat)), 
    Sigma = vcov_hat)
  
  # Null distribution of the test statistic, one value per simulated draw
  l_null <- apply(beta_null, 1, function(x) {
    beta_trend <- constrain_est(beta_hat = x, 
                                vcov_hat = vcov_hat, 
                                contrast = contrast,
                                solver = solver)
    l_infty(beta_opt = beta_trend, node = node)
  })
  
  p_trend <- 1/B * sum(l_null > l_opt)
  
  res <- list(estimate = beta_opt,
              test_statistic = l_opt,
              p_value = p_trend)
  
  return(res)
}

contrast_mat <- matrix(
  c(1, 0, 0, 
    -1, 1, 0,
    0, -1, 1),
  nrow = 3, byrow = TRUE)
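A quick way to see what contrast_mat encodes is to multiply it with a coefficient vector: the rows give beta1, beta2 - beta1, and beta3 - beta2, so requiring all of them to be non-negative means a non-negative, non-decreasing pattern across groups. The sketch below uses two hypothetical coefficient vectors, not estimates from the data.

```r
# Same contrast matrix as in the tutorial
contrast_mat <- matrix(
  c(1, 0, 0,
    -1, 1, 0,
    0, -1, 1),
  nrow = 3, byrow = TRUE)

# Hypothetical coefficient vectors (log odds ratios for g2, g3, g4 vs. g1)
beta_monotone <- c(0.5, 1.2, 2.0)   # increases across groups
beta_violate  <- c(0.5, 0.2, 1.0)   # dips at the second group

# Each element of the product is one constraint: beta1, beta2 - beta1, beta3 - beta2
diffs_monotone <- drop(contrast_mat %*% beta_monotone)
diffs_violate  <- drop(contrast_mat %*% beta_violate)

ok_monotone <- all(diffs_monotone >= 0)   # constraint satisfied
ok_violate  <- all(diffs_violate >= 0)    # constraint violated by the second difference

# Statistic computed by l_infty() with node = 3 on a length-3 vector:
# the second term is |beta[3] - beta[3]| = 0, so it reduces to |beta[3]|
node <- 3
stat <- max(abs(beta_monotone[node]),
            abs(beta_monotone[node] - beta_monotone[length(beta_monotone)]))

print(list(ok_monotone = ok_monotone, ok_violate = ok_violate, stat = stat))
```

When the unconstrained estimate already satisfies the constraints, as beta_monotone does, the constrained optimum coincides with it, so the solver only changes estimates that break the assumed ordering.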

Cytokine mediators

Inflammatory cytokines as mediators

All cytokines

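  • Expanding the data and imputing counterfactual outcomes with medflex::neImpute()

    • medflex::neImpute() fits a working model for the outcome, expands the dataset, and imputes the nested counterfactual outcomes. Despite its name, it does not fill in missing values in the observed data; rows with missing values are removed beforehand with drop_na().
    • In the formula, status is the outcome; the first predictor, group1, is the exposure; the next nMed predictors, cd14 and cd163, are the mediators; the remaining predictor, druguse, is a baseline covariate.
    • family = binomial("logit") specifies a logistic regression model, which suits a binary outcome.
    • nMed = 2 states that the first two predictors after the exposure are mediators to be considered jointly.
    • In the expanded data, group1 is replaced by two auxiliary variables, group10 and group11: group10 carries the exposure level for the direct path, and group11 carries the exposure level that sets the mediators.
  • Fitting the natural effect model

    • medflex::neModel() fits the natural effect model.
    • The formula status ~ group10 + group11 + druguse defines the model: status is the outcome, and group10, group11, and druguse are the predictors. group10 and group11 come from the data expanded by medflex::neImpute().
    • family = binomial("logit") again specifies the distribution family and link function, and expData = df_exp passes the expanded, imputed dataset to the model.
    • se = "robust" requests robust (sandwich) standard errors instead of the default bootstrap standard errors.
  • Mediation analysis with the natural effect model

    • exposure variable (X): sexual exposure groups
    • mediators (M): cytokines
    • outcome variable (Y): HIV-1 seroconversion status
    • adjustment variable (confounder): druguse
    • NDE: natural direct effect (the effect of X on Y that does not pass through M)
    • NIE: natural indirect effect (the effect of X on Y that passes only through M)
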
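Written out, the model fitted with the formula status ~ group10 + group11 + druguse has the form below. This is a sketch based on that formula; the coefficient symbols are introduced here for illustration, with $x$ standing for group10, $x^{*}$ for group11, and $C$ for druguse.

```latex
\operatorname{logit} P\{Y(x, M(x^{*})) = 1 \mid C\}
= \beta_0
+ \sum_{j=2}^{4} \beta_{1j}\, \mathbf{1}\{x = g_j\}
+ \sum_{j=2}^{4} \beta_{2j}\, \mathbf{1}\{x^{*} = g_j\}
+ \beta_3\, \mathbf{1}\{C = \text{yes}\}
```

Under this parameterization, $\exp(\beta_{1j})$ is the natural direct effect odds ratio and $\exp(\beta_{2j})$ the natural indirect effect odds ratio for group $g_j$ versus group g1. This matches the way the code picks coefficient rows 2:4 for the NDE and rows 5:7 for the NIE.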
df <- df_causal %>%
  dplyr::select(status, group1, cd14, cd163, druguse) %>%
  drop_na()

df_exp <- medflex::neImpute(
  status ~ group1 + cd14 + cd163 + druguse,
  family = binomial("logit"), 
  nMed = 2, 
  data = df)

ne_mod <- medflex::neModel(
  status ~ group10 + group11 + druguse,
  family = binomial("logit"), 
  expData = df_exp, 
  se = "robust")

summ <- summary(ne_mod)
df_summ <- data.frame(summ$coefficients)

# Trend test (commented out; the precomputed results are read from ne_trend_test.rds above)
# set.seed(123)
# trend_nde <- trend_test(
#   beta_hat = summ$coefficients[2:4, "Estimate"],
#   vcov_hat = ne_mod$vcov[2:4, 2:4],
#   contrast = contrast_mat,
#   node = 3, B = 1000)
# 
# set.seed(123)
# trend_nie <- trend_test(
#   beta_hat = summ$coefficients[5:7, "Estimate"],
#   vcov_hat = ne_mod$vcov[5:7, 5:7],
#   contrast = contrast_mat,
#   node = 3, B = 1000)
# 
# ne_trend_test <- base::append(
#   ne_trend_test,
#   list(cyto_nde = trend_nde, cyto_nie = trend_nie))

# Outputs
types <- c("nde", "nie")
groups <- c("g2", "g3", "g4")
res <- data.frame(
  type = rep(types, each = length(groups)),
  group = rep(groups, length(types)), 
  estimate = NA, se = NA, p = NA,
  trend_p = NA)

res$estimate <- round(df_summ$Estimate[2:7], 2)
res$se <- round(df_summ$Std..Error[2:7], 2)
res$p <- round(df_summ$Pr...z..[2:7], 3)
res$trend_p[3] <- round(ne_trend_test$cyto_nde$p_value, 3)
res$trend_p[6] <- round(ne_trend_test$cyto_nie$p_value, 3)

head(res)
  type group estimate   se     p trend_p
1  nde    g2     1.92 0.64 0.003      NA
2  nde    g3     2.56 0.61 0.000      NA
3  nde    g4     3.55 0.70 0.000   0.000
4  nie    g2     0.11 0.10 0.284      NA
5  nie    g3     0.12 0.09 0.168      NA
6  nie    g4     0.33 0.14 0.023   0.007

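Results: direct and indirect effects of the inflammatory cytokines across exposure groups

  • nde (natural direct effect): for subjects without drug use, moving the exposure from group 1 to another group, while holding sCD14 and sCD163 at the same level, significantly increases the odds of seroconversion. The odds ratios for groups 2, 3, and 4 are exp(1.92) = 6.82, exp(2.56) = 12.94, and exp(3.55) = 34.81, respectively.

  • nie (natural indirect effect): for subjects without drug use, shifting sCD14 and sCD163 from the levels observed in group 1 to the levels that would be seen in group 4, while holding the exposure fixed at any given group, increases the odds of seroconversion, with an odds ratio of exp(0.33) = 1.39. In other words, moving the cytokine levels from those of group g1 to those of group g4 corresponds to a higher risk of HIV-1 seroconversion.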
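The odds ratios quoted above are simply the exponentiated estimates. The sketch below recomputes them from the rounded table values and adds Wald-type 95% intervals. The intervals are an assumption on my part (normal approximation applied to the reported robust standard errors); they are not part of the original output.

```r
# Estimates and robust standard errors copied from the results table above
res_cyto <- data.frame(
  type     = rep(c("nde", "nie"), each = 3),
  group    = rep(c("g2", "g3", "g4"), times = 2),
  estimate = c(1.92, 2.56, 3.55, 0.11, 0.12, 0.33),
  se       = c(0.64, 0.61, 0.70, 0.10, 0.09, 0.14))

# Odds ratio and Wald-type 95% interval (normal approximation; an assumption)
z <- qnorm(0.975)
res_cyto$or      <- exp(res_cyto$estimate)
res_cyto$ci_low  <- exp(res_cyto$estimate - z * res_cyto$se)
res_cyto$ci_high <- exp(res_cyto$estimate + z * res_cyto$se)

print(cbind(res_cyto[, c("type", "group")],
            round(res_cyto[, c("or", "ci_low", "ci_high")], 2)))
```

Because the inputs are rounded to two decimals, the results can differ slightly from what confint() on the fitted ne_mod object would give.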

Individual cytokines

Mediation analysis for each cytokine separately

features <- c("cd14", "cd163")
groups <- c("g2", "g3", "g4")
res_nde <- data.frame(type = "nde",
                     feature = rep(features, each = length(groups)), 
                     group = rep(groups, length(features)), 
                     estimate = NA, se = NA, p = NA)
res_nie <- data.frame(type = "nie",
                     feature = rep(features, each = length(groups)), 
                     group = rep(groups, length(features)), 
                     estimate = NA, se = NA, p = NA)

for (i in seq_along(features)) {
  df <- df_causal %>%
    dplyr::select(status, group1, druguse, all_of(features[i])) %>%
    drop_na()

  t_formula <- as.formula(paste0("status ~ group1 + ", features[i], " + druguse"))
  df_exp <- neImpute(t_formula, family = binomial("logit"), data = df)
  ne_mod <- neModel(status ~ group10 + group11 + druguse,
                   family = binomial("logit"), expData = df_exp, se = "robust")
  summ <- summary(ne_mod)
  
  idx <- seq_along(groups) + (i - 1) * length(groups)
  res_nde[idx, "estimate"] <- round(summ$coefficients[2:4, "Estimate"], 2)
  res_nde[idx, "se"] <- round(summ$coefficients[2:4, "Std. Error"], 2)
  res_nde[idx, "p"] <- round(summ$coefficients[2:4, "Pr(>|z|)"], 3)
  
  res_nie[idx, "estimate"] <- round(summ$coefficients[5:7, "Estimate"], 2)
  res_nie[idx, "se"] <- round(summ$coefficients[5:7, "Std. Error"], 2)
  res_nie[idx, "p"] <- round(summ$coefficients[5:7, "Pr(>|z|)"], 3)
}

res <- rbind(res_nde, res_nie)

head(res)
  type feature group estimate   se     p
1  nde    cd14    g2     1.99 0.64 0.002
2  nde    cd14    g3     2.59 0.61 0.000
3  nde    cd14    g4     3.66 0.69 0.000
4  nde   cd163    g2     2.01 0.65 0.002
5  nde   cd163    g3     2.69 0.62 0.000
6  nde   cd163    g4     3.76 0.71 0.000

Microbial species

Gut microbial species as mediators

All species

  • Mediation analysis with the natural effect model

    • exposure variable (X): sexual exposure groups
    • mediators (M): gut microbiota
    • outcome variable (Y): HIV-1 seroconversion status
    • adjustment variable (confounder): druguse

# Natural effects model
all_species <- colnames(df_causal)[8:16]
df <- df_causal %>%
  dplyr::select(status, group1, druguse, all_of(all_species))
df[is.na(df)] <- 0

t_formula <- as.formula(paste0("status ~ group1 + ", 
                              paste0(all_species, collapse = " + "), 
                              " + druguse"))
df_exp <- neImpute(
  t_formula,
  family = binomial("logit"), 
  nMed = length(all_species), 
  data = df)

ne_mod <- neModel(
  status ~ group10 + group11 + druguse,
  family = binomial("logit"), 
  expData = df_exp, 
  se = "robust")

summ <- summary(ne_mod)
df_summ <- data.frame(summ$coefficients)

# Trend test (commented out; the precomputed results are read from ne_trend_test.rds above)
# set.seed(123)
# trend_nde <- trend_test(beta_hat = summ$coefficients[2:4, "Estimate"],
#                        vcov_hat = ne_mod$vcov[2:4, 2:4],
#                        contrast = contrast_mat,
#                        node = 3, B = 1000)
# set.seed(123)
# trend_nie <- trend_test(beta_hat = summ$coefficients[5:7, "Estimate"],
#                        vcov_hat = ne_mod$vcov[5:7, 5:7],
#                        contrast = contrast_mat,
#                        node = 3, B = 1000)
# 
# ne_trend_test <- base::append(ne_trend_test,
#                              list(species_nde = trend_nde, species_nie = trend_nie))

# Outputs
types <- c("nde", "nie")
groups <- c("g2", "g3", "g4")
res <- data.frame(type = rep(types, each = length(groups)), 
                 group = rep(groups, length(types)), 
                 estimate = NA, se = NA, p = NA,
                 trend_p = NA)
res$estimate <- round(df_summ$Estimate[2:7], 2)
res$se <- round(df_summ$Std..Error[2:7], 2)
res$p <- round(df_summ$Pr...z..[2:7], 3)
res$trend_p[3] <- round(ne_trend_test$species_nde$p_value, 3)
res$trend_p[6] <- round(ne_trend_test$species_nie$p_value, 3)

head(res)
  type group estimate   se     p trend_p
1  nde    g2     2.08 0.68 0.002      NA
2  nde    g3     2.61 0.64 0.000      NA
3  nde    g4     3.58 0.71 0.000   0.000
4  nie    g2     0.02 0.13 0.879      NA
5  nie    g3     0.15 0.13 0.264      NA
6  nie    g4     0.35 0.17 0.045   0.033

Individual species

features <- sort(all_species)
groups <- c("g2", "g3", "g4")
res_nde <- data.frame(type = "nde",
                     feature = rep(features, each = length(groups)), 
                     group = rep(groups, length(features)), 
                     estimate = NA, se = NA, p = NA)
res_nie <- data.frame(type = "nie",
                     feature = rep(features, each = length(groups)), 
                     group = rep(groups, length(features)), 
                     estimate = NA, se = NA, p = NA)

for (i in seq_along(features)) {
  df <- df_causal %>%
    dplyr::select(status, group1, druguse, all_of(features[i])) %>%
    drop_na()

  t_formula <- as.formula(paste0("status ~ group1 + ", features[i], " + druguse"))
  df_exp <- neImpute(t_formula, family = binomial("logit"), data = df)
  ne_mod <- neModel(status ~ group10 + group11 + druguse,
                   family = binomial("logit"), expData = df_exp, se = "robust")
  summ <- summary(ne_mod)
  
  idx <- seq_along(groups) + (i - 1) * length(groups)
  res_nde[idx, "estimate"] <- round(summ$coefficients[2:4, "Estimate"], 2)
  res_nde[idx, "se"] <- round(summ$coefficients[2:4, "Std. Error"], 2)
  res_nde[idx, "p"] <- round(summ$coefficients[2:4, "Pr(>|z|)"], 3)
  
  res_nie[idx, "estimate"] <- round(summ$coefficients[5:7, "Estimate"], 2)
  res_nie[idx, "se"] <- round(summ$coefficients[5:7, "Std. Error"], 2)
  res_nie[idx, "p"] <- round(summ$coefficients[5:7, "Pr(>|z|)"], 3)
}

res <- rbind(res_nde, res_nie)

head(res)
  type       feature group estimate   se     p
1  nde A.muciniphila    g2    -0.62 1.37 0.649
2  nde A.muciniphila    g3     1.57 0.88 0.076
3  nde A.muciniphila    g4     3.46 1.37 0.012
4  nde      B.caccae    g2     0.90 1.03 0.382
5  nde      B.caccae    g3     2.09 0.95 0.028
6  nde      B.caccae    g4     3.28 1.20 0.006

Combine cytokines and DA species

  • Mediation analysis with the natural effect model

    • exposure variable (X): sexual exposure groups
    • mediators (M): cytokines & gut microbiota
    • outcome variable (Y): HIV-1 seroconversion status
    • adjustment variable (confounder): druguse

# Natural effects model
all_mediators <- c(all_species, "cd14", "cd163")
df <- df_causal %>%
  dplyr::select(status, group1, druguse, all_of(all_mediators))
df[is.na(df)] <- 0

t_formula <- as.formula(paste0("status ~ group1 + ", 
                              paste0(all_mediators, collapse = " + "), 
                              " + druguse"))
df_exp <- neImpute(
  t_formula,
  family = binomial("logit"), 
  nMed = length(all_mediators), 
  data = df)

ne_mod <- neModel(
  status ~ group10 + group11 + druguse,
  family = binomial("logit"), 
  expData = df_exp, 
  se = "robust")

summ <- summary(ne_mod)
df_summ <- data.frame(summ$coefficients)

# Trend test (commented out; the precomputed results are read from ne_trend_test.rds above)
# set.seed(123)
# trend_nde <- trend_test(beta_hat = summ$coefficients[2:4, "Estimate"],
#                        vcov_hat = ne_mod$vcov[2:4, 2:4],
#                        contrast = contrast_mat,
#                        node = 3, B = 1000)
# set.seed(123)
# trend_nie <- trend_test(beta_hat = summ$coefficients[5:7, "Estimate"],
#                        vcov_hat = ne_mod$vcov[5:7, 5:7],
#                        contrast = contrast_mat,
#                        node = 3, B = 1000)
# 
# ne_trend_test <- base::append(ne_trend_test,
#                              list(all_nde = trend_nde, all_nie = trend_nie))

# Outputs
types <- c("nde", "nie")
groups <- c("g2", "g3", "g4")
res <- data.frame(type = rep(types, each = length(groups)), 
                 group = rep(groups, length(types)), 
                 estimate = NA, se = NA, p = NA,
                 trend_p = NA)
res$estimate <- round(df_summ$Estimate[2:7], 2)
res$se <- round(df_summ$Std..Error[2:7], 2)
res$p <- round(df_summ$Pr...z..[2:7], 3)
res$trend_p[3] <- round(ne_trend_test$all_nde$p_value, 3)
res$trend_p[6] <- round(ne_trend_test$all_nie$p_value, 3)

head(res)
  type group estimate   se     p trend_p
1  nde    g2     1.81 0.64 0.005      NA
2  nde    g3     2.38 0.61 0.000      NA
3  nde    g4     3.16 0.68 0.000   0.000
4  nie    g2     0.20 0.17 0.241      NA
5  nie    g3     0.29 0.17 0.087      NA
6  nie    g4     0.74 0.24 0.002   0.001
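To compare the three mediator sets side by side, the g4 versus g1 estimates from the "All cytokines", "All species", and combined tables can be collected in one small data frame. This is only a sketch built from the rounded values printed above. It relies on the fact that, in a natural effect model without an interaction between the two exposure terms, the total effect on the log-odds scale is the sum of the direct and indirect components.

```r
# g4 vs. g1 estimates (log odds ratios) copied from the three result tables above
cmp <- data.frame(
  mediators = c("cytokines", "species", "cytokines + species"),
  nde = c(3.55, 3.58, 3.16),
  nie = c(0.33, 0.35, 0.74))

# Total effect on the log-odds scale: sum of the two components
cmp$total <- cmp$nde + cmp$nie

# Odds ratio attributable to the mediator path
cmp$or_nie <- exp(cmp$nie)

print(cmp)
```

The total stays close to 3.9 in all three fits, while the indirect share grows once cytokines and species are entered jointly. That is what one would expect if the two marker sets carry partly separate information, though these rounded numbers cannot establish it.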

Additional analysis: treating the exposure as continuous

  • Mediation analysis with the natural effect model

    • exposure variable (X): recept_anal
    • mediators (M): cytokines & gut microbiota
    • outcome variable (Y): HIV-1 seroconversion status
    • adjustment variable (confounder): druguse

# Natural effects model
all_mediators <- c(all_species, "cd14", "cd163")
df <- df_causal %>%
  dplyr::select(status, recept_anal, druguse, all_of(all_mediators))
df[is.na(df)] <- 0

t_formula <- as.formula(paste0("status ~ recept_anal + ", 
                              paste0(all_mediators, collapse = " + "), 
                              " + druguse"))
df_exp <- neImpute(t_formula,
                  family = binomial("logit"), nMed = length(all_mediators), data = df)
ne_mod <- neModel(status ~ recept_anal0 + recept_anal1 + druguse,
                 family = binomial("logit"), expData = df_exp, se = "robust")
summ <- summary(ne_mod)
df_summ <- data.frame(summ$coefficients)

df_summ
                Estimate  Std..Error   z.value     Pr...z..
(Intercept)  -1.92892873 0.419607493 -4.596984 4.286515e-06
recept_anal0  0.08028533 0.040929531  1.961550 4.981487e-02
recept_anal1  0.00831693 0.004785597  1.737909 8.222692e-02
druguseyes    1.17852596 0.434863654  2.710105 6.726201e-03
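With a continuous exposure, each coefficient is a log odds ratio per one-unit increase in recept_anal, so the odds ratio for a k-unit increase is exp(k * beta). The sketch below applies this to the two coefficients printed above; the increments in k are hypothetical and chosen only for illustration.

```r
# Coefficients copied from df_summ above (log odds ratio per one-unit increase)
beta_nde <- 0.08028533   # recept_anal0: natural direct effect
beta_nie <- 0.00831693   # recept_anal1: natural indirect effect

# Hypothetical increments in recept_anal
k <- c(1, 5, 10)

or_table <- data.frame(
  increase = k,
  or_nde = exp(k * beta_nde),
  or_nie = exp(k * beta_nie))

print(round(or_table, 3))
```

This scaling assumes the log odds are linear in recept_anal, which is what the fitted model specifies but is not checked here.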

Additional analysis: substance usage as the mediator

  • Mediation analysis with the natural effect model

    • exposure variable (X): sexual exposure groups (group1, as used in the code below)

    • mediator (M): druguse

    • outcome variable (Y): HIV-1 seroconversion status

# Natural effects model
df <- df_causal %>%
  dplyr::select(status, group1, druguse)
df[is.na(df)] <- 0

t_formula <- as.formula(paste0("status ~ group1 + druguse"))
df_exp <- neImpute(t_formula,
                  family = binomial("logit"), data = df)
ne_mod <- neModel(status ~ group10 + group11,
                 family = binomial("logit"), expData = df_exp, se = "robust")
summ <- summary(ne_mod)
df_summ <- data.frame(summ$coefficients)

df_summ
               Estimate Std..Error   z.value     Pr...z..
(Intercept) -3.01328264 0.59597082 -5.056091 4.279374e-07
group10g2    2.07838397 0.65668493  3.164964 1.551023e-03
group10g3    2.71549102 0.62579799  4.339245 1.429728e-05
group10g4    3.88678487 0.70600884  5.505292 3.685567e-08
group11g2    0.07609705 0.07185271  1.059070 2.895679e-01
group11g3    0.18057639 0.11016834  1.639095 1.011934e-01
group11g4    0.17817557 0.12483109  1.427333 1.534839e-01
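The z values and p-values in this table follow directly from the estimates and standard errors. As a sanity check, the sketch below recomputes them for the three indirect-effect rows, using a two-sided test against a standard normal reference.

```r
# Estimates and robust standard errors for the indirect-effect rows, copied from df_summ above
est <- c(group11g2 = 0.07609705, group11g3 = 0.18057639, group11g4 = 0.17817557)
se  <- c(group11g2 = 0.07185271, group11g3 = 0.11016834, group11g4 = 0.12483109)

z <- est / se                # Wald statistic
p <- 2 * pnorm(-abs(z))      # two-sided p-value under a standard normal reference

print(round(cbind(z = z, p = p), 4))
```

None of the three indirect effects reaches the 0.05 level, so this fit gives no clear evidence that drug use mediates the effect of the exposure group on seroconversion.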