WGCNA复现:小胶质细胞亚群在脑发育时髓鞘形成的作用

Hi大家好,我是老米!生信入门一个月,感谢生信技能树平台。

Homework Response to: 重复一篇WGCNA分析的文章。WGCNA学习教程:一文学会WGCNA分析

复现的文献:A novel microglial subset plays a key role in myelinogenesis in developing brain。EMBO J,10多分的好杂志。

课题背景

小胶质细胞是神经系统中的巨噬细胞,可介导脑稳态和神经炎症。作用至关重要,但在大脑发育中具体机制尚未完全阐明。这篇文章中,通过对比健康成人和炎症激活(构建的实验性自身免疫性髓鞘炎模型EAE)的细胞群,发现新生鼠的小胶质细胞对于髓鞘生成及神经形成起关键作用。其中CD11c +小胶质细胞亚群在发育中大脑的髓鞘区域高表达,介导神经和胶质细胞的存活,迁移和分化。这些细胞是IGF1的主要来源,而选择性耗竭会该亚群可导致性髓鞘形成受损。 引用Jimmy的话:

值得一提的是,在单细胞水平研究小神经胶质细胞(Microglia)动态发育和异质性已经有了不少研究。

  • 波士顿儿童医院的研究者们分析了超过76,000个来自于发育、衰老和脑部感染后的小鼠脑部的小胶质细胞,结果表明至少有9种转录特异的小胶质细胞形态,它们可以表达特定的基因集,且位于特定的脑区。发表于免疫学杂志Immunity, doi:10.1016/j.immuni.2018.11.004 (2019).
  • 斯坦福大学医学院的研究者采用高深度scRNA测序揭示了小胶质细胞和脑髓细胞的发育异质性,发表于Neuron,这些细胞取自于胚胎期、出生后早期和成年的小鼠不同脑区。我们发现大部分的成年小胶质细胞表达稳定的基因(homeostatic genes),且不同脑区间没有差异。相反,出生后早期的小胶质细胞异质性更高。 doi:10.1016/j.neuron.2018.12.006 (2019).
  • 德国弗莱堡大学医学院神经病理学研究所的研究者采用单细胞RNA测序揭示小鼠和人的小神经胶质细胞的空间和时间异质性,成果最近以Letter的形式发表于Nature杂志。doi:10.1038/s41586-019-0924-x (2019).

这些都是当前的脑发育研究的热点和前沿!(Jimmy老师,你知识怎么能这么渊博?我这个搞神经发育的都惭愧【捂脸】)

该文章对五个处理组,共17个样本:

  • orange represents neonatal CD11c+ microglia (n = 4),
  • green neonatal CD11c microglia (n = 4),
  • blue EAE CD11c+ microglia (n = 3),
  • purple EAE CD11c microglia (n = 3),
  • black adult microglia (n = 3).
    其实就是 neonatal(新生的) 和 EAE的Microglia,还有CD11C阳性和阴性,然后和成年小鼠的Microglia进行比较。

的老鼠进行了RNASeq测序,数据在GEO可供下载:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78809。当然我偷了个懒,该文章Supplemental material提供了整理之后的csv矩阵,大概1万3千个基因。作者进行了WGCNA分析,以此证明小胶质细胞的作用及寻找关键信号通路及基因(嗯,估计这个实验室后续还有大文章出来)。组学及生信分析的篇幅占了整个文章的一半以上。在此,仅重复文章的Figure3。

跌跌撞撞学了WGCNA半个月,大概了解了这个原理,现在还一知半解,对很多细节摸索了很久,头很大。下面进入正题,请大家一起跟着我头大。

1.数据处理

rm(list = ls())
dev.off()
library(WGCNA)
norm.counts <- read.csv("Dataset_EV2.csv",header = T,sep = ",") ##original counts
rownames(norm.counts)<-norm.counts[,1]
norm.counts <- norm.counts[,2:18]
save(norm.counts,file = "norm.counts.Rdata")
## 处理表型信息,就是老鼠的分组信息
a <- colnames(norm.counts)
library(stringr)
str_tmp=as.data.frame(str_split(a,"[_]",simplify = T))
rownames(str_tmp) <- colnames(norm.counts)
colnames(str_tmp) <- c("group","sampleRep")
datTraits <- str_tmp
datTraits$groupNo <- c(rep(1,3),rep(2,4),rep(3,4),rep(4,3),rep(5,3)) 
###数据初步处理完成

## 筛选中位绝对偏差前75%的基因,取MAD大于0.3
## 筛选后会降低运算量,也会失去部分信息
## 也可不做筛选,使MAD大于0即可
norm.counts <- log2(norm.counts)  ##注意我在这里进行了log2,因为我发现如果不log的话,出现了个很有意思的现象,请您往下看
m.mad <- apply(norm.counts,1,mad)
dataExprMad <- norm.counts[which(m.mad > max(quantile(m.mad, probs=seq(0, 1, 0.25))[2],0.3)),] #25%  = 0.22
datExpr0 <- as.data.frame(t(dataExprMad))
##最终取到8222个基因。

###########start handling the data
##看看有没有为空的值,需要剔除。这里下载的都是作者处理好的,基本没问题。
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK
if (!gsg$allOK){
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes)>0)
    printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],
                                              collapse = ", ")));
  if (sum(!gsg$goodSamples)>0)
    printFlush(paste("Removing samples:",
                     paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
  # Remove the offending genes and samples from the data:
  datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK

###画个树看看样本有没有离群值!!!
if(T){
  # Plot a line to show the cut
  sampleTree <- hclust(dist(datExpr0),method = "average")
  par(cex=.6)
  par(mar=c(0,4,2,0))
  plot(sampleTree, main = "Sample clustering to detect outliers", sub = "",
       xlab = "",cex.lab = 1.5, cex.axis =1.5, cex.main=2
  )
}

##然后保存数据
dim(datExpr0) ##17,8222
datExpr <- datExpr0
nGenes <- ncol(datExpr)
nSamples <- nrow(datExpr)
save(nGenes,nSamples,datExpr,datTraits,file="input.Rdata")

结果如图:
在这里插入图片描述

这结果挺好的,符合实验设计,都聚类好了。但是!但是如果不对原始数值进行log2处理,会出现一个outlier老鼠,如图:
在这里插入图片描述

看到没,那个新生的NEO1号鼠有点调皮!他超脱于所有老鼠之上,统领其他所有老鼠!这要么是问题少年,要么是智商超群,比如科大少年班的那种!而我大概深究了一下,控制这个离群值的大概是500个基因,哈哈哈。。。

2. MDS分析

专业名词叫多维标度法(Multidimensional Scaling)MDS,是一种维数缩减方法,把高维的数据点映射到一个低维的流形上;同时也是一种可视化方法,实践中通常利用2D或3D的MDS 结果观察(投影后)点的分布和聚集来研究数据的性质。说人话叫做:通过基因表达log2组间差异,看看这17个老鼠是否能够分到一个类别,比如成年鼠归到一类,EAE归到一类,新生鼠归到一类。和上面的有点类似,又不完全一样。

参考学习了stat的视频,也参考了他提供的代码。如下:

rm(list = ls())
load(file = 'input.Rdata')
log2.data.matrix <- as.data.frame(t(datExpr))
## now create an empty distance matrix
log2.distance.matrix <- matrix(0,
                               nrow=ncol(log2.data.matrix),
                               ncol=ncol(log2.data.matrix),
                               dimnames=list(colnames(log2.data.matrix),
                                             colnames(log2.data.matrix)))
## now compute the distance matrix using avg(absolute value(log2(FC)))
for(i in 1:ncol(log2.distance.matrix)) {
  for(j in 1:i) {
    log2.distance.matrix[i, j] <-
      mean(abs(log2.data.matrix[,i] - log2.data.matrix[,j]))
  }
}
## do the MDS math (this is basically eigen value decomposition)
## cmdscale() is the function for "Classical Multi-Dimensional Scalign"
mds.stuff <- cmdscale(as.dist(log2.distance.matrix),
                      eig=TRUE,
                      x.ret=TRUE)
save(mds.stuff,file = "step7_MDS.Rdata")
## calculate the percentage of variation that each MDS axis accounts for...
mds.var.per <- round(mds.stuff$eig/sum(mds.stuff$eig)*100, 1)
## now make a fancy looking plot that shows the MDS axes and the variation:
mds.values <- mds.stuff$points
mds.data <- data.frame(Sample=rownames(mds.values),
                       X=mds.values[,1],
                       Y=mds.values[,2])
## 然后画图
library(ggpubr)
library(ggplot2)
groups <- datTraits$group
png("figures/step7_MDS_logfc.png",width = 800,height=600)
ggplot(mds.data, aes(x=X, y=Y, label=Sample,col=groups)) + 
  geom_point(size = 10, alpha = 0.8) +
  theme(panel.grid.minor = element_blank()) +
  coord_fixed() + theme_bw()+
  ggtitle("MDS plot using avg(logFC) as the distance")+
  xlab(paste("Leading logFC dim1 ", mds.var.per[1], "%", sep="")) +
  ylab(paste("Leading logFC dim2 ", mds.var.per[2], "%", sep="")) +
  ggtitle("MDS plot using avg(logFC) as the distance")
dev.off()

结果如下:
在这里插入图片描述

嗯,,,结果表示完美!别问我为什么要先做这个图,因为它好看,我忍不住!

下面正式进入WGCNA分析。

3. 计算beta值

这个没啥好讲的,代码如下:

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
dim(datExpr)
#################################################
if(T){
        powers = c(c(1:10), seq(from = 12, to=30, by=2))
        # Call the network topology analysis function
        sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
        png("figures/step2-beta-value.png",width = 800,height = 600)
        # Plot the results:
        ##sizeGrWindow(9, 5)
        par(mfrow = c(1,2));
        cex1 = 0.9;
        # Scale-free topology fit index as a function of the soft-thresholding power
        plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
             xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
             main = paste("Scale independence"));
        text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
             labels=powers,cex=cex1,col="red");
        # this line corresponds to using an R^2 cut-off of h
        abline(h=0.9,col="red")
        # Mean connectivity as a function of the soft-thresholding power
        plot(sft$fitIndices[,1], sft$fitIndices[,5],
             xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
             main = paste("Mean connectivity"))
        text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
        dev.off()
}
sft$powerEstimate  ## beta=22 SCI文章里面用了20
save(sft,file = "step2_beta_value.Rdata")

在这里插入图片描述

4. 采用一步法构建加权共表达网络(weight co-expression network)

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
enableWGCNAThreads()
if(T){
  net = blockwiseModules(
    datExpr,
    power = sft$powerEstimate,
    maxBlockSize = nGenes,
    TOMType = "unsigned", minModuleSize = 30,
    reassignThreshold = 0, mergeCutHeight = 0.28, ## 注意这个0.28
    numericLabels = TRUE, pamRespectsDendro = FALSE,
    saveTOMs = F,
    verbose = 3
  )
  table(net$colors) 
}

##模块可视化,画那个传说中的树
if(T){
  # Convert labels to colors for plotting
  moduleColors=labels2colors(net$colors)
  table(moduleColors)
  # Plot the dendrogram and the module colors underneath
  png("figures/step3-genes-modules.png",width = 800,height = 600)
  plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],
                      "Module colors",
                      dendroLabels = FALSE, hang = 0.03,
                      addGuide = TRUE, guideHang = 0.05)
  dev.off()
}
save(net,moduleColors,file = "step3_genes_modules.Rdata")

结果如下:
在这里插入图片描述
这里需要停顿一下:SCI文章只取到8个模块,图注说用了12691个基因(被我发现全部加起来也才9999,没错!9999个基因)。事实上,能够分这么少模块,我不知道作者参数细节。有两个参数特别影响模块大小:一个是minModuleSize,表示每个模块里面最少放多少个基因,这很好理解,设定越大,模块越少;另一个是mergeCutHeight,这个参数表示你在哪里砍树。值设定越小,树枝越多,通常是0.25。google一下,发现这么一个说法:

I am not aware of a principle from which one could derive an “appropriate” value, but in practice, on data sets with 50-100 samples, using 0.15 to 0.2 has worked fairly well. For fewer samples a larger valuse (0.25 to 0.3) may be warranted. If you want larger modules, increase the value; if you want smaller modules at the risk of having redundant modules (modules with very similar functional annotation and very similar fuzzy module membership), you can decrease the value to say 0.10, maybe even lower if you have lots of samples (hundreds or more).

这里,我的mergeCutHeight = 0.28,最终取到13个模块!

5. 模块与基因与表型

5.1 模块和老鼠表型对应

这是关键,其实就是开始解读这些关联的基因群对老鼠表型的影响,比如敲除CDC11之后的老鼠,哪些信号通路激活或抑制,EAE的老鼠哪些基因激活?等等等。

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")

if(T){
  nGenes = ncol(datExpr)
  nSamples = nrow(datExpr)
  design <- model.matrix(~0+datTraits$group)
  colnames(design)= levels(datTraits$group) ## get the group
  MES0 <- moduleEigengenes(datExpr,moduleColors)$eigengenes
  MEs = orderMEs(MES0)
  moduleTraitCor <- cor(MEs,design,use = "p")
  moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples)
  textMatrix = paste(signif(moduleTraitCor,2),"\n(",
                     signif(moduleTraitPvalue,1),")",sep = "")
  dim(textMatrix)=dim(moduleTraitCor)
  
  png("figures/step4-Module-trait-relationship.png",width = 800,height = 1200,res = 120)
  par(mar=c(6, 8.5, 3, 3))
  labeledHeatmap(Matrix = moduleTraitCor,
                 xLabels = colnames(design),
                 yLabels = names(MEs),
                 ySymbols = names(MEs),
                 colorLabels = FALSE,
                 colors = blueWhiteRed(50),
                 textMatrix = textMatrix,
                 setStdMargins = FALSE,
                 cex.text = 0.5,
                 zlim = c(-1,1),
                 main = paste("Module-trait relationships"))
  dev.off()
  save(design,file = "step4_design.Rdata")
}

通过模块与各种表型的相关系数,可发现自己感兴趣的模块。如图:
在这里插入图片描述
可以看到有些模块与成年鼠关联,有些与EAE相关性很高。和原SCI的图有点像。

5.2 各模块与表型相关性bar图

就是上面的图转化成bar图。

if(T){
  mes_group <- merge(MEs,datTraits,by="row.names") # 小技巧,可以通过rowname进行merge
  ##写了个画图的function
  library(gplots)
  library(ggplot2)
  library(ggpubr)
  library(grid)
  library(gridExtra) 
  draw_ggboxplot <- function(data,gene="P53",group="group"){
    #print(gene)
    ggboxplot(data,x=group, y=gene,
              ylab = sprintf("Expression of %s",gene),
              xlab = group,
              fill = group,
              palette = "jco",
              #add="jitter",
              legend = "") +stat_compare_means()
  }
  ###开始批量画boxplot
  colorNames = names(MEs)
  png("figures/step4-expression-group.png",width = 800,height=2000,res = 120)
  #par(mfrow=c(ceiling(length(colorNames)/2),2))
  p <- lapply(colorNames,function(x) {
    draw_ggboxplot(mes_group,gene= x,group="group")
  })
  do.call(grid.arrange,c(p,ncol=2))
  dev.off()
}

部分结果如下:
在这里插入图片描述

5.3 模块与基因的相关性

if(T){
  modNames = substring(names(MEs), 3)
  geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"))
  ## 算出每个模块跟基因的皮尔森相关系数矩阵
  ## MEs是每个模块在每个样本里面的值
  ## datExpr是每个基因在每个样本的表达量
  MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))
  names(geneModuleMembership) = paste("MM", modNames, sep="")
  names(MMPvalue) = paste("p.MM", modNames, sep="")
  
  geneTraitSignificance <- as.data.frame(cor(datExpr,datTraits$groupNo,use = "p"))
  GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples))
  names(geneTraitSignificance)<- paste("GS.",names(datTraits$group),sep = "")
  names(GSPvalue)<-paste("GS.",names(datTraits$group),sep = "")
  
  #selectModule<-c("blue","green","purple","grey")  ##可以选择自己喜欢的模块
  selectModule <- modNames  ## 也可以批量作图
  png("figures/step4-Module-trait-significance.png",width = 800,height=2000,res = 120)
  par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作图开始
  for(module in selectModule){
    column <- match(module,selectModule)
    print(module)
    moduleGenes <- moduleColors==module
    verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                       abs(geneTraitSignificance[moduleGenes, 1]),
                       xlab = paste("Module Membership in", module, "module"),
                       ylab = paste("Gene significance for", module, "module"),
                       main = paste("Module membership vs. gene significance\n"),
                       cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)
  }
  dev.off()
}

截取部分如下:

在这里插入图片描述

5.4 关注感兴趣的模块,导出基因进行GO分析

rm(list = ls())
library(WGCNA)
load(file = 'input.Rdata')
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")
load(file = "step4_design.Rdata")
load(file="step4_datTraits.Rdata")

table(moduleColors)
group_g <- data.frame(gene=colnames(datExpr),
                      group=moduleColors)
write.csv(group_g,file = "figures/group_g.csv") ## 导出了对应模块所有基因

# 选择mouse的基因组进行注释及ID转化啥的,如果是人的,另有R包
if(!require("clusterProfiler")) BiocManager::install("clusterProfiler",ask = F,update = F)
library(clusterProfiler)
if(!require("org.Mm.eg.db")) BiocManager::install("org.Mm.eg.db",ask = F,update = F)
library(org.Mm.eg.db)
tmp <- bitr(group_g$gene,fromType = "ENSEMBL",
            toType = "ENTREZID",
            OrgDb = "org.Mm.eg.db")
de_gene_cluster <- merge(tmp,group_g,by.x="ENSEMBL",by.y="gene")
table(de_gene_cluster$group)

###run go analysis
formula_res <- compareCluster(
  ENTREZID~group,
  data = de_gene_cluster,
  fun = "enrichGO",
  OrgDb = "org.Mm.eg.db",
  ont = "BP",
  pAdjustMethod = "BH",
  pvalueCutoff = 0.01,
  qvalueCutoff = 0.05
)

lineage1_ego <-simplify(
  formula_res,
  cutoff=0.5,
  by="p.adjust",
  select_fun=min
)
save(group_g,formula_res,lineage1_ego,file="step5_GOananlysis.Rdata")
#出图
pdf("figures/step5_Microglia_GO_term_DE.pdf",width = 15,height = 15)
dotplot(lineage1_ego,showCategory=10)
dev.off()
write.csv(lineage1_ego@compareClusterResult,
          file="figures/Microglia_GO_term_DE.csv")

GO结果分析如下:
在这里插入图片描述
通过与上面的几个图表配合,发现EAE鼠与炎症通路/免疫相关,而新生鼠的胶质细胞则是cell cycle/胶质细胞分化/神经细胞发育/轴树突形成通路相关。与原SCI分析相符,与预期相符。

5.5 画WGCNA的标配热图

rm(list = ls())
library(WGCNA)
load(file = 'input.Rdata')
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")
load(file = "step4_design.Rdata")
load(file="step4_datTraits.Rdata")
load(file="step5_GOananlysis.Rdata")
# 主要是可视化 TOM矩阵,WGCNA的标准配图
# 然后可视化不同 模块 的相关性 热图
# 不同模块的层次聚类图
# 还有模块诊断,主要是 intramodular connectivity
if(T){
  #geneTree = net$dendrograms[[1]]
  TOM=TOMsimilarityFromExpr(datExpr,power=20)
  dissTOM=1-TOM
  #plotTOM = dissTOM^7
  #diag(plotTOM)=NA
  #TOMplot(plotTOM,geneTree,moduleColors,main="Network heapmap plot of all genes")
  ### 我这里只取了1000个基因哈,我试了一下全部基因,结果跑了半个小时没跑完,被我强行退出!
  nSelect =1000
  set.seed(20)
  select=sample(nGenes,size = nSelect)
  selectTOM = dissTOM[select,select]
  selectTree = hclust(as.dist(selectTOM),method = "average")
  selectColors = moduleColors[select]
  plotDiss=selectTOM^7
  diag(plotDiss)=NA
  png("figures/step6_select_Network-heatmap.png",width = 800,height=600)
  TOMplot(plotDiss,selectTree,selectColors,main="Network heapmap of selected gene")
  dev.off()
}

如下:

在这里插入图片描述

5.7 提取指定模块的基因并做热图

if(T){
  module="turquoise"
  which.module=module
  dat=datExpr[,moduleColors==which.module]
  library(pheatmap)
  pheatmap(dat,show_colnames = F,show_rownames = F)
  n=scale(t(dat+1)) # 'scale'可以对log-ratio数值进行归一化
  n[n>2]=2 
  n[n< -2]= -2
  n[1:4,1:4]
  pheatmap(n,show_colnames =F,show_rownames = F)
  group_list=datTraits$group
  ac=data.frame(g=group_list)
  rownames(ac)=colnames(n)
  png("figures/step6-moduleGene-heatmap.png",width = 800,height = 600)
  pheatmap(n,show_colnames =F,show_rownames = F,annotation_col =ac )
  dev.off()
  
}

在这里插入图片描述
都挺好的。

5.8 性状与模块的关系

if(T){
  ## 连续性的变量,如体重等才好和模块进行比较。
  MEs=moduleEigengenes(datExpr,moduleColors)$eigengenes
  MET = orderMEs(cbind(MEs,datTraits$groupNo))
  par(cex = 0.9)
  png("figures/step6-Eigengene-dendrogram.png",width = 800,height = 600)
  plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle
                        = 90)
  dev.off()
}

如图:
在这里插入图片描述

6. 模块导出

感兴趣的模块导出到cytoscape,VisANT等软件进一步进行可视化分析。

#######gene export for VisANT or cytoscape
if(T){
  
  module="turquoise"
  probes = colnames(datExpr)
  inModule = (moduleColors==module)
  modProbes=probes[inModule]
  head(modProbes)
  modTOM = TOM[inModule,inModule]
  dimnames(modTOM)=list(modProbes,modProbes)
  ### 这里只选了top100的基因
  nTop=100
  IMConn = softConnectivity(datExpr[,modProbes])
  top=(rank(-IMConn)<=nTop)
  filterTOM=modTOM[top,top]
  # for visANT
  vis = exportNetworkToVisANT(filterTOM,file = paste("figures/visANTinput-",module,".txt",sep = ""),
                              weighted = T,threshold = 0)
  
  # for cytoscape
  cyt = exportNetworkToCytoscape(filterTOM,
                                 edgeFile = paste("figures/CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""),
                                 nodeFile = paste("figures/CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""),
                                 weighted = TRUE,
                                 threshold = 0.02,
                                 nodeNames = modProbes[top], 
                                 nodeAttr = moduleColors[inModule][top])
}

至此,复现结束。如果你仔细的看到了这里,说明你是想学WGCNA的了。回到课题本身,是否对小胶质细胞的认识更近了一步?提个小小的科学问题:在PD中,是否有一个胶质细胞亚群可对MPP+诱导的黑质神经细胞死亡有抵抗和保护作用?哈哈,别说我是搞肿瘤的了。。。

这些内容,换个思路应该够一个硕士毕业了。。。

写完又是深夜【捂脸】。还在持续学习中,fighting。。。

  • 4
    点赞
  • 22
    收藏
    觉得还不错? 一键收藏
  • 4
    评论
Sure, here's some R code to perform the marginal model and plot the locations of significant SNPs: ``` # set seed for reproducibility set.seed(20132014) # simulate data n <- 5000 p <- 1000 h <- c(0.2, 0.8)[1] # simulate genotype (not exactly) x_r <- matrix(rnorm(n * p), ncol = p) xmean <- matrix(rep(colMeans(x_r), n), ncol=p, byrow=TRUE) xsd <- matrix(rep(apply(x_r, 2, sd), n), ncol=p, byrow=TRUE) x <- (x_r - xmean)/xsd # simulate continuous trait y <- sqrt(h) * x[,1] + sqrt(1 - h) * rnorm(n) # split data into training and testing sets train_idx <- sample(1:n, size=round(0.8*n), replace=FALSE) x_train <- x[train_idx,] y_train <- y[train_idx] x_test <- x[-train_idx,] y_test <- y[-train_idx] # perform marginal model library(glmnet) fit <- glmnet(x_train, y_train, family="gaussian") cv_fit <- cv.glmnet(x_train, y_train, family="gaussian") lambda_min <- cv_fit$lambda.min coef <- coef(fit, s=lambda_min) # plot locations of significant SNPs library(ggplot2) library(dplyr) library(tidyr) library(ggrepel) # get SNP names snp_names <- paste0("SNP", 1:p) # create data frame of SNP locations and coefficients snp_df <- data.frame(snp_names, coef[-1]) snp_df <- snp_df %>% gather(key="snp", value="coef", -snp_names) snp_df$significant <- snp_df$coef != 0 # plot SNP locations ggplot(snp_df, aes(x=snp, y=1, color=significant)) + geom_point(size=3) + scale_color_manual(values=c("gray","red")) + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) + labs(x="SNP", y="") ``` This will produce a plot of SNP locations with significant coefficients highlighted in red. You can adjust the `lambda_min` value to control the level of sparsity in the model and the number of significant SNPs identified.
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值