基于共表达网络和差异表达基因分析水稻干旱和热胁迫响应基因

在这里插入图片描述

对水稻 (Oryza sativa) 的研究对于提高农业生产力和确保全球生计安全至关重要,特别是考虑到极端气候变化造成的日益严重的干旱和热应激。目前,水稻抗旱耐热的基因和机制尚不完全清楚,促进新菌株的开发空间仍然很大。为了准确识别与水稻干旱和热应激反应相关的关键基因,本研究整合了来自基因表达综合 (GEO) 数据库的多个数据集。使用加权相关网络分析 (WGCNA) 算法构建共表达网络。我们进一步区分了核心网络,并将其与差异表达基因和多个表达数据集相交进行筛选。使用定量实时聚合酶链反应 (PCR) 验证基因表达水平的差异。发现 OsDjC53、MBF1C、BAG6、HSP23.2 和 HSP21.9 与热胁迫响应有关,UGT83A1 和 OsCPn60a1 也可能受到干旱胁迫的影响,虽然没有直接关系。这项研究为水稻逆境反应的分子机制提供了重要的见解,这可能促进耐逆水稻品种的发育。

介绍
稻米 (Oryza sativa) 是一种重要的谷物,在世界范围内广泛种植,是全球约 50% 人口的基本食物来源(Ashkani 等人,2015 年;Li et al., 2018)。近几十年来,随着全球总人口的快速增长和对生计保障的需求,水稻遗传学和育种的重要性变得尤为关键(Huang et al., 2016)。专注于高级水稻基因型生成的研究旨在提高水稻植物的产量、质量和对生物和非生物压力源的弹性,如害虫、疾病、盐、干旱和高温(Raj & Nadarajah,2022;Vo et al., 2021)。随着全球气候的恶化,预计许多水稻种植区干旱和热浪的频率和严重程度将增加(Saud 等人,2022 年;Zandalinas, Fritschi & Mittler, 2021)。水稻是一种水生植物,主要生长在经常遭受洪水侵袭的低地地区,使作物更容易受到干旱和热应激的影响(Jagadish, Murty & Quick, 2015;Ji et al., 2012;Sahebi et al., 2018)。

水稻育种和生物技术的进步以及遗传品种改良在提高水稻的抗旱性同时增强其适应炎热环境的能力方面发挥了重要作用(Shen et al., 2022;Wang & Han,2022 年)。对水稻干旱和热适应性的分子机制的研究可以促进创造具有更高抗逆性的新型水稻品种(Kim et al., 2020;Liu et al., 2020)。

已经确定了赋予水稻耐旱性的几个关键基因。UGT85E1 和 OsWRKY5 介导的脱落酸反应增强已被证明可以提高耐旱性(Lim等人,2022 年;Liu et al., 2021)。OsNAR2.1 在硝酸盐吸收和易位中起重要作用;因此,其表达水平与水稻的抗旱性呈正相关(Chen et al., 2019)。OsRINGzf1 在干旱胁迫期间调节水通道蛋白 (Chenet al., 2022)。光合作用相关基因(如 CA1)的表达水平在干旱胁迫下也会发生变化(Auler等人,2021 年;Li et al., 2020)。拟南芥 UBC32 的过表达提高了水稻的耐旱性(Chen等人,2021 年)。这些基因参与各种过程,例如激素信号通路、渗透调节和光合作用。

OsRab7介导的渗透物、抗氧化剂和对非生物胁迫反应的基因的调节可以提高转基因稻米的产量和增强耐热的能力(El-Esawi & Alayafi,2019)。OsTT1 通过消除具有细胞毒性的变性蛋白质并保留细胞中的热反应过程来发挥对热应激的保护作用(Li等人,2015 年);OsNTL3 和 OsbZIP74 具有相似的机制 (Liu et al., 2020)。HES1 在高温胁迫下维持光合系统的稳定性 (Xia et al., 2022)。这些基因与热休克蛋白 (HSP)、抗氧化酶、蛋白质合成和光合作用有关。

总之,对水稻干旱和耐热性的研究对于确保全球粮食安全、适应极端气候变化和提高农业生产力至关重要(Tyczewska et al., 2018)。以前的研究为水稻胁迫反应的生理和分子方面提供了有价值的见解(Lakshmanan et al., 2016)。然而,限制当前文献的一个重大差距是对水稻干旱和热应激反应所涉及的关键基因和调控网络的识别和理解不完整。尽管已经确定了许多胁迫反应基因,但它们只占水稻中大量基因的一小部分。现有研究无法比较这些基因在压力反应中的意义。这限制了我们开发提高水稻品种抗逆性的针对性策略的能力。

为了全面分析水稻干旱和热反应的分子机制,从基因表达综合 (GEO) 数据库中选择了一组 RNA-seq 数据,其中包含干旱和热处理的不同梯度,并将这些数据与多个遭受干旱或热胁迫的数据集中的数据进行比较。不同数据集的整合和先进分析技术的利用使我们能够克服单个研究的局限性,并提供更全面的水稻胁迫反应分子机制视图。本研究增强了我们对水稻干旱和热应激适应的分子机制的理解,并可用于发现可以作为遗传育种候选者的新基因和更重要的基因。本文的部分内容以前作为预印本 (https://doi.org/10.21203/rs.3.rs-3047406/v1) 的一部分发布。

材料和方法
数据采集
从 GEO 数据库 (https://www.ncbi.nlm.nih.gov/geo/) 中寻找和检索了多个基因表达谱数据集,包括高通量测序 (Illumina HiSeq 2000/Illumina HiSeq 4000/Illumina NovaSeq 6000) 和芯片数据集 (Affymetrix Rice Genome Array Platforms)。这些高通量测序数据集包括 GSE221542、 GSE168650 (Kan et al., 2022) 和 GSE159816 (Zu et al., 2021)。这些阵列数据集包括 GSE136746 (Ps et al., 2017)、GSE41648 (Sharma et al., 2021)、GSE14275 (胡, 胡 & Han, 2009)、GSE93917 (Wang et al., 2020) 和 GSE83378 (Wei et al., 2017) (表 1)。这些 GEO 数据集的基因符号使用国家生物技术信息中心 (NCBI)、水稻注释项目数据库 (RAP-db) (https://rapdb.dna.affrc.go.jp/) 和水稻基因组注释项目 (http://rice.uga.edu/index.shtml) 进行注释。使用 R ( 版本 4.2.3 ) 和 RStudio (版本 2023.03.0) 软件处理数据。GSE221542包含 15 个样品,包括 3 个水水平和 2 个热水平,每个样品有 3 个重复样品。

干旱/高温响应基因加权基因共表达网络分析
使用 R 包“edgeR”计算每百万计数以标准化 RNA-seq 数据的测序深度(Robinson、McCarthy & Smyth,2010 年)。使用R(版本4.2.3)中的加权基因共表达网络分析(WGCNA)(Langfelder & Horvath,2008)包,使用以下步骤构建了aco表达网络。首先,计算不同干旱或热应激水平下每个基因的平均表达量,并过滤掉表达量没有变化的基因。其次,将基因表达水平标准化到 0-1 的范围内,然后计算 Pearson 相关系数,该系数用于衡量基因之间共表达的相似性。第三,为了确保无标度的网络分布,为邻接矩阵权重选择合适的 beta 值,以构建用于模块聚类和分割的拓扑重叠矩阵。最后,为了选择与干旱或热响应相关的模块,分析了每个网络模块与样本表型之间的关系。

基因本体论 (GO) 术语用于富集选定的基因 (Tian et al., 2017)。使用 R 包 “clusterProfiler” 表示分析结果以进行可视化 (Yu et al., 2012)。京都基因与基因组百科全书(KEGG)富集(Kanehisa & Goto,2000)也使用R包“clusterProfiler”(Yu等人,2012)进行了分析。使用 CytoScape (3.9.1) 的 CytoHubba (Chin et al., 2014) 插件,基于最短路径,使用最大团中心性 (MCC) 方法对关键模块的每个基因进行评分,并选择前 20 个枢纽基因。


getwd()
data1 = read.csv("logCPM.csv", check.names=F)
library(WGCNA)
datExpr0 = as.data.frame(t(data1[,-1]))
names(datExpr0) = data1[,1]
rownames(datExpr0) = names(data1[,-1])
datExpr0
gsg = goodSamplesGenes(datExpr0, verbose = 3)
gsg$allOK
if (!gsg$allOK)
{
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes)>0)
    printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")))
  if (sum(!gsg$goodSamples)>0)
    printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")))
  # Remove the offending genes and samples from the data:
  datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
meanFPKM=0.5  
n=nrow(datExpr0)
datExpr0[n+1,]=apply(datExpr0[c(1:nrow(datExpr0)),],2,mean)
datExpr0=datExpr0[1:n,datExpr0[n+1,] > meanFPKM]  # for meanFpkm in row n+1 and it must be above what you set--select meanFpkm>opt$meanFpkm(by rp)
filtered_fpkm=t(datExpr0)
filtered_fpkm=data.frame(rownames(filtered_fpkm),filtered_fpkm)
names(filtered_fpkm)[1]="sample"
head(filtered_fpkm)
write.table(filtered_fpkm, file="logCPM_filter.xls",row.names=F, col.names=T,quote=FALSE,sep="\t")
sampleTree = hclust(dist(datExpr0), method = "average")
pdf(file = "1_sampleClustering.pdf", width = 12, height = 4)
par(cex = 0.6)
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
     cex.axis = 1.5, cex.main = 2)
abline(h = 100, col = "red")
dev.off()
clust = cutreeStatic(sampleTree, cutHeight = 100, minSize = 1)
table(clust)
keepSamples = (clust<=2)
datExpr0 = datExpr0[keepSamples, ]
traitData = read.table("character.txt",row.names=1,header=T,comment.char = "",check.names=F)
dim(traitData)
names(traitData)
allTraits = traitData
dim(allTraits)
names(allTraits)
fpkmSamples = rownames(datExpr0)
traitSamples =rownames(allTraits)
traitRows = match(fpkmSamples, traitSamples)
datTraits = allTraits[traitRows,]
rownames(datTraits) 
collectGarbage()
sampleTree2 = hclust(dist(datExpr0), method = "average")
traitColors = numbers2colors(datTraits, signed = FALSE)
pdf(file="2_Sample dendrogram and trait heatmap.pdf",width=12,height=6)
plotDendroAndColors(sampleTree2, traitColors,
                    groupLabels = names(datTraits),
                    main = "Sample dendrogram and trait heatmap")
dev.off()
save(datExpr0, file = "logCPM_forAnalysis.RData")
save(datTraits, file="trait_forAnalysis.RData")
enableWGCNAThreads()
powers = c(c(1:10), seq(from = 12, to=30, by=2))
sft=pickSoftThreshold(datExpr0,dataIsExpr = TRUE,
                      
                      powerVector = powers,
                      
                      corFnc = cor,
                      
                      corOptions = list(use = 'p'),
                      
                      networkType = 'unsigned') 

sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5)

pdf(file="3_Scale independence.pdf",width=9,height=5)
par(mfrow = c(1,2))
cex1 = 0.8
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
abline(h=0.8,col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1, col ="red")
abline(h=200,col="red")

dev.off()
softPower =sft$powerEstimate
adjacency = adjacency(datExpr0, power = 30)
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM
save(TOM, file = "TOM_forAnalysis.RData")
geneTree = hclust(as.dist(dissTOM), method = "average");
pdf(file="4_Gene clustering on TOM-based dissimilarity.pdf",width=12,height=9)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
     labels = FALSE, hang = 0.04)
dev.off()
minModuleSize = 100
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = minModuleSize);
table(dynamicMods)
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
pdf(file="5_Dynamic Tree Cut.pdf",width=8,height=6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05,
                    main = "Gene dendrogram and module colors")
dev.off()
MEList = moduleEigengenes(datExpr0, colors = dynamicColors)
MEs = MEList$eigengenes
MEDiss = 1-cor(MEs);
METree = hclust(as.dist(MEDiss), method = "average")
pdf(file="6_Clustering of module eigengenes.pdf",width=7,height=6)
plot(METree, main = "Clustering of module eigengenes",
     xlab = "", sub = "")
MEDissThres = 0.28
abline(h=MEDissThres, col = "red")
dev.off()
merge = mergeCloseModules(datExpr0, dynamicColors, cutHeight = MEDissThres, verbose = 3)
mergedColors = merge$colors
mergedMEs = merge$newMEs
pdf(file="7_merged dynamic.pdf", width = 9, height = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
                    c("Dynamic Tree Cut", "Merged dynamic"),
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
dev.off()
moduleColors = mergedColors
colorOrder = c("grey", standardColors(50))
moduleLabels = match(moduleColors, colorOrder)-1
MEs = mergedMEs
save(MEs, TOM, dissTOM,  moduleLabels, moduleColors, geneTree, sft, file = "networkConstruction-stepByStep.RData")
nGenes = ncol(datExpr0)
nSamples = nrow(datExpr0)
moduleTraitCor = cor(MEs, datTraits, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
pdf(file="8_Module-trait relationships.pdf",width=4.5,height=10)
textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
                   signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3))
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = blueWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))
dev.off()
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"))
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))
names(geneModuleMembership) = paste("MM", modNames, sep="")
names(MMPvalue) = paste("p.MM", modNames, sep="")
traitNames=names(datTraits)
geneTraitSignificance = as.data.frame(cor(datExpr0, datTraits, use = "p"))
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples))
names(geneTraitSignificance) = paste("GS.", traitNames, sep="")
names(GSPvalue) = paste("p.GS.", traitNames, sep="")
module="royalblue"
column = match(module, modNames)
moduleGenes = moduleColors==module
trait="AKI"
traitColumn=match(trait,traitNames)
par(mfrow = c(1,1))
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, traitColumn]),
                   xlab = paste("Module Membership in", module, "module"),
                   ylab = paste("Gene significance for ",trait),
                   main = paste("Module membership vs. gene significance\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)
workingDir = "D:/Rdata/20230324WGCNA"

for (trait in traitNames){
  traitColumn=match(trait,traitNames)
  
  for (module in modNames){
    column = match(module, modNames)
    moduleGenes = moduleColors==module
    
    if (nrow(geneModuleMembership[moduleGenes,]) > 1){
      pdf(file=paste("9_", trait, "_", module,"_Module membership vs gene significance.pdf",sep=""),width=7,height=7)
      par(mfrow = c(1,1))
      verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                         abs(geneTraitSignificance[moduleGenes, traitColumn]),
                         xlab = paste("Module Membership in", module, "module"),
                         ylab = paste("Gene significance for ",trait),
                         main = paste("Module membership vs. gene significance\n"),
                         cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)
      dev.off()
    }
  }
}

workingDir = "D:/Rdata/20230324WGCNA/WGCNA"
names(datExpr0)
probes = names(datExpr0)
geneInfo0 = data.frame(probes= probes,
                       moduleColor = moduleColors)

for (Tra in 1:ncol(geneTraitSignificance))
{
  oldNames = names(geneInfo0)
  geneInfo0 = data.frame(geneInfo0, geneTraitSignificance[,Tra],
                         GSPvalue[, Tra])
  names(geneInfo0) = c(oldNames,names(geneTraitSignificance)[Tra],
                       names(GSPvalue)[Tra])
}

for (mod in 1:ncol(geneModuleMembership))
{
  oldNames = names(geneInfo0)
  geneInfo0 = data.frame(geneInfo0, geneModuleMembership[,mod],
                         MMPvalue[, mod])
  names(geneInfo0) = c(oldNames,names(geneModuleMembership)[mod],
                       names(MMPvalue)[mod])
}
geneOrder =order(geneInfo0$moduleColor)
geneInfo = geneInfo0[geneOrder, ]

write.table(geneInfo, file = "10_GS_and_MM.xls",sep="\t",row.names=F)
nGenes = ncol(datExpr0)
nSamples = nrow(datExpr0)
plotTOM = dissTOM^7
diag(plotTOM) = NA
sizeGrWindow(9,9)
pdf(file="12_Network heatmap plot_all gene.pdf",width=9, height=9)
TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")
dev.off()
nSelect = 400
set.seed(10)
select = sample(nGenes, size = nSelect)
selectTOM = dissTOM[select, select]
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select]
plotDiss = selectTOM^7
diag(plotDiss) = NA
pdf(file="13_Network heatmap plot_selected genes.pdf",width=9, height=9)
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")
dev.off()
pdf(file="14_Eigengene dendrogram and Eigengene adjacency heatmap.pdf", width=5, height=7.5)
par(cex = 0.9)
plotEigengeneNetworks(MEs, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle= 90)
dev.off()
pdf(file="15_Eigengene dendrogram_2.pdf",width=6, height=6)
par(cex = 1.0)
plotEigengeneNetworks(MEs, "Eigengene dendrogram", marDendro = c(0,4,2,0), plotHeatmaps = FALSE)
dev.off()
pdf(file="15_Eigengene adjacency heatmap_2.pdf",width=6, height=6)
par(cex = 1.0)
plotEigengeneNetworks(MEs, "Eigengene adjacency heatmap", marHeatmap = c(3,4,2,2), plotDendrograms = FALSE, xLabelsAngle = 90)
dev.off()
for (mod in 1:nrow(table(moduleColors)))
{
  modules = names(table(moduleColors))[mod]
  probes = names(datExpr0)
  inModule = (moduleColors == modules)
  modProbes = probes[inModule]
  modGenes = modProbes
  modTOM = TOM[inModule, inModule]
  dimnames(modTOM) = list(modProbes, modProbes)
  cyt = exportNetworkToCytoscape(modTOM,
                                 edgeFile = paste("CytoscapeInput-edges-", modules , ".txt", sep=""),
                                 nodeFile = paste("CytoscapeInput-nodes-", modules, ".txt", sep=""),
                                 weighted = TRUE,
                                 threshold = 0.02,
                                 nodeNames = modProbes,
                                 altNodeNames = modGenes,
                                 nodeAttr = moduleColors[inModule])
}

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值