WGCNA Analysis, Part 1: Clarifying the Concepts

JasonKQLin

Categories: Bioinformatics, R

Article link: https://blog.csdn.net/linkequa/article/details/112326418

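WGCNA is a method for analyzing gene expression data. It identifies modules of co-expressed genes in order to reveal relationships among genes. It helps find hub genes and study how gene modules relate to disease status. It needs a minimum number of samples: at least 15. Input data are typically RPKM, FPKM, or normalized counts; batch effects should be removed, and all samples must be quantified in the same way. Genes should be filtered by mean expression or median absolute deviation rather than by differential-expression fold change, so that the analysis stays unsupervised.

Summary automatically generated by CSDN.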

1. Definition

WGCNA stands for Weighted Gene Co-expression Network Analysis.

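2. What it is used for

2.1 Studying a group of co-expressed genes together yields more information than looking at individual up-regulated or down-regulated genes;
2.2 Identifying "hub genes" (genes that are tightly connected to other genes, sit at the center of the network, and play an important role);
2.3 Exploring the relationship between gene modules (groups of co-expressed genes) and traits (such as disease status).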

3. Input data format

RPKM, FPKM, normalized counts, and similar measures are all acceptable, provided the values have been normalized on a per-sample basis.

Whether one uses RPKM, FPKM, or simply normalized counts doesn’t make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)
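
The point about scaling factors can be checked numerically. The following is a minimal sketch in plain Python, not WGCNA code: the expression values, the factor 0.37, and the per-sample factors are all made-up numbers chosen for illustration. It shows that multiplying one gene by a constant leaves the Pearson correlation unchanged, while multiplying each sample by its own factor changes it.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression values for two genes across five samples.
gene_a = [10.0, 12.0, 15.0, 11.0, 20.0]
gene_b = [3.0, 4.5, 5.0, 3.5, 7.0]

r_original = pearson(gene_a, gene_b)

# Gene-wise scaling: every value of gene A is multiplied by the same constant
# (for example a gene-length correction). The correlation is unchanged.
r_gene_scaled = pearson([v * 0.37 for v in gene_a], gene_b)

# Sample-wise scaling: each sample has its own factor (for example library
# size), applied to both genes. The correlation changes.
sample_factors = [1.0, 0.5, 2.0, 1.5, 0.8]
r_sample_scaled = pearson(
    [v * f for v, f in zip(gene_a, sample_factors)],
    [v * f for v, f in zip(gene_b, sample_factors)],
)

print(r_original, r_gene_scaled, r_sample_scaled)
```

This is why the choice between RPKM, FPKM, and normalized counts matters little here, while un-normalized differences between samples do matter.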

If data come from different batches, we recommend to check for batch effects and, if needed, adjust for them. We use ComBat for batch effect removal but other methods should also work.

Finally, we usually check quantile scatterplots to make sure there are no systematic shifts between samples; if sample quantiles show correlations (which they usually do), quantile normalization can be used to remove this effect.
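
To make the idea of quantile normalization concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions rather than the procedure used by any particular package: the function name `quantile_normalize` and the toy matrix are hypothetical, and ties are broken by sort order instead of being averaged.

```python
def quantile_normalize(samples):
    """Force every sample to share the same distribution of values.

    samples: list of equal-length lists, one list per sample, where
    position g in each list holds the value for gene g.
    """
    n_genes = len(samples[0])
    # Sort each sample, then average across samples rank by rank.
    sorted_cols = [sorted(col) for col in samples]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(samples)
        for r in range(n_genes)
    ]
    normalized = []
    for col in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda g: col[g])
        new_col = [0.0] * n_genes
        for rank, g in enumerate(order):
            # Replace each value with the cross-sample mean for its rank.
            new_col[g] = rank_means[rank]
        normalized.append(new_col)
    return normalized

# Hypothetical data: three samples, four genes.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
for col in normalized:
    print(col)
```

After normalization every sample contains the same set of values, so systematic shifts between samples are gone while the ranking of genes within each sample is preserved.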

4. Sample size requirement

At least 15 samples.

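5. How to filter genes

Filter genes by mean expression or by median absolute deviation (MAD), that is, remove genes with low expression or low variance. Filtering by differential-expression fold change is not recommended.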

Probesets or genes may be filtered by mean expression or variance (or their robust analogs such as median and median absolute deviation, MAD) since low-expressed or non-varying genes usually represent noise. Whether it is better to filter by mean expression or variance is a matter of debate; both have advantages and disadvantages, but more importantly, they tend to filter out similar sets of genes since mean and variance are usually related.
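
The filtering described above can be sketched as follows, using only the Python standard library. The gene names, expression values, the `min_mean` cutoff, and the number of genes kept are hypothetical; appropriate thresholds depend on the data set.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |value - median|."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def filter_genes(expr, min_mean, top_n_by_mad):
    """Drop low-expressed genes, then keep the most variable ones by MAD.

    expr: dict mapping gene name -> list of expression values per sample.
    Returns the selected gene names, most variable first.
    """
    expressed = {g: v for g, v in expr.items() if statistics.mean(v) >= min_mean}
    ranked = sorted(expressed, key=lambda g: mad(expressed[g]), reverse=True)
    return ranked[:top_n_by_mad]

# Hypothetical expression matrix: four genes, five samples.
expr = {
    "geneA": [5.0, 9.0, 2.0, 12.0, 7.0],    # expressed and variable
    "geneB": [0.1, 0.0, 0.2, 0.1, 0.0],     # barely expressed
    "geneC": [10.0, 10.1, 9.9, 10.0, 10.2], # expressed but nearly constant
    "geneD": [3.0, 8.0, 15.0, 1.0, 6.0],    # expressed and variable
}

selected = filter_genes(expr, min_mean=1.0, top_n_by_mad=2)
print(selected)
```

Note that neither criterion looks at sample labels or traits, which keeps the filtering step unsupervised.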

We do not recommend filtering genes by differential expression. WGCNA is designed to be an unsupervised analysis method that clusters genes based on their expression profiles. Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail.
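
For context on the soft-thresholding power mentioned above: in an unsigned WGCNA network, the connection strength between two genes is the absolute correlation raised to a power β. The sketch below illustrates that formula in plain Python; the toy data and the value β = 6 are assumptions for illustration only, and the sketch does not implement the scale-free topology fit used to choose β.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    genes = list(expr)
    return {
        (gi, gj): abs(pearson(expr[gi], expr[gj])) ** beta
        for gi in genes
        for gj in genes
        if gi != gj
    }

# Hypothetical expression profiles across five samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],  # strongly correlated with g1
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with g1
}

adj = soft_threshold_adjacency(expr, beta=6)  # beta = 6 is an assumed value
print(adj[("g1", "g2")], adj[("g1", "g3")])
```

Raising correlations to a power keeps strong connections close to 1 and pushes weak ones toward 0 without imposing a hard cutoff. If the gene set has been pre-filtered so that almost all genes are highly correlated, nearly every pair stays strongly connected, which is consistent with the FAQ's warning that such filtering produces one or a few large modules.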

Reference

https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html
https://deneflab.github.io/HNA_LNA_productivity/WGCNA_analysis.html#1_data_inputcleaning