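This post introduces GEPIA2, a powerful data-mining tool built on TCGA and GTEx data. In my view, readers without a programming background can use this online tool directly to analyze a single gene or a set of genes from their own research, and the results are quite good.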
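The 桓峰基因 WeChat public account has published a series of transcriptome analysis tutorials; researchers who need bioinformatics support are welcome to contact us. The tutorials so far are:
RNA 2. Differentially expressed genes from GEO in SCI papers: limma
RNA 3. Differentially expressed genes from TCGA in SCI papers: DESeq2
RNA 4. Differential expression from TCGA in SCI papers: edgeR
RNA 6. Differential gene expression: volcano plot (volcano)
RNA 7. Gene expression in SCI papers: principal component analysis (PCA)
RNA 8. Differential gene expression in SCI papers: heatmap (heatmap)
RNA 12. Tumor immune infiltration estimation in SCI papers: CIBERSORT
RNA 14. Differentially expressed genes in SCI papers: protein-protein interaction network (PPI)
RNA 15. Fusion genes in SCI papers: FusionGDB2
RNA 17. Screening hub genes in SCI papers (Hub genes)
RNA 19. Unsupervised clustering in SCI papers (ConsensusClusterPlus)
RNA 20. Single-sample immune infiltration analysis in SCI papers (ssGSEA)
RNA 22. Expression-based estimation of stromal and immune cells in malignant tumor tissue in SCI papers (ESTIMATE)
RNA 23. Risk factor association plots for gene expression models in SCI papers (ggrisk)
RNA 24. TCGA-based immune infiltrating cell analysis in SCI papers (TIMER)
RNA 25. Estimating the population abundance of tissue-infiltrating immune and stromal cells in SCI papers (MCP-counter)
RNA 26. Gene regulatory network inference from transcriptome data in SCI papers (GENIE3)
RNA 27. From transcription factor binding motif enrichment to regulatory networks in SCI papers (RcisTarget)
FigDraw 28. Drawing radar/spider charts in SCI papers (RadarChart)
RNA 29. TCGA-based immune infiltrating cell analysis in SCI papers (TIMER2.0)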
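GEPIA2 is an updated version of GEPIA for analyzing RNA sequencing expression data from 9,736 tumor samples and 8,587 normal samples in the TCGA and GTEx projects, processed with a standard pipeline. GEPIA2 provides customizable functions such as tumor/normal differential expression analysis, profiling by cancer type or pathological stage, patient survival analysis, similar gene detection, correlation analysis, and dimensionality reduction analysis (http://gepia2.cancer-pku.cn/).
The RNA-Seq datasets used by GEPIA2 are based on the UCSC Xena project (http://xena.ucsc.edu) and are computed with a standard pipeline.
GEPIA2 has four modules, each of which can process data: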
Single Gene Analysis
Cancer Type Analysis
Custom Data Analysis
Multiple Gene Analysis
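This online tool is especially well suited to single-gene studies aimed at publication: it is simple to use, and entering one gene is enough to see how it changes across cancer types.
Let's go through the examples provided on the site to see which analyses are available and what results they produce: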
Examples for GEPIA2 Usage
By using GEPIA2, experimental biologists can easily explore the large TCGA and GTEx datasets, ask specific questions, and test their hypotheses in a higher resolution.
For isoform analysis with the boxplot and survival analyses, users can easily see that the POMT1-003 isoform is overexpressed in ACC compared with normal tissue. Meanwhile, ACC patients with high POMT1-003 expression had a worse prognosis.
In addition, based on Isoform Usage, users can find that SLC7A2-202 in the SLC7A2 gene shows an isoform switch event in LIHC compared with other cancer types.
Users can also use Isoform Structure to find that 3 isoforms of ERCC1 have different isoform structures.
For Survival Map, users can get the survival significance map of the gene HSPB6, which has significant results in BLCA, KIRP, LGG and SARC.
For gene signature analysis in similar genes detection, users can find that MIR155HG, CD8A, IL21R, CD27 and PTPN7 have the highest correlation with the T-cell exhausted signature in LIHC.
For the combination of signature and subtype analysis in boxplot, GEPIA2 provides the expression distribution of the Th-1 like signature in the 3 COAD subtypes.
For analyzing user-uploaded data, the features in custom data analysis enable users to classify their uploaded data into cancer subtypes or compare their own data with TCGA and GTEx data.
For running analyses on a local machine, GEPIA2 provides the Python package gepia in API. Users can obtain analysis results in batch using this package.
GEPIA2 also retained the original features of GEPIA:
In differential analysis and expression profile, users can easily discover differentially expressed genes, such as MPO in leukemia and UPK2 in bladder cancer.
MPO specifically expressed in leukemia:
UPK2 specifically expressed in bladder cancer:
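To make the tumor/normal comparison concrete, here is a minimal offline sketch. It assumes expression is given as TPM, transformed as log2(TPM + 1), and that the fold change is taken as the difference of group medians; the function names, the cutoff of 1, and the sample values are all hypothetical and chosen for illustration. It deliberately leaves out the significance test that a real differential analysis also needs, and the procedure on the GEPIA2 server may differ.

```python
import math
from statistics import median


def log2_tpm(tpm_values):
    """Apply the log2(TPM + 1) transform to a list of TPM values."""
    return [math.log2(v + 1.0) for v in tpm_values]


def log2_fold_change(tumor_tpm, normal_tpm):
    """Median log2(TPM + 1) in tumor minus median log2(TPM + 1) in normal."""
    return median(log2_tpm(tumor_tpm)) - median(log2_tpm(normal_tpm))


def classify(tumor_tpm, normal_tpm, cutoff=1.0):
    """Label a gene using only the size and sign of its log2 fold change.

    Assumption: a cutoff of 1 (a two-fold change). No p-value or q-value
    is computed here, so this is not a full differential analysis.
    """
    lfc = log2_fold_change(tumor_tpm, normal_tpm)
    if lfc >= cutoff:
        return "over-expressed"
    if lfc <= -cutoff:
        return "under-expressed"
    return "unchanged"


# Hypothetical TPM values, invented for illustration only.
tumor = [31.0, 63.0, 127.0]
normal = [3.0, 7.0, 15.0]
print(log2_fold_change(tumor, normal))
print(classify(tumor, normal))
```

Using medians rather than means keeps a few extreme samples from dominating the fold change, which matters in cohorts as heterogeneous as TCGA.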
The chromosomal distribution of over- or under-expressed genes can be plotted in Differential Genes.
Over-expressed genes:
Under-expressed genes:
Both over-expressed and under-expressed genes:
In Survival analysis, genes with the most significant association with patient survival can be identified, such as MCTS1 in breast cancer and HILPDA in liver cancer.
MCTS1 in breast cancer:
HILPDA in liver cancer:
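The idea behind such a survival comparison can be sketched offline: split patients into high and low expression groups, then compare the two groups with a log-rank test. The sketch below assumes a median split and a standard two-group log-rank statistic with one degree of freedom; the function names and the example data are hypothetical, and the cutoffs or statistics used by the web server may differ.

```python
import math
from statistics import median


def split_by_median(expression):
    """Return True for samples whose expression is above the cohort median."""
    cut = median(expression)
    return [value > cut for value in expression]


def logrank_test(times, events, in_high_group):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times: follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    in_high_group: True if the patient is in the high-expression group
    """
    records = list(zip(times, events, in_high_group))
    observed = expected = variance = 0.0
    for t in sorted({u for u, e, _ in records if e}):
        n = sum(1 for u, _, _ in records if u >= t)            # at risk, all
        n_high = sum(1 for u, _, g in records if u >= t and g)  # at risk, high
        d = sum(1 for u, e, _ in records if u == t and e)       # events at t
        d_high = sum(1 for u, e, g in records if u == t and e and g)
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical cohort: the high-expression patients die earlier.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [50, 60, 70, 80, 5, 10, 15, 20]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = split_by_median(expression)
print(logrank_test(times, events, groups))
```

A median split is only one possible cutoff; quartile-based splits give more extreme groups at the cost of discarding the middle of the cohort.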
Gene expression is visualized by both a bodymap and a bar plot in General.
Gene expression by pathological stage is plotted in Stage plot.
Users can compare the expression of one gene in multiple cancers by Boxplot, or compare multiple genes by a matrix plot in Multiple gene comparison.
Boxplot:
Matrix plot:
GEPIA provides pair-wise gene correlation analysis of a given set of TCGA and/or GTEx expression data. Normalization is optional and customizable.
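For readers who want to reproduce a pair-wise correlation on their own expression table, the following is a small sketch of Pearson and Spearman coefficients in plain Python. The function names and the example values are hypothetical; it assumes the two lists hold expression values (for example log2(TPM + 1)) for the same samples in the same order, and it does not implement the optional normalization step.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values for two genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(gene_a, gene_b))
print(spearman(gene_a, gene_b))
```

Spearman is the safer choice when the relationship is monotonic but not linear, since it depends only on the ordering of the samples.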
GEPIA provides Principal Component Analysis of multiple genes and cancer types in PCA, and presents results as 2D or 3D plots.
2D plots:
3D plots:
Variances distribution:
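The variance distribution plot reports how much of the total variance each principal component explains. As a rough illustration of where that number comes from, the sketch below computes the first principal component of a small samples-by-genes matrix with power iteration in plain Python. All names and values are hypothetical, and this is not how the server computes PCA; in practice a library routine would be used instead.

```python
import math


def center_columns(rows):
    """Subtract each column (gene) mean from a samples-by-genes matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already centered data."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov by power iteration.

    The start vector (1, 2, 3, ...) is an arbitrary choice; power iteration
    fails in the rare case where it is orthogonal to the top eigenvector.
    """
    p = len(cov)
    vec = [1.0 + i for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec


def explained_ratio(cov, eigenvalue):
    """Share of total variance (trace of cov) carried by one component."""
    total = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total if total else 0.0


# Hypothetical expression matrix: 4 samples (rows) by 3 genes (columns).
data = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.7], [3.0, 6.2, 0.4], [4.0, 7.8, 0.6]]
centered = center_columns(data)
cov = covariance_matrix(centered)
value, loading = first_component(cov)
scores = [sum(a * b for a, b in zip(row, loading)) for row in centered]
print(explained_ratio(cov, value))
print(scores)
```

The `scores` list holds each sample's coordinate on the first component, which is what one axis of a 2D or 3D PCA plot shows.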
Genes with similar expression patterns can be identified in Similar Genes; for example, PGAP3 and GRB7 are similar to ERBB2.
ERBB2:
PGAP3:
GRB7:
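One simple way to think about similar-gene detection is to rank every other gene by its correlation with the query gene across samples. The sketch below does this with Pearson correlation on a tiny table; the function names are hypothetical and the expression values are invented for illustration (they are not real TCGA or GTEx measurements), and the similarity measure used by the server may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def most_similar(query, profiles, top_n=2):
    """Return the top_n genes most correlated with the query gene.

    profiles maps a gene symbol to its expression values, with samples
    in the same order for every gene.
    """
    target = profiles[query]
    scored = [(pearson(target, values), gene)
              for gene, values in profiles.items() if gene != query]
    scored.sort(reverse=True)
    return [(gene, r) for r, gene in scored[:top_n]]


# Invented expression values across five samples, for illustration only.
profiles = {
    "ERBB2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PGAP3": [1.1, 2.0, 3.2, 3.9, 5.1],
    "GRB7": [0.9, 2.2, 2.8, 4.1, 4.9],
    "ACTB": [5.0, 5.1, 4.9, 5.0, 5.05],
}
for gene, r in most_similar("ERBB2", profiles):
    print(gene, round(r, 3))
```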
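The tool is very convenient to use: it saves you from writing code, hunting for data, and drawing plots yourself. Researchers who need it are welcome to give it a try.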
References:
1. Li, C., Tang, Z., Zhang, W., Ye, Z., Liu, F. (2021) GEPIA2021: integrating multiple deconvolution-based analysis into GEPIA. Nucleic Acids Res, 49(W1), W242–W246, https://doi.org/10.1093/nar/gkab418.
2. Tang, Z. et al. (2019) GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res, https://doi.org/10.1093/nar/gkz430.