Viewing and visualizing clusterProfiler enrichment analysis results

https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html

6 GO enrichment analysis
GO comprises three orthogonal ontologies, i.e. molecular function (MF), biological process (BP), and cellular component (CC).

6.1 Supported organisms
GO analyses (groupGO(), enrichGO() and gseGO()) support organisms that have an OrgDb object available (see also section 2.2).

If a user has GO annotation data (in a data.frame format with the first column as gene ID and the second column as GO ID), they can use the enricher() and GSEA() functions to perform an over-representation test and gene set enrichment analysis, respectively.

If the genes are annotated by direct annotation, they should also be annotated by their ancestor GO nodes (indirect annotation). If a user only has direct annotation, they can pass their annotation to the buildGOmap function, which will infer indirect annotation and generate a data.frame that is suitable for both enricher() and GSEA().
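To make the idea of indirect annotation concrete, here is a minimal base-R sketch on a hypothetical three-term hierarchy. The term IDs, the parents list and the ancestors() helper are all made up for illustration; this is not how buildGOmap is implemented, only the propagation idea it relies on.

```r
# Hypothetical toy hierarchy: each term maps to its direct parents.
parents <- list(
  "TERM:C" = c("TERM:B"),
  "TERM:B" = c("TERM:A"),
  "TERM:A" = character(0)
)

# Collect all ancestors of a term by walking up the parent links.
ancestors <- function(term) {
  ps <- parents[[term]]
  unique(c(ps, unlist(lapply(ps, ancestors))))
}

# Direct annotation only: one row per gene/term pair.
direct <- data.frame(gene = c("g1", "g2"),
                     go   = c("TERM:C", "TERM:B"),
                     stringsAsFactors = FALSE)

# Add one row per ancestor term, so each gene is also annotated indirectly.
expanded <- do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(gene = direct$gene[i],
             go   = c(direct$go[i], ancestors(direct$go[i])),
             stringsAsFactors = FALSE)
}))
expanded <- unique(expanded)
expanded
```

In this toy case g1 ends up annotated to all three terms and g2 to TERM:B and TERM:A.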

6.2 GO classification
In clusterProfiler, the groupGO() function is designed for gene classification based on GO distribution at a specific level. Here we use the dataset geneList provided by DOSE.

library(clusterProfiler)
library(org.Hs.eg.db)  # human OrgDb annotation used in the calls below
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

# Entrez gene ID
head(gene)
## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
ggo <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "CC",
               level    = 3,
               readable = TRUE)

head(ggo)
##                    ID                Description Count GeneRatio geneID
## GO:0000133 GO:0000133                 polarisome     0     0/207       
## GO:0000408 GO:0000408          EKC/KEOPS complex     0     0/207       
## GO:0000417 GO:0000417                HIR complex     0     0/207       
## GO:0000444 GO:0000444    MIS12/MIND type complex     0     0/207       
## GO:0000808 GO:0000808 origin recognition complex     0     0/207       
## GO:0000930 GO:0000930      gamma-tubulin complex     0     0/207
The gene parameter is a vector of gene IDs (any ID type supported by the corresponding OrgDb can be used, see also section 16.1). If readable is set to TRUE, the input gene IDs will be converted to gene symbols.

6.3 GO over-representation analysis
The clusterProfiler package implements enrichGO() for the gene ontology over-representation test.

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05,
                readable      = TRUE)
head(ego)
##                    ID                              Description GeneRatio
## GO:0005819 GO:0005819                                  spindle    26/201
## GO:0000779 GO:0000779 condensed chromosome, centromeric region    16/201
## GO:0072686 GO:0072686                          mitotic spindle    17/201
## GO:0000775 GO:0000775           chromosome, centromeric region    18/201
## GO:0098687 GO:0098687                       chromosomal region    23/201
## GO:0000776 GO:0000776                              kinetochore    15/201
##              BgRatio       pvalue     p.adjust       qvalue
## GO:0005819 306/11853 1.072029e-11 3.151766e-09 2.888837e-09
## GO:0000779 114/11853 7.709944e-11 8.659125e-09 7.936756e-09
## GO:0072686 133/11853 8.835841e-11 8.659125e-09 7.936756e-09
## GO:0000775 158/11853 1.684987e-10 1.179661e-08 1.081250e-08
## GO:0098687 272/11853 2.006225e-10 1.179661e-08 1.081250e-08
## GO:0000776 106/11853 2.733425e-10 1.339378e-08 1.227644e-08
##                                                                                                                                                               geneID
## GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/BIRC5/KIF11/TRAT1/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
## GO:0000779                                                             CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CCNB1
## GO:0072686                                                      KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/AURKB/PRC1/KIFC1/KIF18B/AURKA
## GO:0000775                                                 CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CCNB1
## GO:0098687                   CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/RAD51AP1/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CHEK1/CCNB1/MCM5
## GO:0000776                                                                   CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/AURKB/CCNB1
##            Count
## GO:0005819    26
## GO:0000779    16
## GO:0072686    17
## GO:0000775    18
## GO:0098687    23
## GO:0000776    15
Any gene ID type supported by the OrgDb can be used directly in GO analyses. Users specify the input gene ID type with the keyType parameter.

gene.df <- bitr(gene, fromType = "ENTREZID",
                toType = c("ENSEMBL", "SYMBOL"),
                OrgDb = org.Hs.eg.db)

ego2 <- enrichGO(gene         = gene.df$ENSEMBL,
                OrgDb         = org.Hs.eg.db,
                keyType       = 'ENSEMBL',
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05)
head(ego2, 3)                
##                    ID                              Description GeneRatio
## GO:0005819 GO:0005819                                  spindle    30/233
## GO:0072686 GO:0072686                          mitotic spindle    21/233
## GO:0000779 GO:0000779 condensed chromosome, centromeric region    17/233
##              BgRatio       pvalue     p.adjust       qvalue
## GO:0005819 422/21916 1.954166e-16 5.413039e-14 4.175744e-14
## GO:0072686 179/21916 3.911063e-16 5.416822e-14 4.178662e-14
## GO:0000779 154/21916 7.525516e-13 6.948560e-11 5.360280e-11
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     geneID
## GO:0005819 ENSG00000134690/ENSG00000117399/ENSG00000137807/ENSG00000138778/ENSG00000066279/ENSG00000126787/ENSG00000154839/ENSG00000262634/ENSG00000137804/ENSG00000088325/ENSG00000013810/ENSG00000117650/ENSG00000170312/ENSG00000164109/ENSG00000121621/ENSG00000089685/ENSG00000138160/ENSG00000163519/ENSG00000112742/ENSG00000178999/ENSG00000198901/ENSG00000237649/ENSG00000233450/ENSG00000056678/ENSG00000204197/ENSG00000186185/ENSG00000112984/ENSG00000087586/ENSG00000134057/ENSG00000090889
## GO:0072686                                                                                                                                                 ENSG00000137807/ENSG00000138778/ENSG00000066279/ENSG00000154839/ENSG00000262634/ENSG00000137804/ENSG00000088325/ENSG00000013810/ENSG00000170312/ENSG00000164109/ENSG00000121621/ENSG00000138160/ENSG00000163519/ENSG00000178999/ENSG00000198901/ENSG00000237649/ENSG00000233450/ENSG00000056678/ENSG00000204197/ENSG00000186185/ENSG00000087586
## GO:0000779                                                                                                                                                                                                                 ENSG00000138778/ENSG00000080986/ENSG00000123485/ENSG00000154839/ENSG00000262634/ENSG00000117650/ENSG00000100162/ENSG00000166451/ENSG00000186871/ENSG00000164109/ENSG00000121621/ENSG00000167513/ENSG00000089685/ENSG00000112742/ENSG00000109805/ENSG00000178999/ENSG00000134057
##            Count
## GO:0005819    30
## GO:0072686    21
## GO:0000779    17
Gene IDs can be mapped to gene symbols by using the parameter readable=TRUE or the setReadable() function.
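As a rough illustration of the second route, the sketch below runs enrichGO() without readable=TRUE and converts the result afterwards with setReadable(). The object names ego_ids and ego_sym are illustrative, and the call assumes the default Entrez input used throughout this section.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

data(geneList, package = "DOSE")
gene <- names(geneList)[abs(geneList) > 2]

# readable is FALSE by default, so geneID holds Entrez IDs here.
ego_ids <- enrichGO(gene = gene, OrgDb = org.Hs.eg.db, ont = "CC")

# Convert the IDs stored in the result to gene symbols after the fact.
ego_sym <- setReadable(ego_ids, OrgDb = org.Hs.eg.db, keyType = "ENTREZID")

head(as.data.frame(ego_sym)$geneID, 2)
```

Converting afterwards can be handy when the same result object is needed both with IDs (for further computation) and with symbols (for plots and tables).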

6.4 GO Gene Set Enrichment Analysis
The clusterProfiler package provides the gseGO() function for gene set enrichment analysis using gene ontology.

ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "CC",
              minGSSize    = 100,
              maxGSSize    = 500,
              pvalueCutoff = 0.05,
              verbose      = FALSE)
The format of the input data, geneList, is documented in the FAQ. Note that only gene sets whose size falls within [minGSSize, maxGSSize] will be tested.
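For reference, the FAQ describes geneList as a named numeric vector sorted in decreasing order. A minimal base-R sketch of preparing such a vector follows; the table d and its fold-change values are hypothetical placeholders, not real data.

```r
# Hypothetical differential-expression table: gene ID plus log2 fold change.
d <- data.frame(
  id     = c("4312", "8318", "10874", "55143"),
  log2fc = c(1.2, -0.4, 2.5, 0.3),   # made-up values for illustration
  stringsAsFactors = FALSE
)

gene_list <- d$log2fc                            # 1. numeric vector
names(gene_list) <- as.character(d$id)           # 2. named by gene ID
gene_list <- sort(gene_list, decreasing = TRUE)  # 3. sorted in decreasing order
gene_list
```

A vector built this way can be passed as the geneList argument of gseGO().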

6.5 GO analysis for non-model organisms
Both the enrichGO() and gseGO() functions require an OrgDb object as the background annotation. For organisms that don't have an OrgDb provided by Bioconductor, users can query one (if available) online via AnnotationHub. If there is no OrgDb available, users can obtain GO annotation from other sources, e.g. from biomaRt or Blast2GO. Then the enricher() or GSEA() functions can be used to perform GO analysis for these organisms, similar to the examples using wikiPathways and MSigDB. Another solution is to create an OrgDb on your own using the AnnotationForge package.
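A minimal sketch of the enricher() route with user-supplied annotation is shown below. The term and gene names are invented, and the cutoffs are relaxed only so that the tiny toy gene sets are not filtered out; real analyses would keep stricter settings.

```r
library(clusterProfiler)

# Hypothetical annotation for a non-model organism:
# first column is the term ID, second column is the gene ID.
term2gene <- data.frame(
  term = rep(c("termA", "termB"), each = 5),
  gene = c(paste0("gene", 1:5), paste0("gene", 4:8)),
  stringsAsFactors = FALSE
)

# Hypothetical genes of interest.
genes_of_interest <- paste0("gene", 1:4)

res <- enricher(genes_of_interest,
                TERM2GENE    = term2gene,
                pvalueCutoff = 1,   # relaxed for the toy example
                qvalueCutoff = 1,
                minGSSize    = 1)   # default would drop sets this small
as.data.frame(res)
```

GSEA() accepts the same TERM2GENE table together with a ranked geneList instead of a gene vector.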

6.6 Visualize enriched GO terms as a directed acyclic graph
The goplot() function accepts the output of enrichGO() and visualizes the enriched GO terms as an induced graph.

goplot(ego)
Figure 6.1: Goplot of enrichment analysis.

6.7 Summary
GO semantic similarity can be calculated by GOSemSim (Yu et al. 2010). We can use it to cluster genes/proteins into different clusters based on their functional similarity and can also use it to measure the similarities among GO terms to reduce the redundancy of GO enrichment results.

16 dplyr verbs for manipulating enrichment result
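library(DOSE)
library(clusterProfiler)  # so that filter(), mutate(), etc. work on enrichment results
data(geneList)
de <- names(geneList)[1:100]  # first 100 genes of the ranked list
x <- enrichDO(de)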
16.1 filter
filter(x, p.adjust < .05, qvalue < 0.2)
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...28 enriched terms found
## 'data.frame':    28 obs. of  9 variables:
##  $ ID         : chr  "DOID:0060071" "DOID:5295" "DOID:8719" "DOID:3007" ...
##  $ Description: chr  "pre-malignant neoplasm" "intestinal disease" "in situ carcinoma" "breast ductal carcinoma" ...
##  $ GeneRatio  : chr  "5/77" "9/77" "4/77" "4/77" ...
##  $ BgRatio    : chr  "22/8007" "157/8007" "18/8007" "29/8007" ...
##  $ pvalue     : num  1.67e-06 1.76e-05 2.18e-05 1.56e-04 2.08e-04 ...
##  $ p.adjust   : num  0.00064 0.00279 0.00279 0.0136 0.0136 ...
##  $ qvalue     : num  0.000461 0.002008 0.002008 0.009796 0.009796 ...
##  $ geneID     : chr  "6280/6278/10232/332/4321" "4312/6279/3627/10563/4283/890/366/4902/3620" "6280/6278/10232/332" "6280/6279/4751/6286" ...
##  $ Count      : int  5 9 4 4 13 6 13 5 5 6 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
16.2 arrange
mutate(x, geneRatio = parse_ratio(GeneRatio)) %>%
  arrange(desc(geneRatio))
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...28 enriched terms found
## 'data.frame':    28 obs. of  10 variables:
##  $ ID         : chr  "DOID:3908" "DOID:120" "DOID:2394" "DOID:3459" ...
##  $ Description: chr  "non-small cell lung carcinoma" "female reproductive organ cancer" "ovarian cancer" "breast carcinoma" ...
##  $ GeneRatio  : chr  "13/77" "13/77" "10/77" "10/77" ...
##  $ BgRatio    : chr  "431/8007" "455/8007" "312/8007" "383/8007" ...
##  $ pvalue     : num  2.08e-04 3.52e-04 7.54e-04 3.48e-03 1.76e-05 ...
##  $ p.adjust   : num  0.0136 0.01928 0.02625 0.04756 0.00279 ...
##  $ qvalue     : num  0.0098 0.01389 0.0189 0.03424 0.00201 ...
##  $ geneID     : chr  "6280/2305/9133/6279/7153/6278/6241/11065/10232/332/6286/3002/9212" "4312/6279/7153/3627/820/983/10232/6362/332/6286/9212/4321/6790" "4312/820/983/10232/6362/332/6286/9212/4321/6790" "4312/6280/6279/7153/4751/890/4085/332/6286/6790" ...
##  $ Count      : int  13 13 10 10 9 7 7 6 6 6 ...
##  $ geneRatio  : num  0.169 0.169 0.13 0.13 0.117 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
16.3 select
select(x, -geneID) %>% head
##                        ID                   Description GeneRatio  BgRatio
## DOID:0060071 DOID:0060071        pre-malignant neoplasm      5/77  22/8007
## DOID:5295       DOID:5295            intestinal disease      9/77 157/8007
## DOID:8719       DOID:8719             in situ carcinoma      4/77  18/8007
## DOID:3007       DOID:3007       breast ductal carcinoma      4/77  29/8007
## DOID:3908       DOID:3908 non-small cell lung carcinoma     13/77 431/8007
## DOID:0050589 DOID:0050589    inflammatory bowel disease      6/77  90/8007
##                    pvalue     p.adjust       qvalue Count
## DOID:0060071 1.671524e-06 0.0006401937 0.0004609887     5
## DOID:5295    1.759049e-05 0.0027885022 0.0020079362     9
## DOID:8719    2.184205e-05 0.0027885022 0.0020079362     4
## DOID:3007    1.564603e-04 0.0136037018 0.0097957122     4
## DOID:3908    2.075001e-04 0.0136037018 0.0097957122    13
## DOID:0050589 2.131128e-04 0.0136037018 0.0097957122     6
16.4 mutate
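# rich factor = k/M: genes of interest annotated to the term (Count)
# divided by all genes annotated to the term (numerator of BgRatio)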
y <- mutate(x, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))
y
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...28 enriched terms found
## 'data.frame':    28 obs. of  10 variables:
##  $ ID         : chr  "DOID:0060071" "DOID:5295" "DOID:8719" "DOID:3007" ...
##  $ Description: chr  "pre-malignant neoplasm" "intestinal disease" "in situ carcinoma" "breast ductal carcinoma" ...
##  $ GeneRatio  : chr  "5/77" "9/77" "4/77" "4/77" ...
##  $ BgRatio    : chr  "22/8007" "157/8007" "18/8007" "29/8007" ...
##  $ pvalue     : num  1.67e-06 1.76e-05 2.18e-05 1.56e-04 2.08e-04 ...
##  $ p.adjust   : num  0.00064 0.00279 0.00279 0.0136 0.0136 ...
##  $ qvalue     : num  0.000461 0.002008 0.002008 0.009796 0.009796 ...
##  $ geneID     : chr  "6280/6278/10232/332/4321" "4312/6279/3627/10563/4283/890/366/4902/3620" "6280/6278/10232/332" "6280/6279/4751/6286" ...
##  $ Count      : int  5 9 4 4 13 6 13 5 5 6 ...
##  $ richFactor : num  0.2273 0.0573 0.2222 0.1379 0.0302 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
library(ggplot2)
library(forcats)
library(enrichplot)

ggplot(y, showCategory = 20, 
  aes(richFactor, fct_reorder(Description, richFactor))) + 
  geom_segment(aes(xend=0, yend = Description)) +
  geom_point(aes(color=p.adjust, size = Count)) +
  scale_color_viridis_c(guide=guide_colorbar(reverse=TRUE)) +
  scale_size_continuous(range=c(2, 10)) +
  theme_minimal() + 
  xlab("rich factor") +
  ylab(NULL) + 
  ggtitle("Enriched Disease Ontology")
Figure 16.1: Visualizing the rich factor of enriched terms using a lollipop plot.

A very similar concept is Fold Enrichment, which is defined as the ratio of two proportions, (k/n) / (M/N). Using mutate to add the fold enrichment variable is also easy:

mutate(x, FoldEnrichment = parse_ratio(GeneRatio) / parse_ratio(BgRatio))
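As a quick arithmetic check, the two quantities can be computed by hand for the first term in the output above (GeneRatio 5/77, BgRatio 22/8007, Count 5). The helper ratio_to_num() below is a hypothetical stand-in written in base R so the sketch runs without any package; it is not part of DOSE.

```r
# Hypothetical helper: turn "a/b" strings into the numeric value a/b.
ratio_to_num <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

gene_ratio <- "5/77"     # k/n
bg_ratio   <- "22/8007"  # M/N
count      <- 5          # k

# Rich factor k/M: 5/22, about 0.227 (matches the richFactor column above).
M <- as.numeric(sub("/\\d+", "", bg_ratio))
rich_factor <- count / M

# Fold enrichment (k/n) / (M/N): about 23.6.
fold_enrichment <- ratio_to_num(gene_ratio) / ratio_to_num(bg_ratio)

c(rich_factor = rich_factor, fold_enrichment = fold_enrichment)
```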
16.5 slice
We can use slice to choose rows by their ordinal position in the enrichment result. For grouped results, the ordinal position within each group is used.

In the following example, a GSEA result of Reactome pathways is sorted by the absolute value of NES (in ascending order, the default of arrange) and grouped by the sign of NES. We then extract the first 5 rows of each group. The result is displayed in Figure 16.2.

library(ReactomePA)
x <- gsePathway(geneList)


y <- arrange(x, abs(NES)) %>% 
        group_by(sign(NES)) %>% 
        slice(1:5)

library(forcats)
library(ggplot2)
library(ggstance)
library(enrichplot)

ggplot(y, aes(NES, fct_reorder(Description, NES), fill=qvalues), showCategory=10) + 
    geom_col(orientation='y') + 
    scale_fill_continuous(low='red', high='blue', guide=guide_colorbar(reverse=TRUE)) + 
    theme_minimal() + ylab(NULL)
Figure 16.2: Choose pathways by ordinal positions.

16.6 summarise
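pbar <- function(x) {
  # ten equal-width bins over [0, 1]; named breaks to avoid masking the constant pi
  breaks <- seq(0, 1, length.out = 11)

  # count enriched terms per adjusted p-value bin and draw a bar chart
  mutate(x, pp = cut(p.adjust, breaks)) %>%
    group_by(pp) %>%
    summarise(cnt = n()) %>%
    ggplot(aes(pp, cnt)) + geom_col() +
    theme_minimal() +
    xlab("p value intervals") +
    ylab("Frequency") +
    ggtitle("p value distribution")
}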

x <- enrichDO(de, pvalueCutoff=1, qvalueCutoff=1)
set.seed(2020-09-10)  # evaluated arithmetically, i.e. the same as set.seed(2001)
random_genes <- sample(names(geneList), 100)
y <- enrichDO(random_genes, pvalueCutoff=1, qvalueCutoff=1)
p1 <- pbar(x)
p2 <- pbar(y)
cowplot::plot_grid(p1, p2, ncol=1, labels = LETTERS[1:2])
Figure 16.3: Distribution of adjusted p-values.
https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html

5 Overview of enrichment analysis
5.1 Terminology
5.1.1 Gene sets and pathway
A gene set is an unordered collection of genes that are functionally related. A pathway can be interpreted as a gene set by ignoring functional relationships among genes.

5.1.2 Gene Ontology (GO)
Gene Ontology defines concepts/classes used to describe gene function, and relationships between these concepts. It classifies functions along three aspects:

MF (Molecular Function): molecular activities of gene products
CC (Cellular Component): where gene products are active
BP (Biological Process): pathways and larger processes made up of the activities of multiple gene products
GO terms are organized in a directed acyclic graph, where edges between terms represent parent-child relationships.

5.1.3 Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a collection of manually drawn pathway maps representing molecular interaction and reaction networks. These pathways cover a wide range of biochemical processes that can be divided into 7 broad categories:

Metabolism
Genetic information processing
Environmental information processing
Cellular processes
Organismal systems
Human diseases
Drug development.
5.1.4 Other gene sets
GO and KEGG are the most frequently used for functional analysis. They are typically the first choice because of their long-standing curation and availability for a wide range of species.

Other gene sets include, but are not limited to, Disease Ontology (DO), Disease Gene Network (DisGeNET), wikiPathways and the Molecular Signatures Database (MSigDB).

5.2 Over Representation Analysis
Over Representation Analysis (ORA) (Boyle et al. 2004) is a widely used approach to determine whether known biological functions or processes are over-represented (= enriched) in an experimentally-derived gene list, e.g. a list of differentially expressed genes (DEGs).

The p-value can be calculated using the hypergeometric distribution.

$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

In this equation, N is the total number of genes in the background distribution, M is the number of genes within that distribution that are annotated (either directly or indirectly) to the gene set of interest, n is the size of the list of genes of interest and k is the number of genes within that list which are annotated to the gene set. The background distribution by default is all the genes that have annotation. P-values should be adjusted for multiple comparisons.

Example: Suppose 17,980 genes were detected in a microarray study and 57 of them were differentially expressed. Among the differentially expressed genes, 28 are annotated to a given gene set.

d <- data.frame(gene.not.interest=c(2613, 15310), gene.in.interest=c(28, 29))
row.names(d) <- c("In_category", "not_in_category")
d
##                 gene.not.interest gene.in.interest
## In_category                  2613               28
## not_in_category             15310               29
Whether the overlap of 28 genes is significantly over-represented in the gene set can be assessed using the hypergeometric distribution. This corresponds to a one-sided version of Fisher's exact test.

fisher.test(d, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  d
## p-value = 1
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.110242      Inf
## sample estimates:
## odds ratio 
##  0.1767937
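The p-value of 1 and the odds ratio below 1 printed above follow from the column order of d: the genes of interest sit in the second column, so alternative = "greater" ends up testing the opposite direction. A minimal sketch, assuming the same counts, puts the genes of interest in the first column and compares the result with the upper tail of the hypergeometric distribution from the formula in this section. The names d2, p_fisher and p_hyper are illustrative.

```r
# Same counts as above, with the genes of interest in the first column.
d2 <- data.frame(gene.in.interest  = c(28, 29),
                 gene.not.interest = c(2613, 15310))
row.names(d2) <- c("In_category", "not_in_category")

# One-sided Fisher's exact test for over-representation.
p_fisher <- fisher.test(d2, alternative = "greater")$p.value

# Hypergeometric upper tail P(X >= k), using the symbols of the formula.
k <- 28          # genes of interest annotated to the gene set
n <- 57          # genes of interest
M <- 28 + 2613   # background genes annotated to the gene set
N <- 17980       # background genes
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

all.equal(p_fisher, p_hyper)
```

With this layout both computations describe the same tail probability, so they should agree up to numerical precision.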
5.3 Gene Set Enrichment Analysis
A common approach to analyzing gene expression profiles is to identify differentially expressed genes that are deemed interesting. ORA is based on these differentially expressed genes. This approach finds genes where the difference is large, but it fails where the difference is small yet evident in a coordinated way across a set of related genes. Gene Set Enrichment Analysis (GSEA) (Subramanian et al. 2005) directly addresses this limitation. All genes can be used in GSEA; GSEA aggregates the per-gene statistics across genes within a gene set, which makes it possible to detect situations where all genes in a predefined set change in a small but coordinated way. This is important since it is likely that many relevant phenotypic differences are manifested by small but consistent changes in a set of genes.

Genes are ranked based on their phenotypes. Given an a priori defined gene set S (e.g., genes sharing the same DO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout the ranked gene list (L) or primarily found at the top or bottom.

There are three key elements of the GSEA method:

Calculation of an Enrichment Score.
The enrichment score (ES) represents the degree to which a set S is over-represented at the top or bottom of the ranked list L. The score is calculated by walking down the list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when the gene is not in S. The magnitude of the increment depends on the gene statistics (e.g., correlation of the gene with phenotype). The ES is the maximum deviation from zero encountered in the random walk; it corresponds to a weighted Kolmogorov-Smirnov (KS)-like statistic (Subramanian et al. 2005).
Estimation of Significance Level of ES.
The p-value of the ES is calculated using a permutation test. Specifically, we permute the gene labels of the gene list L and recompute the ES of the gene set for the permuted data, which generates a null distribution for the ES. The p-value of the observed ES is then calculated relative to this null distribution.
Adjustment for Multiple Hypothesis Testing.
When all gene sets have been evaluated, the estimated significance level is adjusted to account for multiple hypothesis testing, and q-values are also calculated for FDR control.
We implemented the GSEA algorithm proposed by Subramanian (Subramanian et al. 2005). Alexey Sergushichev implemented an algorithm for fast GSEA calculation in the fgsea (Korotkevich, Sukhov, and Sergushichev 2019) package. In our packages (clusterProfiler, DOSE, meshes and ReactomePA), users can use the GSEA algorithm implemented in DOSE or fgsea by specifying the parameter by="DOSE" or by="fgsea". By default, the fgsea method will be used since it is much faster.
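The first two elements can be sketched in a few lines of base R. This is a simplified illustration on a made-up ranked list, not the code used by DOSE or fgsea: the gene names, the statistics and the enrichment_score() helper are hypothetical, and the permutation p-value here simply compares absolute ES values, which is a simplification of the original procedure.

```r
# Toy ranked list: named statistics sorted in decreasing order (made-up values).
ranked   <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
gene_set <- c("g1", "g2", "g6")   # hypothetical gene set S

# Weighted running-sum statistic; ES is the maximum deviation from zero.
enrichment_score <- function(stats, set, exponent = 1) {
  in_set     <- names(stats) %in% set
  hit_weight <- abs(stats)^exponent * in_set
  p_hit      <- cumsum(hit_weight) / sum(hit_weight)  # step up at genes in S
  p_miss     <- cumsum(!in_set) / sum(!in_set)        # step down otherwise
  running    <- p_hit - p_miss
  running[which.max(abs(running))]
}

es <- enrichment_score(ranked, gene_set)   # 0.625, reached at g2

# Null distribution: permute the gene labels and recompute the ES.
set.seed(1)
null_es <- replicate(1000, {
  permuted <- ranked
  names(permuted) <- sample(names(ranked))
  enrichment_score(permuted, gene_set)
})
p_value <- (sum(abs(null_es) >= abs(es)) + 1) / (length(null_es) + 1)
```

With only six genes the null distribution is very coarse; the point is the mechanics of the running sum and the label permutation, not the resulting p-value.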

5.4 Leading edge analysis and core enriched genes
Leading edge analysis reports Tags to indicate the percentage of genes contributing to the enrichment score, List to indicate where in the list the enrichment score is attained and Signal for enrichment signal strength.

It would also be very interesting to get the core enriched genes that contribute to the enrichment. Our packages (clusterProfiler, DOSE, meshes and ReactomePA) support leading edge analysis and report core enriched genes in GSEA analysis.
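As a rough sketch of how these quantities relate to the running sum, the code below computes them for a made-up ranked list with a positive enrichment score. The formulas follow the definitions commonly given for GSEA leading edge analysis (tags, list, and signal = tags × (1 − list) × N / (N − set size)); treat them as an assumption for illustration rather than the exact code of the packages, and all names and values as hypothetical.

```r
# Toy ranked list and gene set (made-up values).
ranked   <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
gene_set <- c("g1", "g2", "g6")

in_set  <- names(ranked) %in% gene_set
weights <- abs(ranked) * in_set
running <- cumsum(weights) / sum(weights) - cumsum(!in_set) / sum(!in_set)
peak    <- unname(which.max(abs(running)))   # position where the ES is attained

# For a positive ES, the core enriched genes are the set members
# ranked at or before the peak.
core <- names(ranked)[seq_len(peak)][in_set[seq_len(peak)]]

n_total   <- length(ranked)
n_set     <- sum(in_set)
tags      <- length(core) / n_set     # share of the gene set in the leading edge
list_frac <- peak / n_total           # share of the ranked list before the peak
signal    <- tags * (1 - list_frac) * n_total / (n_total - n_set)

core
c(tags = tags, list = list_frac, signal = signal)
```

Here the core enriched genes are g1 and g2, giving tags = 2/3, list = 1/3 and signal = 8/9.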