单细胞层次给基因集打分除了和GSVA等分析一样可以做富集以外,还可以通过cell marker gene set做亚群注释(需要注意细胞比例的问题,数量少的群体不行,可作辅助)。前面写过的scMetabolism包是给代谢通路打分,思路一样。
AUCell 是ranking-based,所以具体哪种标准化方法不重要,只要不影响每个细胞内基因表达量的排序就好。总的来讲,分为三步:
1,build the rankings
2,calculate the AUC
3,set the assignment threshold
安装R包
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
Biocmanager::install("AUCell")
# To support paralell execution:
BiocManager::install(c("doMC", "doRNG","doSNOW"))
# For the main example:
BiocManager::install(c("mixtools", "GEOquery", "SummarizedExperiment"))
# For the examples in the follow-up section of the tutorial:
BiocManager::install(c("DT", "plotly", "NMF", "d3heatmap", "shiny", "rbokeh",
"dynamicTreeCut","R2HTML","Rtsne", "zoo"))
- load the scRNA expression matrix and gene set
# i.e. Reading from a text file
#exprMatrix <- read.table("myCountsMatrix.tsv")
#exprMatrix <- as.matrix(exprMatrix)
# or using an expression set
#exprMatrix <- exprs(myExpressionSet)
# (This may take a few minutes)
library(GEOquery)
attemptsLeft <- 20
while(attemptsLeft>0)
{
geoFile <- tryCatch(getGEOSuppFiles("GSE60361", makeDirectory=FALSE), error=identity)
if(methods::is(geoFile,"error"))
{
attemptsLeft <- attemptsLeft-1
Sys.sleep(5)
}
else
attemptsLeft <- 0
}
gzFile <- grep(".txt.gz", basename(rownames(geoFile)), fixed=TRUE, value=TRUE)
message(gzFile)
txtFile <- gsub(".gz", "", gzFile, fixed=TRUE)
message(txtFile)
gunzip(filename=gzFile, destname=txtFile, remove=TRUE)
library(data.table)
geoData <- fread(txtFile, sep="\t")
gene