SingleR包通过利用纯细胞类型的参考转录组数据集独立推断每个单细胞的起源细胞,从单细胞RNA测序数据执行无偏细胞类型识别。
1. 载入单细胞数据
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#
# BiocManager::install("scRNAseq")
library(scRNAseq) # Collection of Public Single-Cell RNA-Seq Datasets
ls("package:scRNAseq")
hESCs <- LaMannoBrainData('human-es')
# 取前100个单细胞的表达谱做演示
hESCs <- hESCs[,1:100]
assays(hESCs)
assay(hESCs)[1:4,1:4]
#assay(x, i)
#is simply a convenience function which is equivalent to
#assays(x)[[i]]
#assays(se)[[1]][1:5, 1:5]
#assay(se)[1:5, 1:5]
2. 载入细胞类型注释数据集
celldex包提供一组参考表达数据集,这些数据集带有精确的细胞类型标签,用于单细胞数据的自动注释或批量RNA序列的反卷积等过程。
library(celldex)
hpca.se <- HumanPrimaryCellAtlasData()
hpca.se
3. 注释细胞类型
library(SingleR)
ls("package:SingleR")
# library(BiocStyle)
# browseVignettes("BiocStyle")
# SingleR Returns the best annotation for each cell in a test dataset,
# given a labelled reference dataset in the same feature space.
# • scores, a numeric matrix of correlations at the specified quantile for each label (column) in
# each cell (row). This will contain NAs if multiple references were supplied to trainSingleR
# with recompute=TRUE.
# • first.labels, a character vector containing the predicted label before fine-tuning. Only
# added if fine.tune=TRUE.
# • tuned.scores, a DataFrame containing first and second.
# These are numeric vectors containing the best and next-best scores
# at the final round of fine-tuning for each cell. Only added if fine.tune=TRUE.
# • labels, a character vector containing the predicted label based on the maximum entry in
# scores.
# • pruned.labels, a character vector containing the pruned labels where “low-quality”. els are
# replaced with NAs. Only added if prune=TRUE.
## SingleR() expects reference datasets to be normalized and log-transformed.
library(scuttle)
hESCs <- logNormCounts(hESCs)
hpca.se <- logNormCounts(hpca.se)
pred.hesc <- SingleR(test = hESCs, ref = hpca.se, assay.type.test="logcounts",
labels = hpca.se$label.main)
class(pred.hesc) # "DFrame"
dim(pred.hesc) # 行为单细胞名,列为特征,其中label是细胞类型
colnames(pred.hesc)
head(pred.hesc)
table(pred.hesc$labels)
#Create a heatmap of the SingleR assignment scores across all cell-label combinations.
plotScoreHeatmap(pred.hesc)
#Plot the distribution of deltas (i.e., the gap between the assignment score
#for the assigned label and those of the remaining labels) across cells assigned to each reference label.
plotDeltaDistribution(pred.hesc, ncol = 3)
summary(is.na(pred.hesc$pruned.labels))
4. 用一个SingleCellExperiment对象的细胞类型label去注释另一个SingleCellExperiment
library(scRNAseq)
sceM <- MuraroPancreasData()
# One should normally do cell-based quality control at this point, but for
# brevity's sake, we will just remove the unlabelled libraries here.
sceM <- sceM[,!is.na(sceM$label)]
# SingleR() expects reference datasets to be normalized and log-transformed.
# scuttle包:Provides basic utility functions for performing single-cell analyses,
#focusing on simple normalization, quality control and data transformations.
#Also provides some helper functions to assist development of other packages.
library(scuttle)
sceM <- logNormCounts(sceM)
## -----------------------------------------------------------------------------
sceG <- GrunPancreasData()
sceG <- sceG[,colSums(counts(sceG)) > 0] # Remove libraries with no counts.
sceG <- logNormCounts(sceG)
sceG <- sceG[,1:100]
## -----------------------------------------------------------------------------
pred.grun <- SingleR(test=sceG, ref=sceM, labels=sceM$label, de.method="wilcox")
table(pred.grun$labels)
5. 可视化
## -----------------------------------------------------------------------------
plotScoreHeatmap(pred.grun)
## -----------------------------------------------------------------------------
plotDeltaDistribution(pred.grun, ncol = 3)
## -----------------------------------------------------------------------------
summary(is.na(pred.grun$pruned.labels))
## -----------------------------------------------------------------------------
## 不同细胞类型的marker基因
all.markers <- metadata(pred.grun)$de.genes
sceG$labels <- pred.grun$labels
# Beta cell-related markers
library(scater)
plotHeatmap(sceG, order_columns_by="labels",
features=unique(unlist(all.markers$beta)))