Decontaminating 10x Spatial Transcriptomics Data with SpotClean

Latest recommended article published on 2025-05-17 18:19:21

Tags: front end, spatial transcriptomics, single cell, decontamination

Note: Data from the human dorsolateral prefrontal cortex profiled in the spatialLIBD experiment, sample LIBD_151507. (a) UMI count densities for tissue and background spots show relatively high counts in the background. (b) Total UMI counts in the background decrease with increasing distance from the tissue; the perimeter delineating tissue and background is shown in white. (c) Counts of the top 50 genes from a selected tissue region (upper), a nearby background region (middle), and a distant background region (bottom) show the similarity between expression in tissue spots and nearby background spots caused by spot swapping from tissue to background, an effect that decreases as distance from the tissue increases. (d) Tissue and background spots cannot be distinguished visually via UMAP. (e) Graph-based clustering of all spots identifies 9 clusters. (f) Spots on the slide are colored by the cluster membership shown in (e). Black arrows highlight areas where signal has swapped from tissue to background. Spots on the perimeter (shown in white) have been removed from the summaries shown here to ensure that the effects are not due to spots on the tissue-background boundary.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpotClean")
library(SpotClean)

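# Not run
# The function names used in this post (CreateSlide(), SpotClean(), ...) follow the
# original GitHub release. The Bioconductor release installed above exports them in
# lower camelCase: createSlide(), visualizeSlide(), visualizeHeatmap(), spotclean(),
# convertToSeurat().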

# Load example data
data(mbrain_raw)
data(mbrain_slide_info)

# Visualize raw data
mbrain_obj <- CreateSlide(count_mat = mbrain_raw, 
                          slide_info = mbrain_slide_info)
VisualizeSlide(slide_obj = mbrain_obj)
VisualizeHeatmap(mbrain_obj,rownames(mbrain_raw)[1])

# Decontaminate raw data
decont_obj <- SpotClean(mbrain_obj)

# Visualize decontaminated gene
VisualizeHeatmap(decont_obj,rownames(mbrain_raw)[1])

# Visualize the estimated per-spot contamination rate
VisualizeHeatmap(decont_obj,decont_obj@metadata$contamination_rate, 
                 logged = FALSE, legend_title = "contamination rate",
                 legend_range = c(0,1))

# (Optionally) Transform to Seurat object for downstream analyses
seurat_obj <- ConvertToSeurat(decont_obj,image_dir = "/path/to/spatial/folder")

Too many tissue spots (not enough background spots): Although the observed data form a single matrix with a fixed number of columns (spots), the number of unknown parameters is proportional to the number of tissue spots. In the extreme case where every spot is covered by tissue, there are more unknown parameters than observed data values. The contaminated expression is then confounded with the true expression, and SpotClean's estimates become unreliable. We therefore recommend that at least 25% of the spots in the input data be free of tissue, so that SpotClean can draw enough information from background spots to estimate contamination.
Lowly-expressed genes: Lowly-expressed genes typically carry less information and more noise than highly-expressed genes. By default, SpotClean keeps only highly-expressed and highly-variable genes for decontamination. It can be forced to run on manually specified lowly-expressed genes, but even then their expression typically changes very little: given the high sparsity of most lowly-expressed genes, there is usually not enough information to confidently reassign UMIs. We nevertheless do not filter genes by sparsity, because some interesting genes are highly concentrated in a small tissue region, and in such cases SpotClean is effective at adjusting for spot swapping in those regions. If the defaults are not appropriate, users can either adjust the expression cutoffs or manually specify the genes to decontaminate.
Inference based on sequencing depth: SpotClean reassigns bled-out UMIs to their tissue spots of origin. Because most estimates of sequencing depth rely on the total expression at each spot, this changes the estimated sequencing depth of tissue spots after decontamination. Decontamination can therefore be regarded as another type of normalization and might conflict with existing sequencing depth normalization methods.
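The 25% guideline in the first item can be checked before running decontamination. The sketch below is a minimal base-R illustration on a made-up spot table. The `tissue` column coded 1 for tissue and 0 for background, and the helper name `background_fraction()`, are assumptions for illustration and are not part of the SpotClean API.

```r
# Toy spot annotation table (hypothetical values):
# tissue = 1 marks a tissue spot, tissue = 0 marks a background spot.
spots <- data.frame(
    barcode = paste0("spot", 1:8),
    tissue  = c(1, 1, 1, 1, 1, 0, 0, 0)
)

# Fraction of spots that are not covered by tissue.
# as.character() first so that factor-coded labels ("0"/"1") also work.
background_fraction <- function(tissue) {
    mean(as.integer(as.character(tissue)) == 0)
}

bg_frac <- background_fraction(spots$tissue)

# Compare against the 25% guideline quoted in the text.
enough_background <- bg_frac >= 0.25

print(bg_frac)
print(enough_background)
```

With real data, the same function could be applied to the tissue label column of the slide information, assuming it uses the same 0/1 coding.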

# Not run

raw_mat <- Read10xRaw(count_dir = "/path/to/raw_feature_bc_matrix/")
slide_info <- Read10xSlide(tissue_csv_file="/path/to/tissue_positions_list.csv", 
                           tissue_img_file="/path/to/tissue_lowres_image.png",
                           scale_factor_file="/path/to/scalefactors_json.json")

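# Compare with bundled example data
data(mbrain_raw)
data(mbrain_slide_info)
# Matrix::colSums() is called explicitly so that this also works when raw_mat is a
# sparse Matrix object and the Matrix package is not attached
slide_info$slide$total_counts <- Matrix::colSums(
    raw_mat[rownames(mbrain_raw),mbrain_slide_info$slide$barcode]
)

# Check whether the data read from disk match the bundled example data
identical(raw_mat[rownames(mbrain_raw),], mbrain_raw)
identical(slide_info$slide, mbrain_slide_info$slide)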

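# Load the bundled example data
data(mbrain_raw)
data(mbrain_slide_info)

# Create the slide object and print it
slide_obj <- CreateSlide(mbrain_raw, mbrain_slide_info)
slide_obj

# Visualize the slide
VisualizeSlide(slide_obj)

# Visualize the tissue/background label of each spot
VisualizeLabel(slide_obj,"tissue")

# Heatmap of total UMI counts per spot
VisualizeHeatmap(slide_obj,"total_counts")

# Heatmap of raw Mbp expression
VisualizeHeatmap(slide_obj,"Mbp")

# Decontaminate the data (10 iterations)
decont_obj <- SpotClean(slide_obj, maxit=10, candidate_radius = 20)

# Heatmap of decontaminated Mbp expression
VisualizeHeatmap(decont_obj,"Mbp")

# Summarize the estimated per-spot contamination rates
summary(decont_obj@metadata$contamination_rate)

# ARC score of the raw data
ARCScore(slide_obj)
## [1] 0.05160659

# Convert to a Seurat object for downstream analyses
seurat_obj <- ConvertToSeurat(decont_obj,image_dir = "/path/to/spatial/folder")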


Let's look at contamination in 10x spatial transcriptomics data and how to remove it.

Summary

Introduction

Some further supporting examples follow.

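The results above show that spot swapping occurs from tissue to background, but assessing the extent of swapping from tissue spot to tissue spot is more challenging. Although the SpotClean model provides an estimate (table below),

This nuisance variability reduces the power and precision of downstream analyses (figure below).

These results suggest that SpotClean provides better expression estimates, and the figure below shows that these estimates improve the precision with which spatially variable genes are identified.

Panel b of the figure above and the figure below consider genes known to be differentially expressed (DE) between WM and Layer6 in the raw and SpotClean-decontaminated data;

Spatial transcriptomics offers unprecedented opportunities to address biological questions and improve patient care, but artifacts induced by spot swapping must be adjusted for to ensure that maximal information is obtained from these powerful experiments. SpotClean provides more accurate expression estimates and thereby improves the power and precision of downstream analyses.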

Finally, let's look at the example code.

Installation and loading

Short Demo

Here we quickly demonstrate the general SpotClean workflow on the bundled example data. A step-by-step walkthrough follows.

Running Speed

Situations you should think twice about before applying SpotClean

Recommended applications

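Because clusters are determined mainly by a relatively small number of highly expressed genes, SpotClean does not change clustering substantially in most datasets. Cluster definitions may improve slightly, but in most cases the number of clusters and the relationships among them look the same after applying SpotClean.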

Step-by-step analysis

Load 10x spatial data

Create the slide object

Visualize the slide object

You can also pass VisualizeHeatmap() the name of any gene present in the raw count matrix of the input slide object. For example, the expression of the Mbp gene can be visualized:

Decontaminate the data

The metadata now contains more information, including parameter estimates from the SpotClean model and measures of contamination levels.

We can visualize Mbp gene expression after 10 iterations of decontamination:

Estimate contamination levels in observed data

Our model can estimate the proportion of contaminated expression at each tissue spot (i.e. expression at a tissue spot that originated from a different spot due to spot swapping):

This indicates around 30% of UMIs at a tissue spot in the observed data came from spot swapping contamination, averaging across all tissue spots.
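To make the quantity concrete: the per-spot contamination rate is the share of a tissue spot's observed UMIs that originated in some other spot. The toy base-R sketch below works this out by hand. The origin-by-destination matrix `umi_flow` and its numbers are invented for illustration; SpotClean estimates these quantities from its model, since the true origins are never observed directly.

```r
# Hypothetical UMI flow among 3 tissue spots.
# Rows: spot where the UMI originated. Columns: spot where it was observed.
umi_flow <- matrix(
    c(70, 10, 20,
      20, 80, 10,
      10, 10, 70),
    nrow = 3, byrow = TRUE,
    dimnames = list(origin = paste0("s", 1:3), observed = paste0("s", 1:3))
)

# Total UMIs observed at each spot (column sums).
observed_total <- colSums(umi_flow)

# UMIs observed at the same spot they originated from (diagonal).
stayed <- diag(umi_flow)

# Per-spot contamination rate: share of observed UMIs that came from elsewhere.
contamination_rate <- 1 - stayed / observed_total

# Average across all tissue spots, analogous to the mean reported by summary().
mean_rate <- mean(contamination_rate)

print(contamination_rate)
print(mean_rate)
```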

ARC score

This indicates that at least 5% of the expression in the observed data came from spot swapping contamination.
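The "at least" wording reflects a lower-bound argument: every UMI observed in a background spot must have bled out of tissue, whereas tissue-to-tissue swapping stays hidden. The base-R sketch below computes such a simple bound on invented per-spot totals. It only illustrates the reasoning and is not claimed to reproduce the exact formula used by ARCScore().

```r
# Hypothetical total UMI counts per spot and tissue labels (1 = tissue, 0 = background).
total_counts <- c(500, 450, 520, 480, 20, 15, 10, 5)
tissue       <- c(1,   1,   1,   1,   0,  0,  0,  0)

# UMIs found in background spots can only have come from tissue,
# so they are certainly contamination.
background_umis <- sum(total_counts[tissue == 0])

# Share of all observed UMIs that is certainly contamination.
# Tissue-to-tissue swapping is not counted, hence this is only a lower bound.
lower_bound <- background_umis / sum(total_counts)

print(lower_bound)
```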

Convert to Seurat object for downstream analyses

ConvertToSeurat() can be used to convert our slide object to a Seurat spatial object. Note that Seurat needs the complete spatial folder as input; in the example above, this is the path passed as image_dir.
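Since the conversion depends on the spatial folder being complete, it can help to verify its contents first. The sketch below uses only base R. The helper name `missing_spatial_files()` is hypothetical, and the expected file names are simply the three files read earlier in this post, so adjust the list if your Space Ranger output differs.

```r
# Return the expected files that are absent from a spatial folder.
# The default file list is an assumption based on the files loaded earlier in this post.
missing_spatial_files <- function(
    spatial_dir,
    expected = c("tissue_positions_list.csv",
                 "tissue_lowres_image.png",
                 "scalefactors_json.json")
) {
    expected[!file.exists(file.path(spatial_dir, expected))]
}

# Demo on a temporary folder that deliberately lacks the image file.
demo_dir <- file.path(tempdir(), "spatial_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
    demo_dir, c("tissue_positions_list.csv", "scalefactors_json.json")
)))

missing <- missing_spatial_files(demo_dir)
print(missing)
```

An empty result means all listed files are present, and the folder can then be passed as image_dir.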
