10X single-cell and spatial joint analysis, part 6 (joint analysis based on the number of cells in each spot): Tangram

Latest recommended article published on 2024-08-05 14:59:25

追风少年ii

Tags: data analysis, python

Article link: https://blog.csdn.net/weixin_53637133/article/details/138108657

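Today we share another method for joint 10X single-cell and spatial analysis: Tangram. Pay attention to the key advantage of this workflow: the number of nuclei in each spot is inferred from the stained tissue image, which gives the number of cells per spot, and the 10X spatial data are then deconvolved on that basis.

Let's start with the paper: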

Squidpy allows analysis of images in spatial omics analysis workflows

First, some background.

1. What is the Image Container

The Image Container is an object for microscopy tissue images associated with spatial molecular datasets (in other words, the Image Container handles images together with the data). The object is a thin wrapper of an xarray.Dataset and provides efficient access to in-memory and on-disk images. On-disk files are loaded lazily using dask through rasterio, meaning content is only read into memory when requested. The object can be saved as a zarr store. This allows handling very large files that do not fit in memory. Put simply, it is an image handler.
Image Container is initialised with an in-memory array or a path to an image file on disk. Images are saved with the key layer. If lazy loading is desired, the chunks parameter needs to be specified.

sq.im.ImageContainer(PATH, layer=<str>, chunks=<int>)

More image layers with the same spatial dimensions x and y, such as segmentation masks, can be added to an existing Image Container.

img.add_img(PATH, layer_added=<str>)

The Image Container is able to interface with AnnData objects (these should be familiar: this is the object scanpy produces when processing single-cell data), in order to relate any pixel-level information to the observations stored in AnnData. For instance, it is possible to create a generator that yields image crops on the fly corresponding to the locations of the spots in the image (in other words, the image information tied to the AnnData object can be read directly).

spot_generator = img.generate_spot_crops(adata)
crops = (crop for crop in spot_generator)  # lazily yields crops at spot locations

This works for features computed at crop level as well as at segmentation-object level. For instance, it is possible to get centroid coordinates as well as several features of the segmentation objects that overlap with the spot capture area. (It is enough to be aware of this.)
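
To make the idea of per-spot crops concrete, here is a minimal sketch in plain NumPy. It is not Squidpy's implementation: the function name `generate_square_crops`, the square crop shape, and the toy image are all assumptions made for illustration.

```python
import numpy as np


def generate_square_crops(image, centers, size):
    """Lazily yield (center, crop) pairs, one square crop per spot center.

    image:   array of shape (y, x, channels)
    centers: iterable of (y, x) pixel coordinates (hypothetical spot positions)
    size:    crop side length in pixels; crops are clipped at the image border
    """
    half = size // 2
    height, width = image.shape[:2]
    for y, x in centers:
        y0, y1 = max(y - half, 0), min(y + half + 1, height)
        x0, x1 = max(x - half, 0), min(x + half + 1, width)
        yield (y, x), image[y0:y1, x0:x1]


# Toy 10 x 10 image with 3 channels and two spot centers.
image = np.arange(10 * 10 * 3).reshape(10, 10, 3)
centers = [(5, 5), (0, 0)]
crops = list(generate_square_crops(image, centers, size=5))
```

Because the function is a generator, no crop is cut out until it is requested, which is the same reason the real generator is convenient for large images.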

Part two: the image-processing workflow

(1) Image processing
Before extracting features from microscopy images, the images can be pre-processed. Squidpy implements commonly used preprocessing functions, such as conversion to gray-scale or smoothing with a Gaussian kernel.

sq.im.process(img, method="gray")  # img here is our original image

Implementations are based on the Scikit-image package and allow processing of very large images by tiling the image into smaller crops and processing these (this is the image preprocessing step). Pay attention to the image format when you use it.
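
The tiling idea can be sketched in a few lines of NumPy. This is an illustration under assumptions, not Squidpy's code: `process_in_tiles` and `to_gray` are hypothetical helper names, and the grayscale weights are the luminance weights used by scikit-image's `rgb2gray`.

```python
import numpy as np


def to_gray(rgb):
    """Luminance-weighted grayscale conversion of a (y, x, 3) float array."""
    weights = np.array([0.2125, 0.7154, 0.0721])
    return rgb @ weights


def process_in_tiles(image, func, tile=64):
    """Apply a pixel-wise function tile by tile and stitch the results."""
    out = np.empty(image.shape[:2], dtype=float)
    for y in range(0, image.shape[0], tile):
        for x in range(0, image.shape[1], tile):
            # Slices past the border are clipped automatically by NumPy.
            out[y:y + tile, x:x + tile] = func(image[y:y + tile, x:x + tile])
    return out


rng = np.random.default_rng(0)
image = rng.random((150, 200, 3))
tiled = process_in_tiles(image, to_gray, tile=64)
```

For a pixel-wise operation such as grayscale conversion the tiled result is identical to processing the whole image at once. Neighbourhood operations such as smoothing would additionally need overlapping tiles so that tile borders do not show up in the result.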

(2) Image segmentation (which can be thought of as refining the image)
Nuclei segmentation is an important step when analysing microscopy images (this is the key point: counting the nuclei in each spot, which depends on the staining). It allows the quantitative analysis of the number of nuclei, their areas, and morphological features (that is, quantifying the number of cells per spot and obtaining area and morphology features). There is a wide range of approaches for nuclei segmentation, from established techniques like thresholding to modern deep learning-based approaches (there are clearly many such methods, and I still have plenty to learn).
A difficulty for nuclei segmentation is to distinguish between partially overlapping nuclei (how to identify overlapping nuclei is an important problem, especially in tumour regions where cells are small and dense). Watershed is a classic algorithm used to separate overlapping objects by treating pixel values as local topography (pixel values are treated as a local elevation map). For this, starting from points of lowest intensity, the image is flooded until basins from different starting points meet at the watershed ridge lines (this is the approach used; the author's knowledge of image processing is limited too).

sq.im.segment(img, method="watershed")
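
The flooding idea can be illustrated with a tiny marker-based priority flood written with the standard library only. This is a teaching sketch, not the Scikit-image implementation that Squidpy relies on; the function name `flood_watershed`, the seed markers, and the one-row intensity profile are assumptions. Since nuclei are bright in a DAPI image, a real pipeline would flood an inverted image or a distance transform so that nuclei become basins.

```python
import heapq


def flood_watershed(image, markers):
    """Marker-based priority-flood watershed on a 2D list of intensities.

    image:   2D list of numbers, low values are basin bottoms
    markers: 2D list of ints, 0 = unlabeled, >0 = seed label
    Returns a 2D list in which every pixel carries the label of the basin
    that reaches it first when flooding from the lowest intensities upward.
    """
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in markers]
    heap = []
    order = 0  # insertion counter, used to break ties deterministically
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (image[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (image[nr][nc], order, nr, nc))
                order += 1
    return labels


# Inverted intensity profile across two touching nuclei: two basins
# separated by a ridge in the middle.
profile = [[0, 1, 2, 5, 2, 1, 0]]
seeds = [[1, 0, 0, 0, 0, 0, 2]]
labels = flood_watershed(profile, seeds)
```

The two basins grow toward each other and stop at the ridge, so the touching objects end up with different labels. The ridge pixel itself is assigned to whichever basin reaches it first.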

This is actually quite similar to image processing in stLearn.
Implementations in Squidpy are based on the original Scikit-image python implementation (the image processing relies on the Python package Scikit-image, which is worth studying in depth when you have time).
(3) Custom approaches with deep learning (more advanced segmentation)
Depending on the quality of the data, simple segmentation approaches like watershed might not be appropriate. Nowadays, many complex segmentation algorithms are provided as pre-trained deep learning models, such as Stardist, Splinedist and Cellpose. These models can be easily used within the segmentation function. (Note that the data being segmented here is the image, not the sequenced transcriptome data.)

sq.im.segment(img, method=<pre-trained model>)

(4) Image features
Tissue organisation in microscopic images can be analysed with different image features. This filters relevant information from the (high dimensional) images, allowing for easy interpretation and comparison with other features obtained at the same spatial location (comparing features from the same spatial location). Image features are calculated from the tissue image at each location (x, y) where there is transcriptomics information available, resulting in an obs x features matrix similar to the obs x gene matrix (analogous to the single-cell matrix). This image feature matrix can then be used in any single-cell analysis workflow, just like the gene matrix (so this part mainly serves downstream analysis alongside the sequencing data).
The scale and size of the image used to calculate features can be adjusted using the scale and spot_scale parameters. Feature extraction can be parallelized by providing n_jobs. The calculated feature matrix is stored in adata.obsm[key].

sq.im.calculate_image_features(adata, img, features=<list>, spot_scale=<float>,
                               scale=<float>, key_added=<str>)

Note that this is where the image and the expression data start to be analysed together.
Summary features calculate the mean, the standard deviation or specific quantiles for a color channel. Similarly, histogram features scan the histogram of a color channel to calculate quantiles according to a defined number of bins (this is what some of the parameters do).

sq.im.calculate_image_features(adata, img, features="summary")
sq.im.calculate_image_features(adata, img, features="histogram")
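
As a rough illustration of how per-spot crops turn into an obs x features matrix, the sketch below computes summary statistics and histogram bin counts per channel with NumPy. It is not Squidpy's implementation: the function names, the feature names, the chosen quantiles, and the use of plain bin counts for the histogram features are all assumptions.

```python
import numpy as np


def summary_features(crop, quantiles=(0.1, 0.5, 0.9)):
    """Mean, standard deviation and quantiles for every channel of a crop."""
    feats = {}
    for ch in range(crop.shape[-1]):
        values = crop[..., ch].ravel()
        feats[f"ch{ch}_mean"] = values.mean()
        feats[f"ch{ch}_std"] = values.std()
        for q in quantiles:
            feats[f"ch{ch}_q{q}"] = np.quantile(values, q)
    return feats


def histogram_features(crop, bins=4, value_range=(0.0, 1.0)):
    """Number of pixels falling into each intensity bin, per channel."""
    feats = {}
    for ch in range(crop.shape[-1]):
        counts, _ = np.histogram(crop[..., ch], bins=bins, range=value_range)
        for b, n in enumerate(counts):
            feats[f"ch{ch}_bin{b}"] = int(n)
    return feats


rng = np.random.default_rng(0)
crops = [rng.random((20, 20, 2)) for _ in range(3)]  # one toy crop per spot
rows = [{**summary_features(c), **histogram_features(c)} for c in crops]
feature_names = list(rows[0])
# Rows are observations (spots), columns are image features.
feature_matrix = np.array([[row[name] for name in feature_names] for row in rows])
```

Each row of `feature_matrix` plays the same role as a row of the gene expression matrix, which is why such a matrix can be passed to the usual single-cell tools.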

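The paper then describes some further processing methods, but they are not our focus here; a quick look is enough.

2. Now for the main text of the paper.

Squidpy implements a pipeline based on Scikit-image for preprocessing and segmenting images, extracting morphological, texture, and deep learning-powered features.

Do not underestimate this part. First, the software can handle both fluorescence and H&E-stained images; preprocessing and segmentation both operate on the image, and features are finally extracted in combination with the sequencing data. I have not studied this in depth yet and still need more practice.
To enable efficient processing of very large images, this pipeline utilises lazy loading, image tiling and multi-processing (the processing approach mentioned earlier).

Features can be extracted from a raw tissue image crop, or Squidpy's nuclei-segmentation module can be used to extract nuclei counts and nuclei sizes (this is the nuclei-count extraction).

For instance, we can leverage segmented nuclei to inform cell-type deconvolution methods such as Tangram (our focus today) or Cell2Location (which I covered previously in the article on joint 10X single-cell and spatial analysis with cell2location; compare the two).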

Now for the most important part.

Cell-type deconvolution using Tangram

Mapping single-cell atlases to spatial transcriptomics data is a crucial analysis step for integrating cell-type annotation across technologies. Information on the number of nuclei under each spot can help cell-type deconvolution methods (that is, the nuclei count per spot is used for the joint 10X single-cell and spatial analysis).
Tangram ([Biancalani et al., 2020], code) is a cell-type deconvolution method that enables mapping of cell types to single nuclei under each spot. We will show how to leverage the image container segmentation capabilities, together with Tangram, to map cell types of the mouse cortex from sc-RNA-seq data to Visium data.
We will not reproduce all of the code here; adapt it to your own needs.
Load the modules; everything mentioned above is covered.

import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage
# import tangram for spatial deconvolution
import tangram as tg

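We follow the example data here; the main thing is to look at what the data contain.
First, the transcriptome data (`adata_st` in the code below):

This is the full processed 10X spatial transcriptomics data; note that these are results from the Python-based analysis.
Second, the image data (`img` in the code below):

Note the image information here: to analyse your own data, you need to load your own high-resolution image.
Finally, the single-cell data (`adata_sc` in the code below).
The most important part is the annotation.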

Nuclei segmentation and segmentation features (analysing the number of cells in each spot)

sq.im.process(img=img, layer="image", method="smooth")
sq.im.segment(
    img=img,
    layer="image_smooth",
    method="watershed",
    channel=0,
)

Visualization

inset_y = 1500
inset_x = 1700
inset_sy = 400
inset_sx = 500

fig, axs = plt.subplots(1, 3, figsize=(30, 10))
sc.pl.spatial(
    adata_st, color="cluster", alpha=0.7, frameon=False, show=False, ax=axs[0], title=""
)
axs[0].set_title("Clusters", fontdict={"fontsize": 20})
sf = adata_st.uns["spatial"]["V1_Adult_Mouse_Brain_Coronal_Section_2"]["scalefactors"][
    "tissue_hires_scalef"
]
rect = mpl.patches.Rectangle(
    (inset_y * sf, inset_x * sf),
    width=inset_sx * sf,
    height=inset_sy * sf,
    ec="yellow",
    lw=4,
    fill=False,
)
axs[0].add_patch(rect)

axs[0].axes.xaxis.label.set_visible(False)
axs[0].axes.yaxis.label.set_visible(False)

axs[1].imshow(
    img["image"][inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx, 0] / 65536,
    interpolation="none",
)
axs[1].grid(False)
axs[1].set_xticks([])
axs[1].set_yticks([])
axs[1].set_title("DAPI", fontdict={"fontsize": 20})

crop = img["segmented_watershed"][
    inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx
].values
crop = skimage.segmentation.relabel_sequential(crop)[0]
cmap = plt.cm.plasma.copy()  # copy so the shared colormap is not modified in place
cmap.set_under(color="black")
axs[2].imshow(crop, interpolation="none", cmap=cmap, vmin=0.001)
axs[2].grid(False)
axs[2].set_xticks([])
axs[2].set_yticks([])
axs[2].set_title("Nucleus segmentation", fontdict={"fontsize": 20})

How comfortable are you with plotting in Python?

We then need to extract some image features useful for the deconvolution task downstream. Specifically, we will need (1) the number of unique segmentation objects (i.e. nuclei) under each spot and (2) the coordinates of the centroids of the segmentation objects (that is, analysing the number of cells in each spot).

# define image layer to use for segmentation
features_kwargs = {
    "segmentation": {
        "label_layer": "segmented_watershed",
        "props": ["label", "centroid"],
        "channels": [1, 2],
    }
}
# calculate segmentation features
sq.im.calculate_image_features(
    adata_st,
    img,
    layer="image",
    key_added="image_features",
    features_kwargs=features_kwargs,
    features="segmentation",
    mask_circle=True,
)

adata_st.obs["cell_count"] = adata_st.obsm["image_features"]["segmentation_label"]
sc.pl.spatial(adata_st, color=["cluster", "cell_count"], frameon=False)
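
The per-spot count is conceptually just the number of distinct segmentation labels that fall inside the circular spot footprint. The NumPy sketch below shows that idea on a toy label image; it is not Squidpy's code, and the function name `count_nuclei_under_spot`, the spot centers, and the radius are hypothetical.

```python
import numpy as np


def count_nuclei_under_spot(label_image, center, radius):
    """Count distinct non-zero labels inside a circular spot footprint."""
    yy, xx = np.ogrid[: label_image.shape[0], : label_image.shape[1]]
    inside = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    labels = np.unique(label_image[inside])
    return int(np.count_nonzero(labels))  # label 0 is background


# Toy segmentation result with three labelled nuclei.
label_image = np.zeros((20, 20), dtype=int)
label_image[2:5, 2:5] = 1
label_image[3:6, 6:9] = 2
label_image[14:18, 14:18] = 3

spot_centers = [(4, 5), (15, 15), (10, 0)]  # (y, x) pixel coordinates
cell_count = [count_nuclei_under_spot(label_image, c, radius=4) for c in spot_centers]
```

A nucleus that only partly overlaps the circle is still counted in this sketch, so densely packed tissue can yield slightly inflated counts near spot borders.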

This gives the number of cells in each spot, which is then used for the refined deconvolution analysis.

Deconvolution and mapping

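At this stage, we have all we need for the deconvolution task. First, we need to find a set of genes shared by the single-cell and spatial datasets. We will use the intersection of the highly variable genes (these are the genes used for the joint analysis; the code here takes the top 100 marker genes of each cell subclass and intersects them with the spatial genes).
Adapt this step to your own needs.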

sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass")
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
genes_sc = np.unique(markers_df.melt().value.values)
genes_st = adata_st.var_names.values
genes = list(set(genes_sc).intersection(set(genes_st)))

Run the deconvolution

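# The inputs below are not defined in this excerpt and must be prepared beforehand:
# S is the single-cell expression matrix (cells x genes), G the spatial expression
# matrix (spots x genes), d the per-spot density prior, device the torch device,
# and hyperparm a dict of loss weights.
mapper = tg.mapping_optimizer.MapperConstrained(
    S=S,
    G=G,
    d=d,
    device=device,
    **hyperparm,
    target_count=adata_st.obs.cell_count.sum()  # total number of segmented nuclei
)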
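
To show what the inputs to the mapper might look like, here is a toy NumPy sketch. It rests on several assumptions: that the density prior is simply the normalized nuclei count per spot, that the mapping is a cell-by-spot probability matrix, and that predicted and measured spot expression are compared by cosine similarity. All names ending in `_toy` and the random data are made up, and this is not Tangram's implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


# Toy nuclei counts for 4 spots (in the tutorial these come from adata_st.obs).
cell_count = np.array([3, 0, 5, 2])
d_toy = cell_count / cell_count.sum()  # assumed density prior, sums to 1
target_count = cell_count.sum()        # total number of cells to place

rng = np.random.default_rng(0)
S_toy = rng.random((6, 5))  # 6 single cells x 5 shared genes
G_toy = rng.random((4, 5))  # 4 spots x 5 shared genes

# Assumed form of the mapping: each cell gets a probability over spots.
M_toy = softmax(rng.normal(size=(6, 4)), axis=1)
G_pred = M_toy.T @ S_toy    # expression predicted for every spot

# Per-gene cosine similarity between predicted and measured spot expression.
num = (G_pred * G_toy).sum(axis=0)
den = np.linalg.norm(G_pred, axis=0) * np.linalg.norm(G_toy, axis=0)
gene_similarity = num / den
```

In a real run the mapping matrix would be optimized rather than drawn at random, so that the similarity rises and the number of mapped cells approaches `target_count`.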

Let's look at the results of the analysis.