10X Spatial Transcriptomics RNA Velocity Analysis (Velocyto) with SIRV

Latest recommended article published on 2025-04-13 16:51:40

追风少年ii


Tags: frontend, python

Article link: https://blog.csdn.net/weixin_53637133/article/details/138336546


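Hello everyone. Today we look at how RNA velocity (Velocyto) can be applied to spatial data. Running RNA velocity on 10X single-cell data is familiar territory by now, and many of you have probably done it yourselves. Development is not only a change in cell state; just as important is the change in spatial position, so ideally we would read both the developmental progression of cells and their positional changes directly off the spatial transcriptome. The method comes from the paper SIRV: Spatial inference of RNA velocity at the single-cell resolution, which is worth a read.

Let's go through the paper briefly to see how it works, focusing on the key points.

Abstract (a quick skim is enough)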

Studying cellular differentiation using single-cell RNA sequencing (scRNA-seq) rapidly expands our understanding of cellular development processes. Recently, RNA velocity has created new possibilities in studying these cellular differentiation processes, as differentiation dynamics can be obtained from measured spliced and unspliced mRNA expression. However, to `map these differentiation processes to developments within a tissue, the spatial context of the tissue should be taken into account, which is not possible with current approaches as they start from dissociated cells`. We present SIRV (Spatially Inferred RNA Velocity), `a method to infer spatial differentiation trajectories within the spatial context of a tissue at the single-cell resolution`. SIRV `integrates spatial transcriptomics data with reference scRNA-seq data, to enrich the spatially measured genes with spliced and unspliced expressions from the scRNA-seq data`. (So this is also a joint analysis of single-cell and spatial data.) Next, SIRV `calculates RNA velocity vectors for every spatially measured cell and maps these vectors to the spatial coordinates within the tissue`. We tested SIRV on the Developing Mouse Brain Atlas data and obtained biologically relevant spatial differentiation trajectories. Additionally, SIRV annotates spatial cells with cellular identities and the region of origin which are transferred from the annotated reference scRNA-seq data. Altogether, with SIRV, we introduce a new tool to enrich spatial transcriptomics data that can assist in understanding how tissues develop.

Introduction

(1) Current protocols (for spatial transcriptomics) can be divided into two main categories:

sequencing-based technologies that detect and quantify the mRNA in situ, such as 10X Genomics Visium, Slide-seq and ST
imaging-based technologies using fluorescence in situ hybridization (FISH), such as smFISH, MERFISH and seqFISH

In principle, it is possible to apply RNA velocity analysis to spatial transcriptomics measured using sequencing-based protocols, as the spliced and unspliced expression ratios can be directly obtained from the sequencing data. (This was also the initial idea, but it does not hold up under scrutiny and the results are unconvincing.)

Step 1: SIRV integrates spatial transcriptomics and scRNA-seq data in order to predict the spliced and unspliced expression of the spatially measured genes. (Using matched single-cell data to annotate the spatial data and to compute velocity is an idea that has been around for a while; let's see how it is implemented.)

Step 2: Next it calculates RNA velocity vectors for each cell that are then projected onto the two-dimensional spatial coordinates, which are then used to derive flow fields by averaging dynamics of spatially neighboring cells. (Most readers will already know this part.)

Step 3: SIRV transfers various label annotations of the scRNA-seq to the spatial transcriptomics data, allowing us to richly annotate the spatial data (that is, the single-cell data is used to annotate the spatial data).

Step 4: SIRV produced biologically relevant spatial differentiation trajectories. (So we obtain both the temporal and the spatial trajectory information, which is great.)
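
The "averaging dynamics of spatially neighboring cells" idea in Step 2 can be illustrated with a small sketch. This is not the SIRV or scVelo implementation; the function name `smooth_velocity` and the fixed-radius neighborhood are assumptions made only for illustration.

```python
import numpy as np

def smooth_velocity(xy, vec, radius):
    """Average each cell's 2-D velocity vector with those of cells within `radius`.

    xy  : (n_cells, 2) spatial coordinates
    vec : (n_cells, 2) velocity vectors already projected onto the tissue plane
    """
    # Pairwise Euclidean distances between all cells
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Neighborhood indicator; every cell counts as its own neighbor
    near = (dist <= radius).astype(float)
    # Mean vector over each neighborhood
    return near @ vec / near.sum(axis=1, keepdims=True)

# Toy example: two cells close together, one far away
xy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
vec = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
smoothed = smooth_velocity(xy, vec, radius=2.0)
```

The two nearby cells end up sharing the mean of their vectors, while the isolated cell keeps its own direction.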

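Methods:

Input requirements: the spatial transcriptomics data represented by a gene expression matrix, and the scRNA-seq data having three expression matrices corresponding to the spliced (mature mRNA), unspliced (immature mRNA) and full mRNA expression (the single-cell data also needs metadata such as cell annotations and tissue of origin). The un/spliced expressions are then used to calculate the RNA velocity of each gene for each cell. In the end, the splicing-derived information is matched to spatial positions, so the trajectories are resolved in both space and time, which gives the most complete picture.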

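SIRV consists of four parts:

(1) integration of the spatial transcriptomics and scRNA-seq datasets (the joint single-cell and spatial analysis).
(2) prediction of un/spliced expression (this relies mainly on the single-cell data).
(3) label/metadata transfer (optional). It is best to do this step, since results without annotation are hard to interpret.
(4) estimation of RNA velocities within the spatial context (the result we care about most).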

First, the integration step.

The spatial transcriptomics and scRNA-seq datasets are integrated by finding the common signal between the two datasets. (The integration methods most people know, such as Seurat or SPOTlight, could also be used here.) However, the authors' approach differs from the usual integration methods and is worth a closer look. Building on `SpaGE`, the `integration step is performed using PRECISE to define a common latent space`. In brief, using the set of `shared genes` across the two datasets, we calculate a separate Principal Component Analysis (PCA) for each dataset, and then `align these separate principal components`, resulting in `principal vectors` (PVs). `These PVs have a one-to-one correspondence between the two datasets`, and the highly correlated PV-pairs represent the common signal. Finally, `both the spatial transcriptomics and scRNA-seq datasets are projected onto the PVs of the reference dataset` (scRNA-seq in this case), producing an integrated and aligned version of both datasets. The authors also note that the spliced and unspliced expressions are only used in the prediction (following) step. (In other words, the spliced and unspliced results from the single-cell data are projected onto the spatial transcriptome.)
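
To make the PCA-then-align idea concrete, here is a minimal NumPy sketch. It is not the SpaGE or PRECISE code: the function name `principal_vectors`, the toy data, and the choice of three components are assumptions, and any gene-wise standardization or filtering of weakly correlated PV-pairs is left out. The alignment uses the SVD of the cross-product of the two PCA bases, which is one standard way to obtain principal vectors with a one-to-one correspondence.

```python
import numpy as np

def principal_vectors(ref, spatial, n_components=3):
    """Align the PCA bases of two (cells x shared genes) matrices.

    Returns the principal vectors (PVs) of each dataset and the cosine
    similarity of each PV pair.
    """
    def pca_basis(x):
        x = x - x.mean(axis=0)                       # center each gene
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        return vt[:n_components]                     # rows are orthonormal PCs

    p_ref, p_spa = pca_basis(ref), pca_basis(spatial)
    # SVD of the cross-product rotates both bases so that PV pairs line up
    u, cos_sim, vt = np.linalg.svd(p_ref @ p_spa.T)
    pv_ref = u.T @ p_ref
    pv_spa = vt @ p_spa
    return pv_ref, pv_spa, cos_sim

# Toy data: both datasets share a 3-dimensional signal over 20 genes
rng = np.random.default_rng(0)
loadings = rng.normal(size=(3, 20))
ref = rng.normal(size=(200, 3)) @ loadings + 0.1 * rng.normal(size=(200, 20))
spatial = rng.normal(size=(150, 3)) @ loadings + 0.1 * rng.normal(size=(150, 20))

pv_ref, pv_spa, cos_sim = principal_vectors(ref, spatial)

# Project both datasets onto the PVs of the reference (scRNA-seq) dataset
ref_latent = (ref - ref.mean(axis=0)) @ pv_ref.T
spatial_latent = (spatial - spatial.mean(axis=0)) @ pv_ref.T
```

The values in `cos_sim` indicate how strongly each PV pair is shared; pairs with high similarity carry the common signal described above.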

Step 2: un/spliced expression prediction.

After obtaining the aligned datasets, SIRV enriches the spatially measured genes with spliced and unspliced expression predicted from the scRNA-seq dataset (as expected). Such prediction is performed using a `kNN regression` (which should be familiar to everyone). For each spatial cell i (here, a spot), we calculate the k-nearest-neighbors from the (aligned) scRNA-seq dataset and assign a weight to each neighbor inversely proportional to its distance.
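
The weight can be written as follows. This is a reconstruction based on the SpaGE weighting scheme that SIRV builds on, so treat it as an assumption and check it against the paper; closer neighbors get larger weights:

```latex
w_{ij} = 1 - \frac{\mathrm{dist}(i,j)}{\sum_{j' \in NN(i)} \mathrm{dist}(i,j')}
```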

with w_ij representing the weight between each spatial cell i and its j-th nearest neighbor, dist(i, j) being the cosine distance between spatial cell i and scRNA-seq cell j ∈ NN(i), and k equaling the number of nearest neighbors used.

For every spatially measured gene g, the spliced (S_ig) and unspliced (U_ig) expression are predicted by (a bit of math background clearly helps here):
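
A reconstruction of the prediction formula, again assuming the SpaGE-style scheme in which each weight is one minus the neighbor's share of the total distance, so that dividing by k − 1 makes the k weights sum to one:

```latex
S_{ig} = \sum_{j \in NN(i)} \frac{w_{ij}}{k-1}\, S^{R}_{jg},
\qquad
U_{ig} = \sum_{j \in NN(i)} \frac{w_{ij}}{k-1}\, U^{R}_{jg}
```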

with S^R_jg and U^R_jg representing the spliced and unspliced expression of gene g from the scRNA-seq dataset, respectively. (This still takes some effort to digest 😄.)
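
The kNN regression can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the SIRV source: the helper names `knn_weights` and `knn_predict` are made up, the weights follow the SpaGE-style scheme (one minus the neighbor's share of the total distance, divided by k − 1), and k must be at least 2 with the neighbor distances not all zero.

```python
import numpy as np

def knn_weights(spatial_latent, ref_latent, k):
    """Find the k nearest reference cells (cosine distance) and weight them."""
    a = spatial_latent / np.linalg.norm(spatial_latent, axis=1, keepdims=True)
    b = ref_latent / np.linalg.norm(ref_latent, axis=1, keepdims=True)
    dist = 1.0 - a @ b.T                                # cosine distance matrix
    nn = np.argsort(dist, axis=1)[:, :k]                # indices of k nearest cells
    d_nn = np.take_along_axis(dist, nn, axis=1)
    # Larger weight for closer neighbors; each row of weights sums to 1
    w = (1.0 - d_nn / d_nn.sum(axis=1, keepdims=True)) / (k - 1)
    return nn, w

def knn_predict(nn, w, ref_expr):
    """Weighted average of the neighbors' expression for every gene."""
    # ref_expr[nn] has shape (n_spatial, k, n_genes)
    return np.einsum('ik,ikg->ig', w, ref_expr[nn])

# Toy data standing in for the aligned latent spaces and reference layers
rng = np.random.default_rng(1)
ref_latent = rng.normal(size=(100, 5))
spatial_latent = rng.normal(size=(10, 5))
spliced_ref = rng.poisson(5.0, size=(100, 8)).astype(float)
unspliced_ref = rng.poisson(2.0, size=(100, 8)).astype(float)

nn, w = knn_weights(spatial_latent, ref_latent, k=4)
spliced_pred = knn_predict(nn, w, spliced_ref)
unspliced_pred = knn_predict(nn, w, unspliced_ref)
```

The same neighbors and weights are reused for both layers, so the predicted spliced and unspliced values of a spatial cell come from the same reference cells.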

Step 3: label (metadata) transfer. Here we could also annotate directly with Seurat or SPOTlight.

SIRV can annotate the spatial transcriptomics dataset with any relevant labels from the scRNA-seq dataset using the same kNN regression scheme as introduced earlier. Taking the cell identity annotation as an example (a look at how the annotation works):
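
One plausible way to turn the kNN weights into a label is a weighted vote: sum the weights of the neighbors carrying each label and pick the label with the highest total. The sketch below assumes this voting rule and uses a made-up helper name `transfer_labels`; it is not taken from the SIRV code.

```python
import numpy as np

def transfer_labels(weights, neighbor_labels):
    """Weighted vote over the labels of each spatial cell's neighbors.

    weights         : (n_spatial, k) kNN weights, each row summing to 1
    neighbor_labels : (n_spatial, k) labels of the matching reference cells
    """
    classes = np.unique(neighbor_labels)
    # Score of a class = total weight of the neighbors that carry it
    scores = np.stack(
        [(weights * (neighbor_labels == c)).sum(axis=1) for c in classes],
        axis=1,
    )
    return classes[scores.argmax(axis=1)], scores

weights = np.array([[0.40, 0.35, 0.25],
                    [0.60, 0.20, 0.20]])
neighbor_labels = np.array([["Neuron", "Glia", "Glia"],
                            ["Neuron", "Glia", "Radial glia"]])
labels, scores = transfer_labels(weights, neighbor_labels)
```

In the first row the two "Glia" neighbors together outweigh the single closest "Neuron" neighbor, which shows why the vote uses weights rather than only the nearest cell.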

Step 4: RNA velocity analysis

First, we calculated the high-dimensional RNA velocity vectors for the spatially measured genes (set of genes originally measured in the spatial dataset), next we projected and visualized these vectors on the spatial coordinates of the cells in order to define directions of cellular differentiation in the spatial context. (In short, it all comes down to integration.)
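
For intuition on what a per-gene velocity vector is, here is a simplified steady-state sketch: fit a degradation ratio gamma per gene by least squares through the origin and take velocity = unspliced − gamma × spliced. The function name is hypothetical, and scVelo's `scv.tl.velocity` used in the script below fits its models differently, so this is only a conceptual illustration.

```python
import numpy as np

def steady_state_velocity(spliced, unspliced):
    """Per-gene velocity under a simple steady-state assumption.

    spliced, unspliced : (n_cells, n_genes) expression matrices
    """
    # Least-squares slope through the origin, one gamma per gene
    num = (unspliced * spliced).sum(axis=0)
    den = (spliced * spliced).sum(axis=0)
    gamma = num / np.maximum(den, 1e-12)
    # Positive velocity: more unspliced mRNA than the steady state predicts
    velocity = unspliced - gamma * spliced
    return velocity, gamma

# Toy data in which every cell sits exactly at steady state
rng = np.random.default_rng(2)
spliced = rng.gamma(2.0, 1.0, size=(50, 4))
true_gamma = np.array([0.2, 0.5, 1.0, 2.0])
unspliced = spliced * true_gamma

velocity, gamma = steady_state_velocity(spliced, unspliced)
```

With cells exactly at steady state the fitted gamma recovers the true ratio and all velocities are zero; cells with excess unspliced mRNA would get positive velocities for that gene.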

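Of course, both the single-cell and the spatial data went through the standard basic analysis first, which everyone knows well.

Let's look at the results: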

(1) SIRV overview

(2) SIRV produces interesting spatial differentiation trajectories in the developing mouse brain

(3) SIRV correctly transferred label annotation verified by spatial organization

(4) RNA velocities interpretation based on transferred cell labels

As for the code, have a look and put it to use; I won't walk through every step here.

"""
Created on Sun May 30 19:40:39 2021

@author: trmabdelaal
"""

import scvelo as scv
import scanpy as sc
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics.cluster import contingency_matrix
import sys
sys.path.insert(1,'SIRV/')
from main import SIRV
import warnings
warnings.filterwarnings('ignore')

# load preprocessed scRNA-seq and spatial datasets
RNA = scv.read('SIRV_data/RNA_adata.h5ad')
HybISS = scv.read('SIRV_data/HybISS_adata.h5ad')

# Apply SIRV to integrate both datasets and predict the un/spliced expressions
# for the spatially measured genes, additionally transfer 'Tissue', 'Region',
# 'Class' and 'Subclass' label annotations from scRNA-seq to spatial data
HybISS_imputed = SIRV(HybISS,RNA,50,['Tissue','Region','Class','Subclass'])

# Normalize the imputed un/spliced expressions, this will also re-normalize the
# full spatial mRNA 'X', this needs to be undone 
scv.pp.normalize_per_cell(HybISS_imputed, enforce=True)

# Undo the double normalization of the full mRNA 'X'
HybISS_imputed.X = HybISS.to_df()[HybISS_imputed.var_names]

# Zero mean and unit variance scaling, PCA, building neighbourhood graph, running
# UMAP and clustering the HybISS spatial data using Leiden clustering
sc.pp.scale(HybISS_imputed)
sc.tl.pca(HybISS_imputed)
sc.pl.pca_variance_ratio(HybISS_imputed, n_pcs=50, log=True)
sc.pp.neighbors(HybISS_imputed, n_neighbors=30, n_pcs=30)
sc.tl.umap(HybISS_imputed)
sc.tl.leiden(HybISS_imputed)
# Fig. 2A
sc.pl.umap(HybISS_imputed, color='leiden')
# Fig. 2B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='leiden')

# Calculating RNA velocities and projecting them on the UMAP embedding and spatial
# coordinates of the tissue
scv.pp.moments(HybISS_imputed, n_pcs=30, n_neighbors=30)
scv.tl.velocity(HybISS_imputed)
scv.tl.velocity_graph(HybISS_imputed)
# Fig. 2C
scv.pl.velocity_embedding_stream(HybISS_imputed, basis='umap', color='leiden')
# Fig. 2D
scv.pl.velocity_embedding_stream(HybISS_imputed, basis='xy_loc', color='leiden',size=60,legend_fontsize=4,legend_loc='right')

# Cell-level RNA velocities 
# Fig. 3
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='leiden')

# Visualizing transferred label annotations on UMAP embedding and spatial coordinates
# Fig. 4A
sc.pl.umap(HybISS_imputed, color='Region')
# Fig. 4B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Region')
# Fig. 4C
sc.pl.umap(HybISS_imputed, color='Subclass')
# Fig. 4D
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Subclass')
# Supplementary Fig. S3A
sc.pl.umap(HybISS_imputed, color='Class')
# Supplementary Fig. S3B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Class')

# Interpretation of RNA velocities using transferred label annotations
# Fig. 5
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='Subclass')
# Supplementary Fig. S3C
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='Class')

# Comparing cell clusters with transferred 'Subclass' and 'Class' annotations
def Norm(x):
    return (x/np.sum(x))

# Subclass annotation
cont_mat = contingency_matrix(HybISS_imputed.obs.leiden.astype(np.int_),HybISS_imputed.obs.Subclass)
df_cont_mat = pd.DataFrame(cont_mat,index = np.unique(HybISS_imputed.obs.leiden.astype(np.int_)), 
                           columns=np.unique(HybISS_imputed.obs.Subclass))

df_cont_mat = df_cont_mat.apply(Norm,axis=1)
# Supplementary Fig. S5A
plt.figure()
sns.heatmap(df_cont_mat,annot=True,fmt='.2f')
plt.yticks(np.arange(df_cont_mat.shape[0])+0.5,df_cont_mat.index)
plt.xticks(np.arange(df_cont_mat.shape[1])+0.5,df_cont_mat.columns)

# Class annotation
cont_mat = contingency_matrix(HybISS_imputed.obs.leiden.astype(np.int_),HybISS_imputed.obs.Class)
df_cont_mat = pd.DataFrame(cont_mat,index = np.unique(HybISS_imputed.obs.leiden.astype(np.int_)), 
                           columns=np.unique(HybISS_imputed.obs.Class))

df_cont_mat = df_cont_mat.apply(Norm,axis=1)
# Supplementary Fig. S5B
plt.figure()
sns.heatmap(df_cont_mat,annot=True,fmt='.2f')
plt.yticks(np.arange(df_cont_mat.shape[0])+0.5,df_cont_mat.index)
plt.xticks(np.arange(df_cont_mat.shape[1])+0.5,df_cont_mat.columns)