10X空间转录组数据分析之功能信号网络(Ligand---receptor----downstream genes network)

作者,追风少年i
hello,大家好,周二了,几天就要过完上半年了,不知道大家感觉如何了??人生总是有很多磨难,想要的东西总是让我们得不到,所以我们会有时很羡慕别人,可能一辈子奋斗想要的东西,别人唾手可得~~~~😄,可能越长大,越要承认自己的平凡,越是经历,越要珍惜所拥有的。
今天我们要继续空间转录组的分析内容,还是主要研究细胞在空间位置上的相互作用,参考文章在Decoding functional cell–cell communication events by multi-view graph learning on spatial transcriptomics,其实就是要利用空间转录组的信息推断空间区域的配受体--靶基因网络,文章是清华大学的才子所发,希望这位才子最后没进编制~~~😄,继续为科研而努力。
由多个配体-受体对介导的细胞间通讯事件(CEs)构建了一个复杂的细胞间信号网络。 通常只有一部分 CE 直接作用于特定微环境中的特定下游响应。 将它们称为功能通信事件 (FCE)。

图片.png

空间转录组学方法可以描绘配体、受体及其下游基因的基因表达水平的空间分布。 这为揭示细胞间通信的全景提供了新的可能性。 这里开发了一种计算方法 HoloNet,用于使用空间转录组数据解码 FCE。 将 CE 建模为多视图网络,在网络上开发了一个attention-based的图学习模型来预测目标基因的表达,并通过解释训练后的模型来解码特定下游基因的 FCE。

先来看看技术背景

细胞间通讯对于多细胞生物的组织和维持以及疾病的发生至关重要。 当Sender细胞向微环境释放配体分子,而受体细胞通过受体感知配体时,就会发生细胞间通讯事件 (CE)。 CEs 形成一个复杂的细胞间信号网络,参与各种基因的调控。 异常的CEs可以触发异常的细胞表型,并可能导致许多疾病的发生。

CE 的全部应包括Sender细胞、Receiver细胞、配体-受体对及其下游响应。 值得注意的是,特定的下游响应通常仅由部分推断的 CE 引起。 将此类 CE 称为其下游目标的功能通信事件 (FCE)。 解码 FCE 有助于了解特定微环境如何塑造细胞表型,因此有助于研究可能的疾病干预措施。

当前的分子谱分析技术提供bulk或单细胞基因表达谱,包括主要配体和受体基因、细胞类型特异性标记基因和可能的下游反应基因。最近的空间分子分析技术,如空间转录组测序和 SeqFISH,可以提供有关细胞位置的更多信息。已经开发了一些数据驱动的方法来研究基于这些技术的 CE。例如,NicheNet 和 MESSI 旨在表征配体-受体对与下游反应之间的关系,而 NCEM 则专注于Sender-Receiver细胞依赖性以及相关表型。对 FCE 的全面了解应该包含对以下问题的答案:哪些细胞类型充当发送者和接收者,它们在哪里,以及它们采用哪些配体-受体对来诱导什么样的下游反应。然而,仍然缺乏构建这种 FCE 全息网络的方法。

看一看构建方法
Overview of HoloNet

HoloNet 是一种使用空间转录组数据解码 FCE 的计算方法。 它需要空间数据、细胞类型标签和配对的配体-受体基因列表作为输入。 所需的空间数据应包含细胞的高维基因表达谱和空间位置。 用户可以为基于spot的数据集提供从反卷积方法(例如 Seurat)派生的连续细胞类型百分比,或为具有单细胞分辨率的数据集提供分类细胞类型标签。 使用来自 connectomeDB2020 的配对配体-受体基因列表作为默认设置,但也可以提供来自其他数据库的基因列表

HoloNet 首先使用空间数据和配对的配体-受体基因列表构建了一个多视图 CE 网络。在 CE 中,配体分子由Sender细胞释放,在组织内扩散一段距离,然后与受体细胞表达的受体结合当细胞通过多个配体-受体对与其他细胞进行交流时,CE 可以自然地建模为多视图网络。该网络中的每个视图都是一个有向图,表示由一个配体-受体对介导的 CE,具有自己的加权edge。每个节点代表被测组织中的一个细胞或一个点,并且多个视图中的节点是对齐的。任何视图中的有向边都将Sender Cell连接到其相应的Receiver cell。边缘权重表示由发送者和接收者细胞之间的配体-受体对介导的 CE 强度。对于每个视图,HoloNet 通过三个因素的函数计算每个边的权重:Sender 细胞中配体的表达、Receiver Cell中受体的表达以及Sender 细胞和Receiver Cell的物理距离。 HoloNet 仅保留通过置换测试评估的高置信度edge。

HoloNet 提供了多种方法来分析和可视化多视图 CE 网络。 对于任何配体-受体对,HoloNet 在相应的视图中执行网络中心性分析,以将具有高通信激活的细胞(或spot)检测为 CE 热点。 HoloNet 还为每个视图构建了一个细胞类型级别的 CE 网络,它代表了从一种细胞类型到另一种细胞类型的一般 CE 强度。 为了探索不同配体-受体对之间的关系,HoloNet 根据其局部特征量化不同网络视图之间的相似性,并执行层次聚类方法来聚类配体-受体对。

HoloNet 可以在构建的多视图 CE 网络的基础上揭示 FCE 的全息网络。如果 CE 改变了接收细胞中感兴趣基因的表达,将 CE 视为 FCE。为了解码 FCE,将所有单个细胞或spot中靶基因的表达 E 分解为其基线表达 E0,由其细胞类型和由多种细胞类型触发并由多个配体-受体对介导的 CE 引起的表达变化 ΔE 决定. 采用多视图图神经网络 (GNN) 来预测目标基因的表达谱,并解释训练后的模型以解码 FCE。下图d说明了该过程。首先通过 one-hot 编码或每个spot的细胞类型百分比获得了一个矩阵,该矩阵表示每个细胞的细胞类型。然后,采用该视图的CE网络作为邻接矩阵,将细胞类型矩阵的每一列作为特征分配给每个图的对应节点,为每个视图构建一个GNN。每个视图的节点嵌入与注意力机制相结合。综合结果进一步输入全连接层以估计 ΔE。另一方面,构建另一个全连接层以从细胞类型矩阵估计 E0。整个模型被训练以最小化 E 与 E0 和 ΔE 之和之间的差异。

训练后,解释 HoloNet 的attention weights,以指示多视图网络中的哪些视图对目标基因的表达贡献更大。 与这些views相对应的配体-受体对被认为是调节靶基因表达的 FCE 的核心介质。 所有训练模型中的所有注意力权重及其平均值都可视化为 FCE 中介图。 可以基于图卷积层进一步识别 FCE 中主要的发送方和接收方细胞类型,以构建细胞类型级的 FCE 连接网络。 为了确保研究结果的可靠性,对每个目标基因重复了训练过程,并整合了所有模型预测的基因表达和模型解释结果。

The overall workflow of HoloNet


图注:a Definition of CEs and FCEs. b Constructing multi-view CE networks using spatial transcriptomic data. c As CE visualization, HoloNet provides CE hotspots and cell-type-level CE networks. d Predicting specific gene expressions via multi-view graph learning on the constructed multi-view CE network. e As FCE visualization, HoloNet identifies the core mediators and core senders of FCEs for specific genes.

看一看分析示例

Characterizing the cell–cell communication landscape of a breast cancer Visium dataset by HoloNet

图注:a The cell type with the highest percentage in each spot. Each dot printed on the slide represents a single invisible capture spot with a diameter of 55 μm. b The percentages of basal-like-1 and stromal cells in each spot. c The constructed single-view CE network based on the COL1A1:DDR1 pair. The size of a node indicates the degree centralities of the node in the network, and the color indicates the sender-receiver attribute of the node. The sender-receiver attribute is the result of subtracting the in-degree of a node from its out-degree. The redder the color of nodes, the more inclined the nodes are to act as a sender, and the bluer on the contrary. The thickness of an edge indicates the strength of CEs between cells linked by it. Nodes with too small size and edges with too low thickness are not displayed. Nodes are mapped to the spatial positions of their corresponding spots in the hematoxylin and eosin (H&E) stained histology image. d CE hotspot plot describes regions with active COL1A1:DDR1 signals. For (C), (D) and (E), the protruding region between parenchyma and stroma is highlighted. e Cell-type-level CE network plot describes the COL1A1:DDR1 communication landscape between cell types. The thickness of an edge reflects the summed weight of all edges in the CE network between the pair of cell types. f Low-dimensional representation of the eigenvector centrality vectors of each view plotted by UMAP. All ligand–receptors were hierarchically clustered into four groups based on the eigenvector centrality vectors.

看看代码示例
安装很简单
pip install HoloNet
分析以10X数据为例
import HoloNet as hn

import os
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import torch

import warnings
warnings.filterwarnings('ignore')
hn.set_figure_params(tex_fonts=False)
sc.settings.figdir = './figures/'
加载数据

Loading example spatial transcriptomic data

adata = hn.pp.load_brca_visium_10x()
####Visualize the cell-type percentages in each spot
hn.pl.plot_cell_type_proportion(adata, plot_cell_type='stroma')

The parameters of plotting functions in this tutorials are mainly inherited from two base plotting functions:

###The cell-type label of each spot (the cell-type with maximum percentage in the spot)
sc.pl.spatial(adata, color=['cell_type'], size=1.4, alpha=0.7,
             palette=hn.brca_default_color_celltype)

Loading ligand-receptor database
###We provide a database with pairwise ligand and receptor genes for users. Load the database and filter the LR pairs, requiring both ligand and receptor genes to be expressed in a certain percentage of cells (or spots).
LR_df = hn.pp.load_lr_df()
expressed_LR_df = hn.pp.get_expressed_lr_df(LR_df, adata, expressed_proportion=0.3)
expressed_LR_df.head(3)

Constructing multi-view CE network

Ligand molecules from a single source can only cover a certain region.
Before constructing multi-view communication network, we need to calculate the w_best to decide the region (‘how far is far’).
The parameters for culcalating w_best is shown in HoloNet.tools.default_w_visium()

w_best = hn.tl.default_w_visium(adata)
hn.pl.select_w(adata, w_best=w_best)

Based on w_best, we can build up the multi-view communication network.

We calculate the edge weights of the multi-view communication network. With the more attributions of ligands, HoloNet.tools.compute_ce_tensor() can set different w_best for secreted ligands and plasma-membrane-binding ligands.

Then we filter the edges with low specificities.

CE_tensor = hn.tl.compute_ce_tensor(adata, lr_df=expressed_LR_df, w_best=w_best)
CE_tensor_filtered = hn.tl.filter_ce_tensor(CE_tensor, adata,
                                            lr_df=expressed_LR_df, w_best=w_best)
Visualizing CEs

Based on the multi-view CE network, we provide two visualization methods:

CE hotspot plots for visualizing the centralities of spots. Provide two calculating methods:

  • Degree centrality: out-degree + in-degree, faithfully reflects the CE strength related to each spot.
  • Eigenvector centrality: reflects the core regions with active communication.

Cell-type-level CE network:

  • CE strengths among cell-types
Cell-type-level CE network
###Loading the cell-type percentage of each spot.
cell_type_mat, \
cell_type_names = hn.pr.get_continuous_cell_type_tensor(adata, continuous_cell_type_slot = 'predicted_cell_type',)
####Plotting the cell-type-level CE network. The thickness of the edge represents the strength of COL1A1:DDR1 communication between the two cell types.
_ = hn.pl.ce_cell_type_network_plot(CE_tensor_filtered, cell_type_mat, cell_type_names,
                                    lr_df=expressed_LR_df, plot_lr='COL1A1:DDR1', edge_thres=0.2,
                                    palette=hn.brca_default_color_celltype)

LR pair clustering

Agglomerative Clustering the ligand-receptor pairs based on the centrality of each spot. The cluster label of each ligand-receptor pair saved in clustered_expressed_LR_df['cluster']. The number of clusters can be selected using n_clusters parameter in HoloNet.tools.cluster_lr_based_on_ce()

cell_cci_centrality = hn.tl.compute_ce_network_eigenvector_centrality(CE_tensor_filtered)
clustered_expressed_LR_df = hn.tl.cluster_lr_based_on_ce(CE_tensor_filtered, adata, expressed_LR_df,
                                                         w_best=w_best, cell_cci_centrality=cell_cci_centrality)

Visualize the ligand-receptor pair clusters in a UMAP plot. Each ligand-receptor pair has a centrality vector, containing the centralities of spots in the view of CE network The UMAP plot is a low dimentional representation of the centrality vectors.

hn.pl.lr_umap(clustered_expressed_LR_df, cell_cci_centrality, plot_lr_list=['COL1A1:DDR1'], linewidths=0.7)

General CE hotspot of each ligand-receptor cluster (superimposing All CE hotspots of members in a cluster).
hn.pl.lr_cluster_ce_hotspot_plot(lr_df=clustered_expressed_LR_df,
                                 cell_cci_centrality=cell_cci_centrality,
                                 adata=adata)

解码功能性细胞间通讯事件的全息图

For each downstream target gene, HoloNet:

  • Identifies cell types serving as major senders
  • Identifies ligand–receptor pairs serving as core mediators

in FCEs for specific downstream genes.

The tutorial mainly follows these steps:
Load data and construct multi-view communication event (CE) network
Predict specific target gene expression.
Decode the FCEs for the gene by interpreting the trained model.
Identify genes more affected by cell–cell communication.

import HoloNet as hn

import os
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import torch

import warnings
warnings.filterwarnings('ignore')
hn.set_figure_params(tex_fonts=False)
sc.settings.figdir = './figures/'
Data loading and constructing multi-view CE network
adata = hn.pp.load_brca_visium_10x()
####Load the ligand-receptor (LR) pair database and filter the LR pair:
LR_df = hn.pp.load_lr_df()
expressed_LR_df = hn.pp.get_expressed_lr_df(LR_df, adata, expressed_proportion=0.3)

Then build multi-view CE network using HoloNet.tools.compute_ce_tensor() and HoloNet.tools.filter_ce_tensor().

w_best = hn.tl.default_w_visium(adata)
CE_tensor = hn.tl.compute_ce_tensor(adata, lr_df=expressed_LR_df, w_best=w_best)
CE_tensor_filtered = hn.tl.filter_ce_tensor(CE_tensor, adata,
                                            lr_df=expressed_LR_df, w_best=w_best)
Predicting the target gene expression using a graph model

construct a multi-view graph learning model to predict the expression of gene on interest.

Selecting the target gene to be predicted

Firstly, we select the target genes to be predicted.

The genes with too low, too sparse or too average expression are filtered out. The filtering parameters can be changed in HoloNet.predicting.get_gene_expr().
Then we select MMP11, a gene related to tumor invasion as an example target gene.

target_all_gene_expr, used_gene_list = hn.pr.get_gene_expr(adata, expressed_LR_df)
target = hn.pr.get_one_case_expr(target_all_gene_expr, cases_list=used_gene_list,
                                 used_case_name='MMP11')
sc.pl.spatial(adata, color=['MMP11'], cmap='Spectral_r', size=1.4, alpha=0.7)

Getting inputs of the graph model

In the multi-view graph learning model, we:

  • Use the cell-type matrix as the feature matrix.
  • Use normalized multi-view CE network as the adjacency matrix.
Get the feature matrix and adjacency matrix
X, cell_type_names = hn.pr.get_continuous_cell_type_tensor(adata, continuous_cell_type_slot = 'predicted_cell_type',)
adj = hn.pr.adj_normalize(adj=CE_tensor_filtered, cell_type_tensor=X, only_between_cell_type=True)

If selecting to use categorical cell-type labels, the feature matrix can be derived from HoloNet.predicting.get_one_hot_cell_type_tensor().

Training the graph model

The we train the graph model to predict MMP11 expression. If GPU is avaiable, you can set the device parameter as ‘gpu’. Otherwise, we use CPU by default. The predicted MMP11 expression pattern are similar to the true pattern.

trained_MGC_model_MMP11_list = hn.pr.mgc_repeat_training(X, adj, target, device='gpu')
predict_result_MMP11 = hn.pl.plot_mgc_result(trained_MGC_model_MMP11_list, adata, X, adj)
np.corrcoef(predict_result_MMP11.T, target.T)[0,1]

Decode the FCEs for the gene by interpreting the trained model

After training the graph model and find the predicted expression profile similar to the true one, we can interprete the trained model to reveal the holography of FCEs.

For each target gene, there are three main output figure:

LR rank

  • Identify ligand–receptor (LR) pairs serving as core mediators in FCEs for the target gene.

Cell-type-level FCE network

  • Identifies cell types serving as major senders
  • The cell-type-level FCE network can be LR-pair-specific or general.

Delta E proportion in each cell-type

  • Identify target gene expression in which cell-types are more dominated by FCEs.
LR rank

Plot the top 15 LR pairs (15 can be changed using plot_lr_num parameter) with the highest view attention weights The heatmap displays the attention weights of each view obtained from repeated training for 50 times. The bar plot represents the mean values of the attention weights of each view.

ranked_LR_df_for_MMP11 = hn.pl.lr_rank_in_mgc(trained_MGC_model_MMP11_list, expressed_LR_df,
                                              plot_cluster=False, repeat_attention_scale=True)

If you want plot the LR-pair clustering results in the LR rank plot, you can set cluster_col=True and provide clustering results in expressed_LR_df.
LR pair clustering can see the first tutorial and HoloNet.tools.cluster_lr_based_on_ce().

Cell-type-level FCE network

Cell-type-level POSTN:PTK7 FCE network for MMP11. The thickness of the edge represents the strength of POSTN:PTK7 FCEs between the two cell types. The network are derived from interpreting the graph convolutional layer.

In the plot, you can focus on one cell-type and look the edges targeted on it, in order to identify which cell-types are the major sender for it.

_ = hn.pl.fce_cell_type_network_plot(trained_MGC_model_MMP11_list, expressed_LR_df, X, adj,
                                     cell_type_names, plot_lr='POSTN:PTK7', edge_thres=0.2,
                                     palette=hn.brca_default_color_celltype,)

If plot_lr is one of the LR pair in the expressed_LR_df, HoloNet.plotting.fce_cell_type_network_plot() will plot the cell-type-level FCE network for a specific LR pair. If plot_lr='all', it will plot the general cell-type-level FCE network for all LR pairs.

Delta E proportion in each cell-type

Identify target gene expression in which cell-types are more dominated by FCEs.

The ratio of the expression change caused by CEs (ΔE) to the sum of ΔE and the baseline MMP11 expression (E0) in each cell type.

delta_e = hn.pl.delta_e_proportion(trained_MGC_model_MMP11_list, X, adj,
                                    cell_type_names,
                                    palette = hn.brca_default_color_celltype)

Identify genes more affected by cell–cell communication

We train the graph model for all selected target genes. (HoloNet.predicting.get_gene_expr() select target gene to be predicted)

Comparing with prediction only using cell-type information, the target with higher performance improvement after considering CEs can be regarded as the genes more affected by cell–cell communication.

trained_MGC_model_only_type_list, \
trained_MGC_model_type_GCN_list = hn.pr.mgc_training_for_multiple_targets(X, adj, target_all_gene_expr, device='gpu')

The training process will take a lot of time, you can select to:

Get the predicting results of all target genes
predicted_expr_type_GCN_df = hn.pr.get_mgc_result_for_multiple_targets(trained_MGC_model_type_GCN_list,
                                                                        X, adj,
                                                                        used_gene_list, adata)
predicted_expr_only_type_df = hn.pr.get_mgc_result_for_multiple_targets(trained_MGC_model_only_type_list,
                                                                        X, adj,
                                                                        used_gene_list, adata)

Calculate the Pearson correlation between the predicted expression and the true expression. Compare the correlation from model only using cell-type information and the ones from HoloNet.

The head target genes in only_type_vs_GCN_all table are the genes more affected by cell–cell communication. Gene Ontology (GO) enrichment can be implemented based on the table.

only_type_vs_GCN_all = hn.pl.find_genes_linked_to_ce(predicted_expr_type_GCN_df,
                                                     predicted_expr_only_type_df,
                                                     used_gene_list, target_all_gene_expr,
                                                     plot_gene_list = ['MMP11'], linewidths=0.5)

only_type_vs_GCN_all.head(15)

Model saving and loading

Trained model can be saved to avoid repetitive training. The models will be saved in ‘model_save_folder/project_name/gene_name’. The ‘gene_name’ are genes in the target_gene_name_list. For different trained model, you can set different project_name.

hn.pr.save_model_list(trained_MGC_model_type_GCN_list,
                      project_name='BRCA_10x_generating_all_target_gene_type_GCN',
                      target_gene_name_list=used_gene_list)
hn.pr.save_model_list(trained_MGC_model_only_type_list,
                      project_name='BRCA_10x_generating_all_target_gene_only_type',
                      target_gene_name_list=used_gene_list)

Setting the target_gene_name_list as one gene, the trained model for MMP11 can be saved in the same way.

Model loading. used_genes are the list of ‘gene_name’ before. Note that the order of used_genes is different from the used_gene_list before.

trained_MGC_model_only_type_list_tmp, \
used_genes = hn.pr.load_model_list(X, adj, project_name='BRCA_10x_generating_all_target_gene_only_type',
                                   only_cell_type=True)
trained_MGC_model_type_GCN_list_tmp, \
used_genes = hn.pr.load_model_list(X, adj, project_name='BRCA_10x_generating_all_target_gene_type_GCN')

Using the loaded model, you can repeat the results in the previous section.

predicted_expr_type_GCN_df_tmp = hn.pr.get_mgc_result_for_multiple_targets(trained_MGC_model_type_GCN_list_tmp,
                                                                        X, adj,
                                                                        used_genes, adata)
predicted_expr_only_type_df_tmp = hn.pr.get_mgc_result_for_multiple_targets(trained_MGC_model_only_type_list_tmp,
                                                                        X, adj,
                                                                        used_genes, adata)
only_type_vs_GCN_all2 = hn.pl.find_genes_linked_to_ce(predicted_expr_type_GCN_df_tmp.loc[:,used_gene_list],
                                                     predicted_expr_only_type_df_tmp.loc[:,used_gene_list],
                                                     used_gene_list, target_all_gene_expr,
                                                     plot_gene_list = ['MMP11'], linewidths=0.5)

参考网址在HoloNet: Decoding functional cell–cell communication events by multi-view graph learning on spatial transcriptomics

好了,已经分享给大家了,生活很好,有你更好

  • 28
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值