3. Automatic cell type annotation of scRNA-seq data with Python


tzc_fly



Column: Biocomputing tools; tags: python

Article link: https://blog.csdn.net/qq_40943760/article/details/139011204


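Manual annotation is impractical for analysts without much experience, so automatic annotation has become increasingly accepted. Automatic annotation methods fall into two groups: statistical models and deep learning models.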

References:
[1] https://github.com/Starlitnightly/single_cell_tutorial
[2] https://github.com/theislab/single-cell-best-practices

SCSA: annotation based on a statistical model

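SCSA is one of the earlier automatic cell type annotation tools. Starting from each cluster's specific marker genes, it queries the marker genes of every cell type in a database and picks the cell type that matches best. If the best match is not sufficiently distinct from the alternatives, the cluster is labeled Unknown.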
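The matching logic described above can be illustrated with a toy sketch. This is not SCSA's actual scoring model: the small marker database, the fold-change-weighted overlap score, the population z-score across cell types, and the `min_ratio` cutoff for calling a cluster Unknown are all hypothetical choices made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical toy marker database: cell type -> set of marker genes
MARKER_DB = {
    "B cell": {"MS4A1", "CD79A", "CD19"},
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "Natural killer cell": {"NKG7", "GNLY", "KLRD1"},
    "Monocyte": {"CD14", "LYZ", "FCGR3A"},
}


def score_cluster(cluster_markers, marker_db):
    """Sum the fold changes of the cluster's markers that each cell type also lists."""
    return {
        cell_type: sum(fc for gene, fc in cluster_markers.items() if gene in genes)
        for cell_type, genes in marker_db.items()
    }


def to_zscores(scores):
    """Standardize the raw scores across cell types (population standard deviation)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {cell_type: 0.0 for cell_type in scores}
    return {cell_type: (s - mu) / sd for cell_type, s in scores.items()}


def annotate(cluster_markers, marker_db, min_ratio=1.5):
    """Return (label, top z-score); 'Unknown' if the best match is not distinct enough."""
    z = to_zscores(score_cluster(cluster_markers, marker_db))
    ranked = sorted(z.items(), key=lambda kv: kv[1], reverse=True)
    (best, z1), (_, z2) = ranked[0], ranked[1]
    # Unknown if nothing scores above average, or the runner-up is too close to the winner
    if z1 <= 0 or (z2 > 0 and z1 / z2 < min_ratio):
        return "Unknown", z1
    return best, z1


if __name__ == "__main__":
    # Cluster marker genes with hypothetical fold changes
    print(annotate({"MS4A1": 3.2, "CD79A": 2.8, "CD74": 1.9}, MARKER_DB))
    print(annotate({"NKG7": 2.0, "GNLY": 2.0, "CD3D": 2.0, "CD3E": 1.8}, MARKER_DB))
```

The first cluster hits only B cell markers and gets a clear label; the second hits NK and T cell markers almost equally, so the sketch refuses to pick one. The real SCSA output shown later reports such close calls as two candidates separated by `|`.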

First, load the data:

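import omicverse as ov
import scanpy as sc
ov.ov_plot_set()  # apply omicverse's default plotting style

# Load the preprocessed, dimensionality-reduced AnnData object
adata = ov.read('./data/s4d8_dimensionality_reduction.h5ad')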

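Next, cluster the cells. Here the resolution is set to 2 so that more fine-grained cell subpopulations are resolved. The clustering result is shown below:
[Figure 1: clustering result]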
SCSA can use two databases: CellMarker and PanglaoDB. In omicverse, SCSA takes the following parameters:

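foldchange: the fold-change threshold of each cluster relative to the other clusters, usually 1.5; the larger the value, the fewer marker genes are used
pvalue: paired with foldchange; a statistical significance test is run while computing the fold changes, and this is its p-value cutoff, usually 0.01
celltype: either 'normal' or 'cancer'; with 'cancer', 12 tumor cell subtypes from the CancerSEA database are annotated
target: the database to use; SCSA currently supports CellMarker and PanglaoDB, and in fact also CancerSEA
tissue: scsa.get_model_tissue() lists all supported tissues; by default all tissues are used
model_path: the local path where the database file is stored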
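To see why a larger foldchange means fewer markers, here is a minimal sketch of threshold filtering on a hypothetical differential-expression table of (gene, fold change, p-value) rows. It assumes fold changes on a linear scale and a strict p-value cutoff; pySCSA's internal filtering may apply the thresholds differently (for example on log fold changes).

```python
# Hypothetical differential-expression results for one cluster: (gene, fold change, p-value)
DE_TABLE = [
    ("MS4A1", 4.1, 1e-30),
    ("CD79A", 3.0, 1e-20),
    ("CD74", 1.8, 1e-5),
    ("HLA-DRA", 1.6, 0.02),  # fails the default p-value cutoff
    ("ACTB", 1.1, 1e-3),     # fails the default fold-change cutoff
]


def select_markers(de_table, foldchange=1.5, pvalue=0.01):
    """Keep genes with fold change >= `foldchange` and p-value < `pvalue`."""
    return [gene for gene, fc, p in de_table if fc >= foldchange and p < pvalue]


if __name__ == "__main__":
    print(select_markers(DE_TABLE))                  # ['MS4A1', 'CD79A', 'CD74']
    print(select_markers(DE_TABLE, foldchange=2.0))  # ['MS4A1', 'CD79A']
```

Raising foldchange from 1.5 to 2.0 drops CD74, so fewer genes are left to match against the database.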

Using the CellMarker database

Initialize SCSA as follows:

# Build the SCSA annotator on the CellMarker database, using all tissues
scsa=ov.single.pySCSA(adata=adata,
                      foldchange=1.5,
                      pvalue=0.01,
                      celltype='normal',
                      target='cellmarker',
                      tissue='All',
                      model_path='./database/pySCSA_2023_v2_plus.db'
)

The annotation step itself takes three parameters:

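clustertype: the name of the cluster column in adata.obs that the annotation is based on
cluster: the clusters to annotate; the default is 'all'
rank_rep: whether to recompute the differentially expressed genes; if sc.tl.rank_genes_groups has already been run before annotation, this can be set to False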
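Whether rank_rep can safely be False depends on whether the stored sc.tl.rank_genes_groups result was computed on the same clustering. The helper below is hypothetical (not part of omicverse or scanpy); it checks a plain dict shaped like adata.uns and relies on scanpy recording the grouping column under uns['rank_genes_groups']['params']['groupby'].

```python
def should_recompute_de(uns, clustertype, key="rank_genes_groups"):
    """Hypothetical helper: True if no stored DE result was grouped by `clustertype`."""
    result = uns.get(key)
    if result is None:
        # No differential-expression result stored at all
        return True
    # scanpy stores the grouping column in the result's params dict
    return result.get("params", {}).get("groupby") != clustertype


if __name__ == "__main__":
    fake_uns = {"rank_genes_groups": {"params": {"groupby": "leiden_res1"}}}
    print(should_recompute_de(fake_uns, "leiden_res1"))
    print(should_recompute_de({}, "leiden_res1"))
```

With a real AnnData object this would be called as should_recompute_de(adata.uns, 'leiden_res1') and the result passed as rank_rep. It only compares the grouping column, so a stored result computed with a different method or on raw counts would still need a manual check.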

We annotate the cells with SCSA and print the cell type assigned to each cluster:

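# scanpy's rank_genes_groups reads the log base of adata.X from this key when it
# computes fold changes; it should match the base used when the data were log-transformed
adata.uns['log1p']['base']=10
res=scsa.cell_anno(clustertype='leiden_res1',
                   cluster='all',rank_rep=True)

# Print the annotation result of each cluster
scsa.cell_anno_print()

# Write the annotations into adata.obs under the given key
scsa.cell_auto_anno(adata,clustertype='leiden_res1',
                    key='scsa_celltype_cellmarker')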

Cluster:0 Cell_type:Natural killer cell|T cell Z-score:13.328|9.099
Nice:Cluster:1 Cell_type:B cell Z-score:17.324
Cluster:2 Cell_type:Natural killer cell|T cell Z-score:12.711|7.851
Nice:Cluster:3 Cell_type:B cell Z-score:12.564
Nice:Cluster:4 Cell_type:Natural killer cell Z-score:11.75
Nice:Cluster:5 Cell_type:B cell Z-score:17.695
Nice:Cluster:6 Cell_type:Natural killer cell Z-score:14.859
Nice:Cluster:7 Cell_type:T cell Z-score:12.816
Cluster:8 Cell_type:T cell|Naive CD4+ T cell Z-score:5.584|3.975
Cluster:9 Cell_type:T cell|Natural killer cell Z-score:8.559|5.724
Nice:Cluster:10 Cell_type:Monocyte Z-score:14.734
Nice:Cluster:11 Cell_type:B cell Z-score:15.721
Cluster:12 Cell_type:Monocyte|Natural killer T (NKT) cell Z-score:14.965|12.388
Cluster:13 Cell_type:T cell|Natural killer cell Z-score:10.451|8.418
Nice:Cluster:14 Cell_type:Natural killer T (NKT) cell Z-score:16.389
Cluster:15 Cell_type:Natural killer T (NKT) cell|T cell Z-score:13.173|11.666
Cluster:16 Cell_type:Natural killer cell|T cell Z-score:7.567|7.555
Cluster:17 Cell_type:Natural killer T (NKT) cell|B cell Z-score:16.351|14.131
Nice:Cluster:18 Cell_type:Red blood cell (erythrocyte) Z-score:12.859
Nice:Cl