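1. Leiden clustering
import scanpy as sc
# assumes adata is a preprocessed AnnData object with the neighbor graph already computed
sc.tl.leiden(adata, resolution=0.05, key_added='leiden_r0.05', random_state=10)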
2. Compute marker genes for each cluster
sc.tl.rank_genes_groups(adata, groupby='leiden_r0.05', key_added='rank_genes_r0.05')
# uses adata.raw by default when it is present
3. Provide known marker genes for each cell type
marker_genes = dict()
marker_genes['T'] = ['CD3G','CD3D','CD3E','CD2']
marker_genes['CD8+T'] = ['CD8A','GZMA']
marker_genes['CD4+T'] = ['CD4','FOXP3']
4. Count the overlap between the top 300 marker genes of each cluster in the dataset and the marker genes provided above
cell_annotation = sc.tl.marker_gene_overlap(adata, marker_genes, key='rank_genes_r0.05', top_n_markers=300)
5. Normalize the overlap counts into fractions of each provided marker set
cell_annotation_norm = sc.tl.marker_gene_overlap(adata, marker_genes, key='rank_genes_r0.05', normalize='reference', top_n_markers=300)
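To make the difference between the raw count (step 4) and the normalized fraction (step 5) concrete, here is a minimal pure-Python sketch of the idea. It is not scanpy's implementation: the helper `marker_overlap` and the toy cluster marker lists are hypothetical, and each reference marker set is assumed to be non-empty.

```python
def marker_overlap(reference, cluster_markers, normalize=False):
    """Return {cluster: {cell_type: score}}.

    score is the number of shared genes, or, with normalize=True, that number
    divided by the size of the reference marker set (assumed non-empty).
    """
    result = {}
    for cluster, genes in cluster_markers.items():
        top = set(genes)
        scores = {}
        for cell_type, ref_genes in reference.items():
            ref = set(ref_genes)
            n_shared = len(ref & top)
            scores[cell_type] = n_shared / len(ref) if normalize else n_shared
        result[cluster] = scores
    return result


# Reference markers from step 3
reference = {
    'T': ['CD3G', 'CD3D', 'CD3E', 'CD2'],
    'CD8+T': ['CD8A', 'GZMA'],
    'CD4+T': ['CD4', 'FOXP3'],
}
# Hypothetical top-ranked genes per cluster (toy stand-in for the real ranking)
toy_cluster_markers = {
    '0': ['CD3D', 'CD3E', 'CD8A', 'GZMA', 'NKG7'],
    '1': ['CD3G', 'CD4', 'IL7R'],
}

counts = marker_overlap(reference, toy_cluster_markers)
fractions = marker_overlap(reference, toy_cluster_markers, normalize=True)
print(counts['0'])     # shared-gene counts for cluster '0'
print(fractions['0'])  # the same counts as fractions of each reference set
```

Raw counts favor cell types with long marker lists, while fractions let sets of different sizes be compared on the same scale.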
6. Map the cell type names onto the clusters
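One possible way to do this, sketched under assumptions: pick, for each cluster, the cell type with the highest normalized overlap. The helper `best_label_per_cluster`, the `min_score` threshold, the 'Unknown' label, and the `cell_type` column name are all hypothetical and not part of scanpy. The commented lines assume the overlap table has cell types as rows and cluster labels as columns.

```python
def best_label_per_cluster(scores, min_score=0.0):
    """scores: {cluster: {cell_type: score}}. Returns {cluster: label}.

    A cluster whose best score does not exceed min_score is labeled 'Unknown'.
    Ties are resolved in favor of the cell type listed first.
    """
    labels = {}
    for cluster, by_type in scores.items():
        cell_type, score = max(by_type.items(), key=lambda kv: kv[1])
        labels[cluster] = cell_type if score > min_score else 'Unknown'
    return labels


# Hypothetical normalized overlap scores for three clusters
toy_scores = {
    '0': {'T': 0.5, 'CD8+T': 1.0, 'CD4+T': 0.0},
    '1': {'T': 0.25, 'CD8+T': 0.0, 'CD4+T': 0.5},
    '2': {'T': 0.0, 'CD8+T': 0.0, 'CD4+T': 0.0},
}
labels = best_label_per_cluster(toy_scores)
print(labels)

# With the objects from the steps above (not run here):
# labels = best_label_per_cluster(cell_annotation_norm.to_dict())
# adata.obs['cell_type'] = adata.obs['leiden_r0.05'].map(labels)
```

Taking the maximum is only a heuristic. It is worth checking the overlap table by eye before accepting the labels, especially when a broad set such as 'T' competes with its subsets.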