Hands-on with CellTypist

All_Will_Be_Fine噻

Categories: R, bioinfo, scRNAseq. Tags: R, Seurat

Article link: https://blog.csdn.net/jiangshandaiyou/article/details/137109190

Contents

Brief
Notes
Worked examples
- Official tutorial
- Real data
Summary

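Brief

Like SingleR, CellTypist annotates cell types in single-cell data. Its classifiers are logistic regression models trained on published, annotated single-cell datasets. The training set is large and its labels are probably well curated, so the classifier does well on immune cell annotation. (It is reportedly somewhat more accurate than SingleR, and it was published in Science, so let's use it!)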

Notes

Built-in reference models (annotated cell types)
Full list: https://www.celltypist.org/models

```
import celltypist
from celltypist import models

# Show all available models that can be downloaded and used.
models.models_description()
# Download a specific model, for example, `Immune_All_Low.pkl`.
models.download_models(model = 'Immune_All_Low.pkl')
# Download a list of models, for example, `Immune_All_Low.pkl` and `Immune_All_High.pkl`.
models.download_models(model = ['Immune_All_Low.pkl', 'Immune_All_High.pkl'])
# Update the models by re-downloading the latest versions if you think they may be outdated.
models.download_models(model = ['Immune_All_Low.pkl', 'Immune_All_High.pkl'], force_update = True)
# Show the local directory storing these models.
models.models_path
```

If the model argument is not specified, CellTypist uses the Immune_All_Low.pkl model by default.
The mode argument of celltypist.annotate:
mode = 'best match' (the default): each query cell is assigned the cell type with the largest score/probability among all possible cell types.
mode = 'prob match': a query cell sometimes cannot be assigned to any cell type in the reference model (i.e., a novel cell type) or can be assigned to several (i.e., multi-label classification). In those cases, turn on probability matching (mode = 'prob match') with a probability cutoff (p_thres, default 0.5) to decide which cell types (none, one, or multiple) are assigned to a given cell.
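
The difference between the two modes can be sketched in a few lines of plain Python. This is only an illustration of the decision rule described above, not CellTypist's own code: the function names, the probability values, and the strict `>` comparison against the cutoff are all assumptions. The per-type probabilities of a cell do not need to sum to 1, which is what makes "none" or "multiple" possible.

```python
from typing import Dict, List


def best_match(probs: Dict[str, float]) -> str:
    """Return the single cell type with the highest probability."""
    return max(probs, key=probs.get)


def prob_match(probs: Dict[str, float], p_thres: float = 0.5) -> List[str]:
    """Return every cell type whose probability exceeds p_thres (possibly none)."""
    return [cell_type for cell_type, p in probs.items() if p > p_thres]


# Hypothetical probabilities for three query cells (illustrative values only).
cells = {
    "cell_1": {"B cells": 0.92, "T cells": 0.03, "NK cells": 0.01},  # one clear type
    "cell_2": {"B cells": 0.10, "T cells": 0.70, "NK cells": 0.65},  # two types pass
    "cell_3": {"B cells": 0.20, "T cells": 0.30, "NK cells": 0.25},  # nothing passes
}

for name, probs in cells.items():
    print(name, "| best match:", best_match(probs), "| prob match:", prob_match(probs))
```

Here cell_3 still gets "T cells" under best match even though no type is convincing, whereas prob match leaves it without a label; that is the situation in which a novel cell type would show up.
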
Majority voting classifier
By default, CellTypist only predicts the identity of each input cell, so every cell is predicted independently. To combine the cell type predictions with the cell-cell transcriptomic relationships, CellTypist offers a majority voting approach, based on the idea that similar cell subtypes are more likely to form a (sub)cluster regardless of their individual prediction outcomes.
Put differently: cells are first grouped into approximate clusters by transcriptomic similarity, and each cluster then takes the cell type that most of its cells were predicted as. The clustering comes first, and the annotation is applied to the cluster.
To define the cell-cell relations during majority voting, CellTypist uses a heuristic over-clustering approach, scaled to the size of the input data, with the aid of a Leiden clustering pipeline.
```
# Turn on the majority voting classifier as well.
predictions = celltypist.annotate(input_file, model = 'Immune_All_Low.pkl', majority_voting = True)
# You can also supply your own over-clustering result, either as
# a plain text file with the cluster of one cell per line, or as
# a list-like object (such as a numpy 1D array) giving the cluster of every cell.
predictions = celltypist.annotate(input_file, model = 'Immune_All_Low.pkl', majority_voting = True, over_clustering = '/path/to/over_clustering/file')
```
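
The voting step itself is easy to picture. The sketch below is a simplified stand-in, assuming per-cell labels and cluster assignments are already available; `majority_vote` and the toy data are hypothetical, and details of the real implementation (how clusters are built, how ties or mixed clusters are handled) are deliberately left out.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def majority_vote(predicted: List[str], clusters: List[str]) -> List[str]:
    """Replace each cell's label with the most common label in its cluster."""
    labels_by_cluster: Dict[str, Counter] = defaultdict(Counter)
    for label, cluster in zip(predicted, clusters):
        labels_by_cluster[cluster][label] += 1
    # The winning label of a cluster is the one predicted for most of its cells.
    winner = {c: counts.most_common(1)[0][0] for c, counts in labels_by_cluster.items()}
    return [winner[c] for c in clusters]


# Hypothetical per-cell predictions and over-clustering result (one entry per cell).
predicted = ["T cells", "T cells", "NK cells", "B cells", "B cells", "T cells"]
clusters = ["0", "0", "0", "1", "1", "1"]

voted = majority_vote(predicted, clusters)
print(voted)
```

The lone "NK cells" call in cluster 0 and the lone "T cells" call in cluster 1 are overruled by their neighbours, which is why majority voting gives smoother labels at the cost of hiding rare cells inside a larger cluster.
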

Worked examples

Official tutorial

```
conda activate R4
conda install -c bioconda -c conda-forge celltypist
python
```

```
# In the Python interpreter
import celltypist
from celltypist import models

# Select the model from the above list. If the `model` argument is not provided, will default to `Immune_All_Low.pkl`.
model = models.Model.load(model = 'Immune_All_Low.pkl')
# The model summary information.
model
# Examine cell types contained in the model.
model.cell_types
# Examine genes/features contained in the model.
model.features

# The input data is a count table (cell-by-gene or gene-by-cell) in the format of txt/csv/tsv/tab/mtx/mtx.gz.

# Get a demo test data. This is a UMI count csv file with cells as rows and gene symbols as columns.
input_file = celltypist.samples.get_sample_csv()

# Predict the identity of each input cell.
# predictions = celltypist.annotate(input_file, model = 'Immune_All_Low.pkl')
predictions = celltypist.annotate(input_file, model = model)

# If your input file is in a gene-by-cell format (genes as rows and cells as columns), pass in the transpose_input = True argument.
# In addition, if the input is provided in the .mtx format, you will also need to specify the gene_file and cell_file.
# Left commented out here because the demo csv is already cell-by-gene:
# predictions = celltypist.annotate(input_file, model = model, transpose_input = True, gene_file = '/path/to/gene/file.txt', cell_file = '/path/to/cell/file.txt')

# Annotation results
# Summary information for the prediction result.
predictions
# Examine the predicted cell type labels.
predictions.predicted_labels
# Examine the matrix representing the decision score of each cell belonging to a given cell type.
predictions.decision_matrix
# Examine the matrix representing the probability each cell belongs to a given cell type (transformed from decision matrix by the sigmoid function).
predictions.probability_matrix

# Save the annotation results
# Export the three results to csv tables.
predictions.to_table(folder = '/path/to/a/folder', prefix = '')
# Alternatively, export the three results to a single Excel table (.xlsx).
predictions.to_table(folder = '/path/to/a/folder', prefix = '', xlsx = True)

# Visualise the predicted cell types overlaid onto the UMAP.
predictions.to_plots(folder = '/path/to/a/folder', prefix = '')
```
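
The comments above say the probability matrix is obtained from the decision matrix through the sigmoid function. A minimal worked example of that transformation, using made-up decision scores (the values and the helper name are assumptions, and the naive formula is fine for a demonstration but can overflow for very large negative scores):

```python
import math


def sigmoid(score: float) -> float:
    """Map a decision score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical decision scores of one cell against three cell types.
decision_scores = [2.0, 0.0, -3.0]
probabilities = [sigmoid(s) for s in decision_scores]
print([round(p, 3) for p in probabilities])  # [0.881, 0.5, 0.047]
```

A decision score of 0 maps to a probability of exactly 0.5, so the default p_thres = 0.5 in 'prob match' mode corresponds to asking whether the decision score is positive.
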

Real data

In R, export the raw counts:
```
list.files("./processed_data/")
# [1] "pre_sce.rds"      "sce.combined.rds"
# sce.combined.rds is the integrated dataset; we need the raw counts stored in it
sce <- readRDS("./processed_data/sce.combined.rds")
write.csv(sce@assays[["RNA"]]$counts, file = "sce_integrated_raw_counts.csv")
```
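
A Seurat count matrix has genes as rows and cells as columns, so the exported CSV is gene-by-cell, while CellTypist expects cells as rows unless transpose_input = True is passed. The toy sketch below shows what that transposition amounts to; the table contents are invented, and it only illustrates the orientation issue rather than how CellTypist reads files.

```python
import csv
import io

# Hypothetical gene-by-cell CSV, shaped like a count matrix exported from R.
gene_by_cell = io.StringIO(
    '"","cell_A","cell_B"\n'
    '"CD3E",5,0\n'
    '"MS4A1",0,7\n'
    '"NKG7",1,2\n'
)
rows = list(csv.reader(gene_by_cell))

# zip(*rows) swaps rows and columns, giving one row per cell.
cell_by_gene = [list(column) for column in zip(*rows)]

for row in cell_by_gene:
    print(row)
```

After the swap the header row lists the genes and each following row is one cell, which is the layout of the demo CSV used in the official tutorial.
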


In Python, annotate with two models, first without and then with majority voting:
```
import celltypist
from celltypist import models

models.models_description()

# Per-cell predictions. The CSV is gene-by-cell, hence transpose_input = True.
predictions = celltypist.annotate("./sce_integrated_raw_counts.csv", model = 'Immune_All_Low.pkl', transpose_input = True)
predictions.to_table(folder = './', prefix = './celltypsit_Immune_All_Low_')

predictions = celltypist.annotate("./sce_integrated_raw_counts.csv", model = 'Cells_Intestinal_Tract.pkl', transpose_input = True)
predictions.to_table(folder = './', prefix = './celltypsit_Cells_Intestinal_Tract_')

# Same two models, with majority voting turned on.
predictions = celltypist.annotate("./sce_integrated_raw_counts.csv", model = 'Immune_All_Low.pkl', transpose_input = True, majority_voting = True)
predictions.to_table(folder = './', prefix = './celltypsit_Immune_All_Low_MV_')

predictions = celltypist.annotate("./sce_integrated_raw_counts.csv", model = 'Cells_Intestinal_Tract.pkl', transpose_input = True, majority_voting = True)
predictions.to_table(folder = './', prefix = './celltypsit_Cells_Intestinal_Tract_MV_')
```

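In R, merge the labels into the Seurat object:
```
# Merge with the Seurat object, then visualise
# Read the per-cell labels written by CellTypist
ct_CIT <- read.table("celltypsit_Cells_Intestinal_Tract_predicted_labels.csv", sep = ",", header = T)
ct_IAL <- read.table("celltypsit_Immune_All_Low_predicted_labels.csv", sep = ",", header = T)
library(stringr)
# The cell names come back with "." where the Seurat object has "-"; restore them
rownames(ct_CIT) <- str_replace_all(ct_CIT$X, pattern = "\\.", replacement = "-")
rownames(ct_IAL) <- str_replace_all(ct_IAL$X, pattern = "\\.", replacement = "-")

# Look up each cell's label by name and store it in the metadata
sce@meta.data$celltypist_IAL <- ct_IAL[rownames(sce@meta.data), "predicted_labels"]
sce@meta.data$celltypist_CIT <- ct_CIT[rownames(sce@meta.data), "predicted_labels"]
```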
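
Looking labels up by cell name fails silently when a name does not match: the cell simply ends up without a label. The sketch below replays the same rename-and-lookup logic in plain Python on invented data, and lists the cells that found no label, which is a cheap sanity check to run after the merge. The barcodes, the table layout, and the variable names are assumptions made for the illustration.

```python
import csv
import io

# Hypothetical predicted_labels table; barcodes have "." where the object has "-".
labels_csv = io.StringIO(
    ",predicted_labels\n"
    "AAACCTG.1,T cells\n"
    "AAAGATG.1,B cells\n"
)
reader = csv.reader(labels_csv)
next(reader)  # skip the header row
labels = {barcode.replace(".", "-"): label for barcode, label in reader}

# Hypothetical cell names from the object's metadata; the last one has no prediction.
metadata_barcodes = ["AAACCTG-1", "AAAGATG-1", "AAATTTT-1"]

matched = [labels.get(barcode) for barcode in metadata_barcodes]
missing = [b for b, label in zip(metadata_barcodes, matched) if label is None]

print("labels: ", matched)
print("missing:", missing)
```

An empty `missing` list means every cell in the metadata received a label. Note that a blanket replacement of "." with "-" would also alter cell names that legitimately contain a dot, so it is worth checking the names before relying on it.
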