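Scanpy_1: Single-cell analysis workflow for a single sample
Notes on the code:
- Video link: https://www.youtube.com/watch?v=5HuOGZEu2HY&list=PLi1VnGoeDGjsssKTF898Nu0a9ki9beb_W&index=4
- Code link: https://github.com/mousepixels/sanbomics_scripts/blob/main/Scanpy_intro_pp_clustering_markers.ipynb
- The code comes from the YouTuber sanbomics, whose Scanpy tutorials are the best I have watched so far. The videos are entirely in English, and a VPN is needed to access them.
- This workflow applies to single-cell data from a single sample only; it does not cover integration or batch-effect correction.
- sanbomics adapted this workflow, with minor changes, from the example code in the official Scanpy documentation.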
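# Set up paths
!pwd  # print the current working directory
!mkdir -p write  # directory for the results file; -p avoids an error if it already exists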
import scanpy as sc
import pandas as pd
import numpy as np
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor='white')
# Path of the h5ad file used to store the analysis results:
results_file = 'write/pbmc3k.h5ad'
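adata = sc.read_10x_mtx(
    '/Users/panqiu/Downloads/00/',  # directory containing the `.mtx` file; change this to your own path
    var_names='gene_symbols',       # use gene symbols as the variable names
    cache=True)                     # write a cache file for faster subsequent reads
# Make duplicate gene names unique (duplicates get a numeric suffix)
adata.var_names_make_unique()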
"""
Note on cache=True:
... writing an h5ad cache file to speedup reading next time
On the next run, the data is read directly from the h5ad file in the cache directory
instead of from the count matrix, which is faster.
"""
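The `var_names_make_unique()` call above matters because 10x gene symbols are not guaranteed to be unique. The following is a minimal pure-Python sketch of the idea, not the library's actual implementation: the helper `make_names_unique` and the gene list are hypothetical, and unlike the real routine it does not check whether a suffixed name collides with an existing one.

```python
from collections import Counter

def make_names_unique(names, join="-"):
    """Toy sketch: keep the first occurrence, suffix later duplicates with -1, -2, ..."""
    seen = Counter()
    result = []
    for name in names:
        if seen[name] == 0:
            result.append(name)  # first occurrence stays unchanged
        else:
            result.append(f"{name}{join}{seen[name]}")  # later ones get a counter suffix
        seen[name] += 1
    return result

# Hypothetical gene symbols with a repeated entry
genes = ["TBCE", "ACTB", "TBCE", "CD3E", "TBCE"]
unique_genes = make_names_unique(genes)
print(unique_genes)
```

After this step every gene name can serve as an unambiguous index, which is what later lookups by gene symbol rely on.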
Preprocessing
sc.pp.filter_cells(adata, min_genes=200)  # remove cells with fewer than 200 detected genes
sc.pp.filter_genes(adata, min_cells=3)  # remove genes detected in fewer than 3 cells
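To make the two thresholds above concrete, here is a small NumPy sketch of the same filtering logic on a toy count matrix. The matrix and the reduced thresholds are made up for illustration, and the code only mimics the idea (count non-zero entries, keep rows or columns at or above the threshold); it is not Scanpy's implementation.

```python
import numpy as np

# Hypothetical toy count matrix: 4 cells (rows) x 5 genes (columns)
counts = np.array([
    [3, 0, 1, 0, 2],
    [0, 0, 0, 0, 1],
    [5, 2, 0, 0, 4],
    [1, 1, 1, 0, 0],
])

min_genes = 3  # toy threshold (the workflow above uses 200)
min_cells = 2  # toy threshold (the workflow above uses 3)

# Step 1: keep cells that express at least `min_genes` genes
genes_per_cell = (counts > 0).sum(axis=1)
filtered = counts[genes_per_cell >= min_genes]

# Step 2: on the remaining cells, keep genes detected in at least `min_cells` cells
cells_per_gene = (filtered > 0).sum(axis=0)
filtered = filtered[:, cells_per_gene >= min_cells]

print(genes_per_cell, cells_per_gene, filtered.shape)
```

The order matters: gene counts are computed after the low-quality cells have been dropped, mirroring the order of the two calls in the workflow.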
# For mouse data, change 'MT-' to 'mt-'. Always double-check that the mitochondrial genes were actually labeled.
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # annotate mitochondrial genes as 'mt'; use 'MT-' for human samples and 'mt-' for mouse samples
adata.var.mt.value_counts()  # count the mt genes; alternatively, use sum(adata.var['mt'])
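The prefix check is case-sensitive, which is why the species matters. The sketch below uses only pandas and two short, hypothetical gene lists to show how a wrong prefix silently labels zero mitochondrial genes, and why the trailing hyphen keeps nuclear genes such as MTOR from matching.

```python
import pandas as pd

# Hypothetical gene symbols in human and mouse naming conventions
human_genes = pd.Index(["MT-CO1", "MT-ND1", "ACTB", "MTOR", "CD3E"])
mouse_genes = pd.Index(["mt-Co1", "mt-Nd1", "Actb", "Mtor", "Cd3e"])

human_mt = human_genes.str.startswith("MT-")      # correct prefix for human
mouse_mt = mouse_genes.str.startswith("mt-")      # correct prefix for mouse
wrong_prefix = mouse_genes.str.startswith("MT-")  # human prefix applied to mouse data

print(int(human_mt.sum()), int(mouse_mt.sum()), int(wrong_prefix.sum()))
```

If the count of flagged genes is zero, the prefix almost certainly does not match the naming convention of the dataset.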
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p