Time to write up pySCENIC. I have been using it for ages but never kept any notes.
First, an environment needs to be set up.
# Create the environment
conda create -y -n pyscenic python=3.7
conda activate pyscenic
Ever since the official site said Docker is faster, I have gone off local installs. An image is simply less hassle.
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
docker --help
At this point you should see something like the following:
Usage: docker [OPTIONS] COMMAND
A self-sufficient runtime for containers
Common Commands:
run Create and run a new container from an image
exec Execute a command in a running container
ps List containers
build Build an image from a Dockerfile
pull Download an image from a registry
push Upload an image to a registry
images List images
login Authenticate to a registry
logout Log out from a registry
search Search Docker Hub for images
version Show the Docker version information
info Display system-wide information
Start Docker
sudo service docker start
service docker status
The following should appear:
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; preset: enabled)
Active: active (running) since Sun 2024-11-24 21:52:34 CST; 35s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 35243 (dockerd)
Tasks: 19
Memory: 23.3M (peak: 25.9M)
CPU: 534ms
CGroup: /system.slice/docker.service
└─35243 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
11月 24 21:52:33 xuguangji dockerd[35243]: time="2024-11-24T21:52:33.565475976+08:00" level=info msg="Starting up"
11月 24 21:52:33 xuguangji dockerd[35243]: time="2024-11-24T21:52:33.566003490+08:00" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
11月 24 21:52:33 xuguangji dockerd[35243]: time="2024-11-24T21:52:33.643206738+08:00" level=info msg="Loading containers: start."
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.200135471+08:00" level=info msg="Loading containers: done."
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.235420933+08:00" level=warning msg="WARNING: bridge-nf-call-iptables is disabled"
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.235436690+08:00" level=warning msg="WARNING: bridge-nf-call-ip6tables is disabled"
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.235449612+08:00" level=info msg="Docker daemon" commit=41ca978 containerd-snapshotter=false storage-driver=overlay2 version=27.3.1
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.235515981+08:00" level=info msg="Daemon has completed initialization"
11月 24 21:52:34 xuguangji dockerd[35243]: time="2024-11-24T21:52:34.255898964+08:00" level=info msg="API listen on /run/docker.sock"
11月 24 21:52:34 xuguangji systemd[1]: Started docker.service - Docker Application Container Engine.
Docker is now installed.
Next, pull the pySCENIC images
sudo docker pull aertslab/pyscenic:0.12.1
sudo docker pull aertslab/pyscenic_scanpy:0.12.1_1.9.1
## I am working with mouse, so download the databases needed for it
wget -c https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
wget -c https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt
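These downloads are large and `wget -c` can get interrupted, so before moving on it may be worth confirming that all three files really are on disk. A minimal sketch using only the Python standard library; the helper name `missing_files` and the idea that everything sits in one directory are my own assumptions, not anything pySCENIC provides:

```python
from pathlib import Path

# File names taken from the three wget commands above.
REQUIRED = [
    "mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather",
    "motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl",
    "allTFs_mm.txt",
]


def missing_files(directory, names=REQUIRED):
    """Return the names that are absent or empty in `directory` (hypothetical helper)."""
    base = Path(directory)
    missing = []
    for name in names:
        path = base / name
        # A zero-byte file usually means the download never really started.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_files(".")` in the download directory should give an empty list once everything is in place. It only checks presence and non-zero size, not file integrity.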
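The input file nerve_unfiltered.loom can be generated either in Python or in R.
# What is actually needed is a matrix that has NOT been through sc.pp.normalize_total or sc.pp.log1p, i.e. raw counts
import scanpy as sc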
f_mtx_dir = '/ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/data/public/pbmc_10k_v3/filtered_feature_bc_matrix/'
adata = sc.read_10x_mtx(
f_mtx_dir , # the directory with the `.mtx` file
var_names='gene_symbols', # use gene symbols for the variable names (variables-axis index)
cache=False)
# The rest is fairly simple
import loompy as lp
import numpy as np
f_loom_path_unfilt = "nerve_unfiltered.loom"
row_attrs = {
"Gene": np.array(adata.var.index) ,
}
col_attrs = {
"CellID": np.array(adata.obs.index) ,
"nGene": np.array( np.sum(adata.X.transpose()>0 , axis=0)).flatten() ,
"nUMI": np.array( np.sum(adata.X.transpose() , axis=0)).flatten() ,
}
lp.create( f_loom_path_unfilt, adata.X.transpose(), row_attrs, col_attrs )
Of course there is an R version too
library(Seurat)
dim(uterus)
# [1] 27001 27914
table(uterus$orig.ident)
#   AEH   EEC    HC
#  9525 12033  6356
# Downsample to 2000 cells per sample. This is only a demo, so more cells add nothing, and it also keeps the file small.
Idents(uterus) <- 'orig.ident'
sce_test <- subset(x = uterus, downsample = 2000)
table(sce_test$orig.ident)
# AEH EEC HC
# 2000 2000 2000
table(sce_test$celltype)
# Ciliated epithelial cells Endothelial cells Lymphocytes
# 427 631 2844
# Macrophages Smooth muscle cells Stromal fibroblasts
# 110 474 593
# Unciliated epithelial cells
# 921
dim(sce_test)  # we use these 6000 cells for the demo
# [1] 27001 6000
# Extract the expression matrix for the downstream analysis (cells as rows after t())
Idents(sce_test) <- 'celltype'
write.csv(t(as.matrix(sce_test@assays$RNA@counts)),file = "sce_exp.csv")
saveRDS(sce_test, file = 'sce_test.rds')
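The R code above exports the `counts` slot, which is the un-normalised matrix the workflow wants. If you are ever unsure which slot ended up in a CSV, a quick check that the values are non-negative integers can catch an accidental export of normalised data. This is only a sketch with the standard library; `csv_is_raw_counts` and its `max_rows` parameter are names I made up, and it assumes the layout produced by `write.csv` (header row of gene names, first column holding the cell names):

```python
import csv


def csv_is_raw_counts(path, max_rows=200):
    """Check that the first `max_rows` cells of a cells-by-genes CSV contain
    only non-negative integers (hypothetical helper, not part of pySCENIC)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row of gene names
        for i, row in enumerate(reader):
            if i >= max_rows:
                break
            # row[0] is the cell name that write.csv stores as the row name
            for value in row[1:]:
                number = float(value)
                if number < 0 or not number.is_integer():
                    return False
    return True
```

Only the first `max_rows` cells are inspected, because walking a 6000 x 27001 matrix in pure Python is slow; raise the limit if you want a more thorough check.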
The next part still has to be done in Python
import loompy as lp
import numpy as np
import scanpy as sc
# Read the CSV file (cells as rows, genes as columns)
x = sc.read_csv("sce_exp.csv")
# Set the row and column attributes
row_attrs = {"Gene": np.array(x.var_names)}
col_attrs = {"CellID": np.array(x.obs_names)}
# Create the Loom file (loom expects genes x cells, hence the transpose)
lp.create("nerve_unfiltered.loom", x.X.transpose(), row_attrs, col_attrs)
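Before starting the long GRN run, it may be worth checking how many names in the TF list actually occur among the genes of the matrix. As far as I understand, the GRN step only uses TFs that are present in the expression data, so a naming mismatch (for example uppercase symbols against mouse-style symbols) would leave few or no regulators. A small sketch; `tf_overlap` is a hypothetical helper of mine:

```python
def tf_overlap(tf_path, gene_names):
    """Return the sorted TF names from `tf_path` that also occur in `gene_names`.

    `tf_path` is a plain text file with one TF symbol per line, as in allTFs_mm.txt.
    The comparison is exact and case-sensitive.
    """
    with open(tf_path) as handle:
        tfs = {line.strip() for line in handle if line.strip()}
    return sorted(tfs & set(gene_names))
```

With the objects from the snippet above, `len(tf_overlap("allTFs_mm.txt", x.var_names))` gives the number of usable TFs. A value near zero suggests the gene symbols and the TF list do not belong to the same species or naming scheme.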
Step one of the analysis (pyscenic grn). This takes a very long time.
sudo docker run -it --rm \
-v /home/sayhello/my_analysis/nerve/step3-pyscenic:/pyscenic aertslab/pyscenic:0.12.1 \
pyscenic grn \
--num_workers 15 \
--method grnboost2 \
--output /pyscenic/grn.csv \
/pyscenic/nerve_unfiltered.loom \
/pyscenic/allTFs_mm.txt
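If the same call has to be repeated for several datasets, the command can be assembled in Python instead of being retyped. The sketch below only builds the argument list with the flags used above; it does not run anything. `build_grn_command` and `as_shell` are hypothetical helpers, `-it` is left out on the assumption that the call is made from a script rather than a terminal, and `sudo` is omitted, so prepend it if your user is not in the docker group:

```python
import shlex


def build_grn_command(host_dir, loom="nerve_unfiltered.loom",
                      tf_list="allTFs_mm.txt", workers=15,
                      image="aertslab/pyscenic:0.12.1"):
    """Assemble the `docker run ... pyscenic grn` call as an argument list."""
    return [
        "docker", "run", "--rm",
        # Mount the host working directory at /pyscenic inside the container.
        "-v", f"{host_dir}:/pyscenic",
        image,
        "pyscenic", "grn",
        "--num_workers", str(workers),
        "--method", "grnboost2",
        "--output", "/pyscenic/grn.csv",
        f"/pyscenic/{loom}",
        f"/pyscenic/{tf_list}",
    ]


def as_shell(args):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(args)
```

The list can be handed to `subprocess.run(cmd, check=True)`, or printed with `as_shell(cmd)` to inspect it first. The input and TF list files must sit in `host_dir`, since only that directory is visible inside the container.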