pySCENIC installation and data downloads for single-cell transcription factor analysis

生信小博士

Last modified 2024-06-12 15:46:09

First published 2023-05-23 17:43:20

Article link: https://blog.csdn.net/qq_52813185/article/details/129238961

Installation and Usage (pySCENIC latest documentation): https://pyscenic.readthedocs.io/en/latest/installation.html

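R version: "rSCENIC: installing and running in R" (生信小木屋)

The R version of SCENIC currently cannot use the new-format databases; see the issue "feather v1 or v2 for R package" for details. The R version therefore has to use the old databases:

cisTarget databases - Feather v1 databases

Important note: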

If you are using pySCENIC < 0.12.0 and ctxcore < 0.2.0 you will need to retrieve the databases from the old folder, in feather v1 format.

However, we recommend to update to the latest versions to use feather v2 (smaller and easier to read) and use the most recent databases and annotations (v10nr_clust/mc_v10_clust).
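The version rule quoted above can be checked programmatically. The following is a minimal sketch, assuming the rule is applied literally (feather v1 only when both packages are older than the stated versions); the function names `parse_version`, `feather_format_for` and `installed_feather_format` are hypothetical helpers, not part of pySCENIC.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.12.1' into (0, 12, 1); missing or non-numeric parts count as 0."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def feather_format_for(pyscenic_version, ctxcore_version):
    """Return 'v1' if both packages predate feather v2 support, else 'v2'.

    Assumption: the quoted rule ("pySCENIC < 0.12.0 and ctxcore < 0.2.0")
    is taken literally; mixed installations are treated as 'v2'.
    """
    old_pyscenic = parse_version(pyscenic_version) < (0, 12, 0)
    old_ctxcore = parse_version(ctxcore_version) < (0, 2, 0)
    return "v1" if (old_pyscenic and old_ctxcore) else "v2"


def installed_feather_format():
    """Look up the installed versions; return None if a package is missing."""
    try:
        return feather_format_for(version("pyscenic"), version("ctxcore"))
    except PackageNotFoundError:
        return None
```

Calling `installed_feather_format()` inside the `pyscenic` conda environment indicates which database folder to download from before any large file is fetched.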

There are two main types of databases:

Gene-based databases are meant to be used with (py)SCENIC and for motif enrichment in gene sets with cisTarget.
Region-based databases are meant to be used with SCENIC+ and for motif enrichment in region sets with cisTarget.

Step 1: Installation


conda create -n pyscenic python=3.9
conda activate pyscenic
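# Install dependencies
conda install -y numpy
conda install -y -c anaconda cytoolz
conda install -y -c conda-forge scanpy


# Install pySCENIC (-i selects a PyPI mirror; omit -i and --trusted-host to use the default index)
pip install pyscenic -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com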

Step 2: TF annotation (auxiliary datasets)

https://resources.aertslab.org/cistarget/databases/

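Choose the databases that match your species:

1. Ranking databases in feather format

2. Motif-to-TF annotation database: a TSV text file (.tbl), which can be downloaded from the browser

3. Transcription factor list: a .txt file, which can be copied from the browser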

To successfully use this pipeline you also need auxiliary datasets available at the cistargetDBs website:

Databases ranking the whole genome (ranking databases): databases ranking the whole genome of your species of interest based on regulatory features (i.e. transcription factors), in feather format.
Motif to TF annotations (annotation database): a database providing the missing link between an enriched motif and the transcription factor that binds this motif. This pipeline needs a TSV text file where every line represents a particular annotation.

Caution

These ranking databases are 1.1 GB each, so downloading them might take a while. An annotations file is typically 100 MB in size.

A list of transcription factors is required for the network inference step (GENIE3/GRNBoost2).
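Before running network inference it can help to check how many of the listed TFs are actually present in the expression matrix. The sketch below assumes the TF list is a plain text file with one gene symbol per line (as in the allTFs_*.txt files); `load_tf_names` and `tfs_in_matrix` are hypothetical helper names used only for illustration.

```python
def load_tf_names(path):
    """Read one transcription factor symbol per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def tfs_in_matrix(tf_names, gene_names):
    """Keep only the TFs that also occur among the genes of the matrix."""
    genes = set(gene_names)
    return [tf for tf in tf_names if tf in genes]
```

For example, `tfs_in_matrix(load_tf_names("allTFs_hg38.txt"), gene_names)` returns the TFs that network inference can actually use; a very short result usually points to a species or gene-symbol mismatch.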

1. Download the transcription factor lists (TF lists)

mkdir all_tf_list && cd all_tf_list
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_dmel.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt
ls

2. Download the feather files (ranking databases)

Welcome to the cisTarget resources website!

To download a database for motif enrichment, go to databases.
To download motif annotations, go to motif2tf.
To download our cluster-buster implementation, go to programs.
To download precomputed regions for creating gene-based databases, go to regions.
To download the lists of transcription factors (TFs) for human, mouse and fly, go to tf_lists.
To download chip-seq tracks annotations, go to track2tf.

We recommend using the most recent databases and annotations (v10nr_clust).

IMPORTANT: The cisTarget database files are quite big (most of them 1-100GB). To avoid corrupt or incomplete downloads, files can be downloaded with zsync_curl (which is basically rsync over HTTP(S)). It allows resuming already partially downloaded databases and only will download missing or redownload corrupted chunks.

# Download (with wget or curl):
wget https://resources.aertslab.org/cistarget/zsync_curl
# curl -O https://resources.aertslab.org/cistarget/zsync_curl

# Make executable:
chmod a+x zsync_curl

# Store and display the full path to the downloaded zsync_curl:
ZSYNC_CURL="${PWD}/zsync_curl"
echo "${ZSYNC_CURL}"

# Alternative: if you compiled zsync_curl from source and it is on your PATH,
# use the bare command name instead:
# ZSYNC_CURL='zsync_curl'
# echo "${ZSYNC_CURL}"

Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based

Homo sapiens - hg38

Homo sapiens - hg38 - refseq_r80 - v9 databases - Gene based

hg38

# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'

# Download database directly (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

# Specify the second database (500 bp upstream and 100 bp downstream of the TSS):
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'

# Download database directly (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

However, the databases have since been updated: see Homo sapiens - hg38 - refseq_r80.

mc_v10_clust is the recommended collection.

mm10 Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based

mkdir mm10 && cd mm10

# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"

# Download database directly (with wget or curl):
wget "${feather_database_url}"

# curl -O "${feather_database_url}"

# Specify database (mc_v10_clust). Only the last uncommented assignment takes effect,
# so keep exactly one line uncommented and repeat the download once per database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'

# Extract the file name from the URL (used for the checksum check):
feather_database="${feather_database_url##*/}"

# Download database directly (with wget or curl):
wget "${feather_database_url}"
# curl -O "${feather_database_url}"





# Download sha256sum.txt (with wget or curl):
wget https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# curl -O https://resources.aertslab.org/cistarget/databases/sha256sum.txt

# Check if sha256 checksum matches for the downloaded database:
awk -v feather_database="${feather_database}" '$2 == feather_database' sha256sum.txt | sha256sum -c -

# If you downloaded multiple databases, you can check them all at once with:
sha256sum -c sha256sum.txt
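The same check can be done from Python, which is handy on systems without `sha256sum`. This is a minimal sketch, assuming sha256sum.txt uses the usual two-column layout ("digest  file name") that the awk command above relies on; `sha256_of_file`, `expected_checksums` and `verify_download` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so multi-gigabyte databases fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksums(checksum_file):
    """Parse lines of the form '<hex digest>  <file name>' into a dict."""
    expected = {}
    with open(checksum_file, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) == 2:
                digest, name = fields
                # sha256sum marks binary mode with a leading '*'.
                expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_download(database_path, checksum_file):
    """Return True if the downloaded file matches its listed checksum."""
    name = os.path.basename(database_path)
    expected = expected_checksums(checksum_file)
    if name not in expected:
        raise KeyError(f"{name} is not listed in {checksum_file}")
    return sha256_of_file(database_path) == expected[name]
```

A `False` result means the download is incomplete or corrupted and should be resumed (for example with `wget -c`) or repeated.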

############# Latest feather databases (v10)

hg38

Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

mm10

Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/

https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

3. Annotation database (.tbl)

Motif2TF annotations: https://resources.aertslab.org/cistarget/motif2tf/

Motif2TF annotations

We provide motif annotations for the following species:

Human (hgnc)
Mouse (mgi)
Fly (flybase)
Chicken

Which annotation file to use

For each species, we provide annotations depending on the motif collection used:

v8 (only Drosophila): Annotations based on the 2016 cisTarget motif collection. Use these files if you are using the mc8nr databases (only available for Drosophila).
v9: Annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
v10: Annotations based on the 2022 SCENIC+ motif collection. Use these files if you are using the mc_v10_clust databases.
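The pairing above (ranking database collection to annotation version) is easy to get wrong when several databases sit in one folder. A minimal sketch of a lookup follows; `annotation_version_for` is a hypothetical helper, and it assumes the collection name appears in the database URL or file name, as it does in the URLs used in this post.

```python
# Collection marker found in the database URL or file name -> annotation version.
# "v10_clust" matches both the "mc_v10_clust" folder and the v10 file names.
COLLECTION_TO_ANNOTATION = (
    ("mc8nr", "v8"),
    ("mc9nr", "v9"),
    ("v10_clust", "v10"),
)


def annotation_version_for(database_url_or_name):
    """Return the motif annotation version that matches a ranking database."""
    for marker, annotation_version in COLLECTION_TO_ANNOTATION:
        if marker in database_url_or_name:
            return annotation_version
    raise ValueError(f"Unknown motif collection in: {database_url_or_name}")
```

Running this on each downloaded feather file before starting the analysis confirms that the rankings and the .tbl annotation file come from the same motif collection.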

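The three databases

Although some transcription factors are missing when the analysis is run with pySCENIC, the overall transcription factor pattern does not change: the transcription factors specifically activated in the iCAF and mCAF subpopulations match those in the original paper. https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg

Extract the single-cell expression matrix as a CSV file and upload it to the Linux server

First, we ran dimensionality reduction, clustering and cell grouping on the single-cell transcriptome data from the paper "Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma", then extracted the fibo subpopulation and randomly sampled 1000 fibo cells. The resulting expression matrix is used for the downstream analysis.

Filter the matrix in Seurat, write it out as a CSV file, read it into Python, and then package it as a loom file.

# Note: the matrix must be transposed (cells as rows, genes as columns), otherwise an error is raised later
write.csv(t(as.matrix(fibo@assays$RNA@counts)),file = "fibo_1000.csv")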
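The Python side of this step (read the CSV, then package it as a loom file) could look like the sketch below. It assumes the CSV layout produced by the `write.csv` call above (header row of gene names, first column of cell IDs) and that `numpy` and `loompy` are installed; the helper names `read_cells_by_genes_csv` and `write_loom` are hypothetical. The `Gene` and `CellID` attribute names are the defaults the pySCENIC command line looks for.

```python
import csv


def read_cells_by_genes_csv(path):
    """Read a cells-by-genes CSV: header = gene names, first column = cell IDs."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        gene_names = header[1:]  # the first header cell is the empty row-name column
        cell_ids, rows = [], []
        for record in reader:
            if not record:
                continue
            cell_ids.append(record[0])
            rows.append([float(value) for value in record[1:]])
    return cell_ids, gene_names, rows


def write_loom(csv_path, loom_path):
    """Package the matrix as a loom file (requires numpy and loompy)."""
    import numpy as np
    import loompy

    cell_ids, gene_names, rows = read_cells_by_genes_csv(csv_path)
    # loom files store genes as rows and cells as columns, so transpose back.
    matrix = np.array(rows, dtype=float).T
    row_attrs = {"Gene": np.array(gene_names)}
    col_attrs = {"CellID": np.array(cell_ids)}
    loompy.create(loom_path, matrix, row_attrs, col_attrs)
```

With the file from this post, the call would be `write_loom("fibo_1000.csv", "fibo_1000.loom")`. The pure-Python reader is fine for about 1000 cells; for much larger matrices a pandas or scanpy reader would likely be faster and lighter on memory.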

Singularity/Apptainer

Singularity/Apptainer images can be built from the Docker Hub image as source:

# pySCENIC CLI version (run either the singularity or the apptainer command).
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
apptainer build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1

# pySCENIC CLI version + ipython kernel + scanpy.
singularity build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
apptainer build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1

To run the aertslab-pyscenic-0.12.1.sif Singularity container with the pyscenic grn command, you can use the following command:

singularity run \
    -B /data:/data \
    aertslab-pyscenic-0.12.1.sif \
    pyscenic grn \
        --num_workers 6 \
        -o /data/expr_mat.adjacencies.tsv \
        /data/expr_mat.tsv \
        /data/allTFs_hg38.txt

This command assumes that you have the aertslab-pyscenic-0.12.1.sif Singularity container file in your current working directory. If the container file is located elsewhere, you need to provide the full path to the container file in the singularity run command.

The -B option bind-mounts the local /data directory on your system at /data inside the container, so that files and directories under /data can be accessed from within the container.

To clarify, /data before the colon (:) represents the directory path on the host system. This is the directory that you want to make accessible within the container.

On the other hand, /data after the colon (:) represents the mount point within the container. This is the directory path where the host directory will be accessible from within the container.

System-defined bind paths

The system administrator has the ability to define what bind paths will be included automatically inside each container. Some bind paths are automatically derived (e.g. a user’s home directory) and some are statically defined (e.g. bind paths in the SingularityCE configuration file). In the default configuration, the system default bind points are $HOME , /sys:/sys , /proc:/proc, /tmp:/tmp, /var/tmp:/var/tmp, /etc/resolv.conf:/etc/resolv.conf, /etc/passwd:/etc/passwd, and $PWD. Where the first path before : is the path from the host and the second path is the path in the container.

The pyscenic grn command is executed inside the container with the specified options and arguments. The output file expr_mat.adjacencies.tsv will be written to the /data directory on your system. The /data/expr_mat.tsv and /data/allTFs_hg38.txt files are assumed to be input files located in the local /data directory.

Make sure to adjust the paths and filenames according to your specific setup before running the command.
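When the paths change from run to run, it can be convenient to assemble the command in Python instead of editing the shell line by hand. The sketch below is one way to do that; `singularity_grn_command` is a hypothetical helper. Note that `-B` is an option of `singularity run` itself, so the sketch places it before the image name, while everything after the image name is passed to the program inside the container.

```python
import shlex


def singularity_grn_command(image, host_dir, container_dir,
                            expr_matrix, tf_list, output, num_workers=6):
    """Build the argument list for running `pyscenic grn` in a container."""
    return [
        "singularity", "run",
        "-B", f"{host_dir}:{container_dir}",  # host path : path inside container
        image,
        "pyscenic", "grn",
        "--num_workers", str(num_workers),
        "-o", output,
        expr_matrix,
        tf_list,
    ]


command = singularity_grn_command(
    image="aertslab-pyscenic-0.12.1.sif",
    host_dir="/data",
    container_dir="/data",
    expr_matrix="/data/expr_mat.tsv",
    tf_list="/data/allTFs_hg38.txt",
    output="/data/expr_mat.adjacencies.tsv",
)
# Show the command as a single shell line for inspection.
print(shlex.join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)`; using a list rather than a string avoids shell quoting problems with unusual paths.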