pySCENIC installation, single-cell transcription factor analysis, and data download

Installation and Usage - pySCENIC latest documentation: https://pyscenic.readthedocs.io/en/latest/installation.html

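R version: rSCENIC - installing and running SCENIC in R (生信小木屋)

The R version of SCENIC currently cannot use the new-format databases; for details, see the issue "feather v1 or v2 for R package". To use the R version, use the old databases instead: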

cisTarget databases - Feather v1 databases

Important:

If you are using pySCENIC < 0.12.0 and ctxcore < 0.2.0 you will need to retrieve the databases from the old folder, in feather v1 format.

However, we recommend to update to the latest versions to use feather v2 (smaller and easier to read) and use the most recent databases and annotations (v10nr_clust/mc_v10_clust).

There are two main types of databases:

  • Gene-based databases are meant to be used with (py)SCENIC and for motif enrichment in gene sets with cisTarget.
  • Region-based databases are meant to be used with SCENIC+ and for motif enrichment in region sets with cisTarget.

Step 1: Installation


conda create -n pyscenic python=3.9
conda activate pyscenic
# Install dependencies
conda install -y numpy
conda install -y -c anaconda cytoolz
conda install -y scanpy


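# Install pySCENIC from a PyPI mirror (the mirror is plain HTTP, so pip needs --trusted-host)
pip install pyscenic -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com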

Step 2: TF annotation and auxiliary datasets

https://resources.aertslab.org/cistarget/databases/

Choose the databases according to your species:

1. Ranking databases in feather format

2. Motif-to-TF annotation database: a TSV text file (.tbl), which can be downloaded through the browser

3. List of transcription factors: a txt file, which can be copied from the browser

To successfully use this pipeline you also need auxiliary datasets available at cistargetDBs website:

  1. Ranking databases: databases ranking the whole genome of your species of interest based on regulatory features (i.e. transcription factors), in feather format.
  2. Motif to TF annotations: a database providing the missing link between an enriched motif and the transcription factor that binds this motif. This pipeline needs a TSV text file where every line represents a particular annotation.
  3. A list of transcription factors, required for the network inference step (GENIE3/GRNBoost2).

Caution

These ranking databases are 1.1 GB each, so downloading them might take a while. An annotations file is typically 100 MB in size.

1. Download the transcription factor lists (TF lists)

mkdir all_tf_list && cd all_tf_list
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_dmel.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt
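
As a rough illustration of how a downloaded TF list is typically consumed, the Python sketch below reads one of these files and reports which TFs also occur among the genes of an expression matrix. It assumes the file holds one gene symbol per line; the function names load_tf_names and tfs_in_matrix are made up for this example and are not part of pySCENIC.

```python
from pathlib import Path


def load_tf_names(path):
    """Read a TF list, assuming one gene symbol per line (blank lines are skipped)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def tfs_in_matrix(tf_names, gene_names):
    """Return the TFs that also occur among the matrix genes, in TF-list order."""
    genes = set(gene_names)
    return [tf for tf in tf_names if tf in genes]
```

For example, load_tf_names("all_tf_list/allTFs_hg38.txt") would return the human TF symbols as a list, and tfs_in_matrix can then show how many of them are actually present in your data before network inference is started.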
 

2. Download the feather files (ranking databases)

Welcome to the cisTarget resources website!

  • To download a database for motif enrichment, go to databases.
  • To download motif annotations, go to motif2tf.
  • To download our cluster-buster implementation, go to programs.
  • To download precomputed regions for creating gene-based databases, go to regions.
  • To download the lists of transcription factors (TFs) for human, mouse and fly, go to tf_lists.
  • To download chip-seq tracks annotations, go to track2tf.

We recommend using the most recent databases and annotations (v10nr_clust).

IMPORTANT: The cisTarget database files are quite big (most of them 1-100GB). To avoid corrupt or incomplete downloads, files can be downloaded with zsync_curl (which is basically rsync over HTTP(S)). It allows resuming already partially downloaded databases and only will download missing or redownload corrupted chunks.

# Download (with wget or curl):
wget https://resources.aertslab.org/cistarget/zsync_curl
# curl -O https://resources.aertslab.org/cistarget/zsync_curl

# Make executable:
chmod a+x zsync_curl

# Display full path to zsync_curl.
ZSYNC_CURL="${PWD}/zsync_curl"
echo "${ZSYNC_CURL}"
	
# Alternative: if you compiled zsync_curl from source and it is on your PATH,
# refer to it by name instead of by full path:
# ZSYNC_CURL='zsync_curl'
# echo "${ZSYNC_CURL}"

hg38: Homo sapiens - hg38 - refseq_r80 - v9 databases - Gene based

# Specify database (10 kb up- and downstream of the TSS):
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# Download database directly (with wget or curl); -c resumes a partial download:
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

# Specify database (500 bp upstream and 100 bp downstream of the TSS):
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# Download database directly (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

However, the databases have since been updated! See Homo sapiens - hg38 - refseq_r80.

Using mc_v10_clust is recommended.

mm10: Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based

mkdir mm10 && cd mm10

# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"

# Download database directly (with wget or curl):
wget "${feather_database_url}"

# curl -O "${feather_database_url}"

# Specify database (v10, mc_v10_clust). Keep one line active and rerun the block for each database you need:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"

# Download database directly (with wget or curl):
wget "${feather_database_url}"
# curl -O "${feather_database_url}"




# Download sha256sum.txt (with wget or curl):
wget https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# curl -O https://resources.aertslab.org/cistarget/databases/sha256sum.txt

# Check if sha256 checksum matches for the downloaded database:
awk -v feather_database="${feather_database}" '$2 == feather_database' sha256sum.txt | sha256sum -c -

# If you downloaded multiple databases, you can check them all at once with:
sha256sum -c sha256sum.txt
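
If sha256sum is not available on your machine, the same check can be sketched in Python with the standard hashlib module. The sketch below assumes sha256sum.txt uses the usual "<hash>  <file name>" layout with bare file names, which is what the awk command above relies on; the function names are illustrative only.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-gigabyte database is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksum(checksum_file, file_name):
    """Look up file_name in a sha256sum-style listing; return None if it is absent."""
    with open(checksum_file) as handle:
        for line in handle:
            parts = line.split()
            # sha256sum marks binary-mode entries with a leading '*' on the name.
            if len(parts) == 2 and parts[1].lstrip("*") == file_name:
                return parts[0]
    return None


def database_is_intact(database_path, checksum_file):
    """True if the file's hash matches its listed entry; False on mismatch or no entry."""
    file_name = os.path.basename(database_path)  # same idea as ${feather_database_url##*/}
    expected = expected_checksum(checksum_file, file_name)
    return expected is not None and sha256_of_file(database_path) == expected
```

Calling database_is_intact with the path of a downloaded feather file and the path of sha256sum.txt returns False both for a corrupted download and for a file that is not listed, so a False result is a signal to re-download or to check the file name.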

Latest feather databases (v10)

hg38

Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

mm10

Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/

https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

3. Annotation database (.tbl)

Motif2TF annotations: https://resources.aertslab.org/cistarget/motif2tf/

We provide motif annotations for the following species:

  • Human (hgnc)
  • Mouse (mgi)
  • Fly (flybase)
  • Chicken

Which annotation file to use

For each species, we provide annotations depending on the motif collection used:

  • v8 (only Drosophila): Annotations based on the 2016 cisTarget motif collection. Use these files if you are using the mc8nr databases (only available for Drosophila).
  • v9: Annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
  • v10: Annotations based on the 2022 SCENIC+ motif collection. Use these files if you are using the mc_v10_clust databases.
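
The pairing rule in the list above can be written down as a small lookup, which helps avoid combining a ranking database with annotations from a different motif collection. This is only a sketch: the helper name annotation_version_for is invented here, and it simply looks for the collection tag (mc8nr, mc9nr, mc_v10_clust) in the database URL or path.

```python
# Motif collection tag found in the database path -> matching annotation version.
COLLECTION_TO_ANNOTATION = {
    "mc8nr": "v8",          # 2016 cisTarget motif collection (Drosophila only)
    "mc9nr": "v9",          # 2017 cisTarget motif collection
    "mc_v10_clust": "v10",  # 2022 SCENIC+ motif collection
}


def annotation_version_for(database_url):
    """Return the annotation version that matches a ranking database URL or path."""
    for tag, version in COLLECTION_TO_ANNOTATION.items():
        if tag in database_url:
            return version
    raise ValueError(f"no known motif collection tag in: {database_url}")
```

Passing one of the mc9nr URLs used earlier returns "v9", and a mc_v10_clust URL returns "v10"; anything else raises an error rather than guessing.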

The three databases

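Transcription factor analysis with pySCENIC can miss some transcription factors, but the overall transcription factor patterns do not change: the transcription factors specifically activated in the iCAF and mCAF subpopulations remain as reported in the original paper. https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg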

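Extract the single-cell expression matrix as a CSV file and transfer it to the Linux server

First, we ran dimensionality reduction, clustering, and cell-type annotation on the single-cell transcriptome data from the paper "Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma". We then extracted the fibo (fibroblast) subpopulation and randomly sampled 1000 fibo cells; the resulting expression matrix is used for the downstream analysis.

In Seurat, subset the matrix and write it out as a CSV file, then read it into Python and package it as a loom file.

# Note: the matrix must be transposed (cells as rows, genes as columns), otherwise an error is raised later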
write.csv(t(as.matrix(fibo@assays$RNA@counts)),file = "fibo_1000.csv")
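
The Python side of this step (reading the CSV and packaging it as a loom file) could look roughly like the sketch below. It is not the author's script: the function names are invented, the attribute names "Gene" and "CellID" are assumed here because they are the names pySCENIC looks for by default, and write_loom needs the third-party packages numpy and loompy, which are imported only when it is called.

```python
import csv


def read_cells_by_genes_csv(path):
    """Read the CSV written by write.csv(t(...)): one row per cell, one column per gene."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        genes = header[1:]  # the first header cell is the empty row-name column
        cell_ids, rows = [], []
        for record in reader:
            if not record:
                continue
            cell_ids.append(record[0])
            rows.append([float(value) for value in record[1:]])
    return cell_ids, genes, rows


def write_loom(path, cell_ids, genes, rows):
    """Write a loom file; loom stores genes as rows and cells as columns."""
    import numpy as np
    import loompy as lp

    matrix = np.asarray(rows, dtype=float).T  # cells x genes -> genes x cells
    row_attrs = {"Gene": np.array(genes)}
    col_attrs = {"CellID": np.array(cell_ids)}
    lp.create(path, matrix, row_attrs, col_attrs)
```

A typical call would be cell_ids, genes, rows = read_cells_by_genes_csv("fibo_1000.csv") followed by write_loom("fibo_1000.loom", cell_ids, genes, rows). For 1000 cells the plain-Python reader is fast enough; for much larger matrices a dedicated reader is likely a better choice.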

Singularity/Apptainer

Singularity/Apptainer images can be built from the Docker Hub image as source (use either the singularity or the apptainer command, depending on which one is installed):

# pySCENIC CLI version.
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
apptainer build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1

# pySCENIC CLI version + ipython kernel + scanpy.
singularity build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
apptainer build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1

To run the pyscenic grn command in the aertslab-pyscenic-0.12.1.sif Singularity container, you can use the following command:

singularity run -B /data:/data aertslab-pyscenic-0.12.1.sif \
    pyscenic grn \
    --num_workers 6 \
    -o /data/expr_mat.adjacencies.tsv \
    /data/expr_mat.tsv \
    /data/allTFs_hg38.txt

This command assumes that you have the aertslab-pyscenic-0.12.1.sif Singularity container file in your current working directory. If the container file is located elsewhere, you need to provide the full path to the container file in the singularity run command.

The -B option bind-mounts the /data directory on your system to /data inside the container, so files and directories under /data are accessible from within the container.

To clarify, /data before the colon (:) represents the directory path on the host system. This is the directory that you want to make accessible within the container.

On the other hand, /data after the colon (:) represents the mount point within the container. This is the directory path where the host directory will be accessible from within the container.

System-defined bind paths

The system administrator has the ability to define what bind paths will be included automatically inside each container. Some bind paths are automatically derived (e.g. a user's home directory) and some are statically defined (e.g. bind paths in the SingularityCE configuration file). In the default configuration, the system default bind points are $HOME, /sys:/sys, /proc:/proc, /tmp:/tmp, /var/tmp:/var/tmp, /etc/resolv.conf:/etc/resolv.conf, /etc/passwd:/etc/passwd, and $PWD, where the first path before : is the path from the host and the second path is the path in the container.

The pyscenic grn command is executed inside the container with the specified options and arguments. The output file expr_mat.adjacencies.tsv will be written to the /data directory on your system. The /data/expr_mat.tsv and /data/allTFs_hg38.txt files are assumed to be input files located in the local /data directory.

Make sure to adjust the paths and filenames according to your specific setup before running the command.
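
When the same command has to be run for several datasets, it can help to assemble it programmatically. The sketch below builds the argument list in Python, placing the -B bind option among the singularity run options, before the image name, and the pyscenic grn arguments after it. The function name build_grn_command and its parameters are hypothetical; only the command-line pieces come from the command discussed above.

```python
import shlex


def build_grn_command(image, host_dir, container_dir, expr_matrix, tf_list,
                      output, num_workers=6, runtime="singularity"):
    """Assemble the container command as an argument list (nothing is executed)."""
    return [
        runtime, "run",
        "-B", f"{host_dir}:{container_dir}",  # host path first, container path second
        image,
        "pyscenic", "grn",
        "--num_workers", str(num_workers),
        "-o", output,
        expr_matrix,
        tf_list,
    ]


command = build_grn_command(
    image="aertslab-pyscenic-0.12.1.sif",
    host_dir="/data",
    container_dir="/data",
    expr_matrix="/data/expr_mat.tsv",
    tf_list="/data/allTFs_hg38.txt",
    output="/data/expr_mat.adjacencies.tsv",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(command, check=True) to launch the container directly. Keeping the command as a list avoids quoting problems with paths that contain spaces.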
