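Predicting co-occurring transcription factors in cell-type-specific accessible chromatin regions
1. Derive cell-type-specific DNase hypersensitive sites (CTS-DHS) and ubiquitous DNase hypersensitive sites (ubiq-DHS) from DNase-seq experiments. The names (including paths) of all input files (read counts in 200 bp windows, no duplicates) and the corresponding cell types should be listed in data/all_files.csv. The output is written to output_directory/top_regions/ as BED files, one file per cell type.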
Usage:
Rscript scripts/calculate.cts-dhs.R -c "count.directory" -t "data/all_files.csv" -w "data/ranges_hg19_200bp_masked_sorted.bed" -o "output.dir" -tpr 10000 -m 1
CTS-DHSs and ubiq-DHSs for 90 ENCODE cell types (genome hg19) are stored in data/top_regions/.
2. Get the FASTA files for all top_regions (each cell type in a separate folder) using:
scripts/Get_fasta_tissues.sh
bedtools must be installed and the corresponding genome must be downloaded before running Get_fasta_tissues.sh.
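The contents of Get_fasta_tissues.sh are not reproduced here. As a rough illustration of what this step involves, the sketch below loops over the BED files of a directory and extracts the sequences with `bedtools getfasta`. The function name, its three arguments, and the output layout (one folder per cell type) are assumptions made for this example, not the script's actual interface.

```shell
# Sketch only: the real scripts/Get_fasta_tissues.sh may be organised differently.
# Hypothetical arguments: genome FASTA, directory of BED files, output directory.
get_fasta_for_regions() {
    local genome_fa bed_dir out_dir bed cell_type
    genome_fa="$1"
    bed_dir="$2"
    out_dir="$3"
    for bed in "$bed_dir"/*.bed; do
        [ -e "$bed" ] || continue              # skip if the glob matched nothing
        cell_type="$(basename "$bed" .bed)"    # file name without the .bed suffix
        mkdir -p "$out_dir/$cell_type"         # one folder per cell type
        # bedtools getfasta writes the sequence of every BED interval to -fo
        bedtools getfasta -fi "$genome_fa" -bed "$bed" \
            -fo "$out_dir/$cell_type/$cell_type.fa" || return 1
    done
}

# Example call (all paths are placeholders):
# get_fasta_for_regions hg19.fa data/top_regions data/top_regions/fasta
```

The genome FASTA passed to `-fi` has to use the same chromosome names as the BED files (for hg19, typically `chr1`, `chr2`, and so on).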
3. Calculate binding affinities for PWMs of interest with TRAP using calculate.affinity.R:
Rscript scripts/calculate.affinity.R -m file.with.matrices -f format.matrices -s "data/top_regions/fasta" -t "data/cell_types.dat" -o "output.folder"
Precomputed affinities for the TRANSFAC matrices are stored in data/affinity/ (one folder per cell type, one file per PWM).
4. Calculate the TF enrichment for all PWMs in a cell-type-specific way and plot heatmaps of p-values and of odds ratios for all matrices and all cell types:
Rscript scripts/calculate.tf.enrichment.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/enrichment" -p TRUE -d "results/plots"
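For readers unfamiliar with the odds ratios shown in the heatmaps, the sketch below computes one from a 2x2 table of region counts. It is not taken from calculate.tf.enrichment.R: the function name, the way the table is laid out, and the 0.5 continuity correction for empty cells are assumptions made for illustration, and the counts in the example are invented.

```shell
# odds_ratio A B C D  (hypothetical helper, not part of the pipeline)
#   A: foreground regions with a TF hit     B: foreground regions without a hit
#   C: background regions with a TF hit     D: background regions without a hit
odds_ratio() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
        if (a == 0 || b == 0 || c == 0 || d == 0) {
            a += 0.5; b += 0.5; c += 0.5; d += 0.5
        }
        # odds of a hit in the foreground divided by odds of a hit in the background
        printf "%.3f\n", (a * d) / (b * c)
    }'
}

# Invented counts: 30 of 100 foreground and 100 of 1000 background regions have a hit
odds_ratio 30 70 100 900   # prints 3.857
```

An odds ratio above 1 means hits are more frequent in the foreground set than in the background set; the accompanying p-value indicates whether that difference is larger than expected by chance.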
5. Calculate the TF co-occurrence in a cell-type-specific way for all possible pairs of TFs:
Rscript scripts/calculate.tf.pairs.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/tf.pairs"