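ChromHMM is a powerful tool for annotating chromatin states, but it ships with annotation files for only the few common species shown in the figure below. To annotate any other species, you need to supply annotation files for that species yourself.
Take the crab-eating macaque (assembly macFas5) as an example.
First, download the RefSeq BED file from UCSC (saved here as macFas5.ucsc.RefSeq.bed, the name used in the commands below).
The downloaded file has the format shown below; most of the annotation files in the following steps are generated by processing this file.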
CHROMSIZES
Download from UCSC: https://hgdownload.soe.ucsc.edu/goldenPath/macFas5/bigZips/macFas5.chrom.sizes
ANCHORFILES
1. RefSeqTSS.macFas5.txt.gz
sed 's/"/\t/g' macFas5.ucsc.RefSeq.bed | awk 'BEGIN{OFS=FS="\t"}{if($6=="+") {start=$2} else {if($6=="-") start=$3} ; print $1,start,$6;}' >RefSeqTSS.macFas5.txt
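To make the output of this step concrete, here is a minimal sketch that applies the same strand logic to a two-record toy BED file. The function name `bed_to_tss` and the toy records are made up for illustration; the real input is the RefSeq BED file downloaded from UCSC. Unlike the one-liner above, this sketch skips records whose strand is neither + nor -.

```shell
# Hypothetical helper: reads BED on stdin, writes chrom, TSS position, strand.
bed_to_tss() {
  awk 'BEGIN{OFS=FS="\t"}
       $6=="+" {print $1, $2, $6}   # plus strand: TSS is the BED start
       $6=="-" {print $1, $3, $6}   # minus strand: TSS is the BED end
      '
}

# Two made-up transcripts, one per strand
# (columns: chrom, start, end, name, score, strand).
printf 'chr1\t1000\t5000\tNM_A\t0\t+\nchr1\t8000\t9000\tNM_B\t0\t-\n' | bed_to_tss
# chr1    1000    +
# chr1    9000    -
```

The plus-strand transcript contributes its start coordinate and the minus-strand transcript its end coordinate, which is the three-column layout (chromosome, position, strand) written to RefSeqTSS.macFas5.txt.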
2. RefSeqTES.macFas5.txt.gz
sed 's/"/\t/g' macFas5.ucsc.RefSeq.bed | awk 'BEGIN{OFS=FS="\t"}{if($6=="+") {end=$3} else {if($6=="-") end=$2} ; print $1,end,$6;}' >RefSeqTES.macFas5.txt
COORDS
1. CpGIsland.macFas5.bed.gz
Download from UCSC
awk -v FS="\t" -v OFS="\t" '{print $1,$2,$3}' macFas5.CpGIsland.bed >CpGIsland.macFas5.bed
2. RefSeqGene.macFas5.bed.gz
awk -v FS="\t" -v OFS="\t" '{print $1,$2,$3}' macFas5.ucsc.RefSeq.bed > RefSeqGene.macFas5.bed
3. RefSeqTSS2kb.macFas5.bed.gz
sed 's/"/\t/g' macFas5.ucsc.RefSeq.bed | awk 'BEGIN{OFS=FS="\t"}{if($6=="+") {start=$2-2000; end=$2+2001;} else {if($6=="-") {start=$3-2000; end=$3+2001;} } if(start<0) start=0; print $1,start,end;}' >RefSeqTSS2kb.macFas5.bed
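The command above clamps negative window starts to 0, but a window around a TSS near the end of a chromosome or scaffold can still run past its length. A minimal sketch of one way to clamp the right edge as well, assuming the chrom.sizes file from the CHROMSIZES step is available; the function name `tss_window` and the toy inputs are hypothetical, and whether ChromHMM actually requires this clamping is not something this note has verified.

```shell
# Hypothetical helper: tss_window CHROM_SIZES < refseq.bed
# Emits TSS -2000/+2001 windows clamped to [0, chromosome length].
tss_window() {
  awk 'BEGIN{OFS=FS="\t"}
       NR==FNR {size[$1]=$2+0; next}   # first input: chromosome lengths
       $6=="+" {tss=$2}                # plus strand: TSS is the BED start
       $6=="-" {tss=$3}                # minus strand: TSS is the BED end
       $6=="+" || $6=="-" {
         start=tss-2000; end=tss+2001
         if (start<0) start=0
         if (($1 in size) && end>size[$1]) end=size[$1]
         print $1, start, end
       }' "$1" -
}

# Toy data: a 3000 bp chromosome with one transcript near each end.
printf 'chr1\t3000\n' > toy.sizes
printf 'chr1\t500\t2500\tNM_A\t0\t+\nchr1\t1000\t2900\tNM_B\t0\t-\n' | tss_window toy.sizes
# chr1    0       2501
# chr1    900     3000
```

With the real data the call would be something like `tss_window macFas5.chrom.sizes < macFas5.ucsc.RefSeq.bed > RefSeqTSS2kb.macFas5.bed`.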
4. RefSeqExon.macFas5.bed.gz
Download from UCSC
awk -v FS="\t" -v OFS="\t" '{print $1,$2,$3}' macFas5.RefSeqExon.bed > RefSeqExon.macFas5.bed
5. RefSeqTES.macFas5.bed.gz
sed 's/"/\t/g' macFas5.ucsc.RefSeq.bed | awk 'BEGIN{OFS=FS="\t"}{if($6=="+") {start=$3; end=$3+1;} else {if($6=="-") {start=$2; end=$2+1;} } if(start<0) start=0; print $1,start,end;}' >RefSeqTES.macFas5.bed
6. RefSeqTSS.macFas5.bed.gz
sed 's/"/\t/g' macFas5.ucsc.RefSeq.bed | awk 'BEGIN{OFS=FS="\t"}{if($6=="+") {start=$2; end=$2+1;} else {if($6=="-") {start=$3; end=$3+1;} } if(start<0) start=0; print $1,start,end;}' >RefSeqTSS.macFas5.bed
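The commands above write uncompressed .txt and .bed files, while the target file names end in .gz. A minimal sketch of the remaining step: compress each file and copy it into the ChromHMM directory. The function name `install_annotations` is hypothetical, and the layout it uses (CHROMSIZES/<assembly>.txt, ANCHORFILES/<assembly>/, COORDS/<assembly>/) is an assumption modelled on how the bundled assemblies appear to be organised, so compare it with your own ChromHMM installation before relying on it.

```shell
# Hypothetical helper: install_annotations ASSEMBLY CHROMHMM_DIR
# Assumes the files produced in the steps above are in the current directory.
install_annotations() {
  asm=$1
  dest=$2
  mkdir -p "$dest/CHROMSIZES" "$dest/ANCHORFILES/$asm" "$dest/COORDS/$asm"

  # Chromosome sizes stay uncompressed (assumed name: <assembly>.txt).
  cp "$asm.chrom.sizes" "$dest/CHROMSIZES/$asm.txt"

  # Anchor files: chrom, position, strand.
  for f in RefSeqTSS RefSeqTES; do
    gzip -c "$f.$asm.txt" > "$dest/ANCHORFILES/$asm/$f.$asm.txt.gz"
  done

  # Coordinate files: three-column BED.
  for f in CpGIsland RefSeqGene RefSeqTSS2kb RefSeqExon RefSeqTES RefSeqTSS; do
    gzip -c "$f.$asm.bed" > "$dest/COORDS/$asm/$f.$asm.bed.gz"
  done
}
```

For the example in this post the call would be `install_annotations macFas5 /path/to/ChromHMM`, where the path is a placeholder for your own installation directory.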