merge all exonic regions for each gene
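This problem is similar to a question on Biostars: Question: How To Merge Isoforms For A Gene
In short, for each gene, the exon coordinates that overlap across all of its transcripts are merged, i.e. merge all exonic regions for each gene.
Below is my solution for now. It takes a long time to run, because the loop rescans the whole BED file once per gene; I will look for a better approach when I have time.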
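# Keep only the exon records of the annotation
awk -F'\t' '$3=="exon"' xxxx.gtf > "${name}.gtf"
# Convert to BED with gtf2bed (BEDOPS); column 4 is the gene ID used for grouping
gtf2bed < "${name}.gtf" > "${name}.bed"
# Merge overlapping exons one gene at a time, keeping strands separate (-s)
for id in $(cut -f4 "${name}.bed" | sort -u)
do
awk -F'\t' '$4==geneid' geneid="$id" "${name}.bed" |
bedtools merge -i - -c 4,4,6 -o distinct,count,distinct -s
done > "${name}_geneSeqInfo.bed"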
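
As a possible way to avoid the per-gene loop, the same merge can be sketched as a single pass with sort and awk. This is only a sketch under assumptions, not a tested replacement for the commands above: merge_exons_per_gene is a hypothetical helper name, the input is assumed to be the 6-column tab-separated BED produced above (gene ID in column 4, strand in column 6), and exons that overlap or touch are merged, which as far as I know matches the default behaviour of bedtools merge. The rows come out ordered by gene rather than by chromosome.

```shell
# merge_exons_per_gene is a hypothetical helper (not part of bedtools or BEDOPS).
# Input : 6-column BED of exons (chrom, start, end, gene_id, score, strand).
# Output: chrom, start, end, gene_id, number of merged exons, strand.
merge_exons_per_gene() {
    tab=$(printf '\t')
    # Bring exons of the same gene, chromosome and strand together, ordered by start.
    LC_ALL=C sort -t "$tab" -k4,4 -k1,1 -k6,6 -k2,2n "$1" |
    awk -F'\t' -v OFS='\t' '
        # Print the block collected so far, if there is one.
        function emit() { if (n > 0) print chrom, start, stop, gene, n, strand }
        {
            key = $4 SUBSEP $1 SUBSEP $6
            # Same gene/chromosome/strand and overlapping or touching: extend the block.
            if (key == prev && $2 + 0 <= stop) {
                if ($3 + 0 > stop) stop = $3 + 0
                n++
                next
            }
            # Otherwise close the previous block and start a new one.
            emit()
            prev = key; chrom = $1; start = $2 + 0; stop = $3 + 0
            gene = $4; strand = $6; n = 1
        }
        END { emit() }
    '
}
```

It could be called as merge_exons_per_gene "${name}.bed" > "${name}_geneSeqInfo.bed". Because the file is sorted once and read once, the cost no longer grows with the number of genes times the file size, but the output should be compared against the bedtools result on a small sample before relying on it.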