BAM file processing: converting to FASTQ

The original BAM file and the sorted BAM file have the same number of lines (alignment records).
SEQanswers:BAM is compressed. Sorting helps to give a better compression ratio because similar sequences are grouped together.
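Both points can be illustrated with a small simulation. The sketch below is a toy under stated assumptions: it samples 100 bp reads from a random genome (all sizes are arbitrary choices), uses plain zlib DEFLATE as a stand-in for BAM's BGZF compression, and treats sorting by position as a coordinate sort. Sorting leaves the record count unchanged, while placing overlapping reads next to each other gives DEFLATE nearby repeats to reference.

```python
import random
import zlib


def simulate_reads(genome_len=1_000_000, n_reads=20_000, read_len=100, seed=0):
    """Sample fixed-length reads from a random genome; return (pos, seq) in random order."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    reads = []
    for _ in range(n_reads):
        pos = rng.randrange(genome_len - read_len + 1)
        reads.append((pos, genome[pos:pos + read_len]))
    return reads


def compressed_size(reads):
    """Bytes after DEFLATE-compressing the newline-joined sequences (zlib stands in for BGZF)."""
    data = "\n".join(seq for _, seq in reads).encode("ascii")
    return len(zlib.compress(data, 6))


unsorted_reads = simulate_reads()
sorted_reads = sorted(unsorted_reads)  # order by position, like a coordinate sort

print("records: ", len(unsorted_reads), len(sorted_reads))
print("unsorted:", compressed_size(unsorted_reads), "bytes")
print("sorted:  ", compressed_size(sorted_reads), "bytes")
```

In the unsorted order, a read's overlapping neighbours are usually far outside DEFLATE's 32 KB back-reference window, so little redundancy is found; after sorting, consecutive reads share most of their bases.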

Converting the BAM back to FASTQ reported errors (GitHub search results for the same problem: 1, 2, 3):

*****WARNING: Query 17 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query 13 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query 223 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.

After sort -n, the number of warning lines dropped from 1,333,109,095 to 11,212,985.
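The warning text suggests the converter pairs a record only with the record immediately next to it. The toy model below is a hypothetical re-creation of that behaviour, not bedtools' actual source. With a coordinate-sorted order, mates are separated and every record is skipped; a name sort (crudely imitated here with Python's `sorted`, which is not the exact ordering `samtools sort -n` uses) makes mates adjacent. The made-up read `99` has no mate in the input, so it still triggers a warning after sorting; this is only one hypothetical way warnings can survive a name sort.

```python
def pair_adjacent(records):
    """Toy adjacency-based pairing (hypothetical; NOT bedtools' actual source).

    records: list of (qname, mate) tuples, every record flagged as paired.
    A record is paired only with the record immediately after it; otherwise
    it is reported and skipped, mirroring the warning text in the log.
    """
    pairs, warnings = [], []
    i = 0
    while i < len(records):
        name = records[i][0]
        if i + 1 < len(records) and records[i + 1][0] == name:
            pairs.append((records[i], records[i + 1]))
            i += 2
        else:
            warnings.append(f"Query {name} is marked as paired, but its mate "
                            "does not occur next to it. Skipping.")
            i += 1
    return pairs, warnings


# Hypothetical coordinate-sorted input: mates are separated by other reads.
# ("99", 1) is an orphan whose mate is absent from the file.
by_coord = [("17", 1), ("13", 1), ("17", 2), ("223", 1),
            ("13", 2), ("99", 1), ("223", 2)]
by_name = sorted(by_coord)  # crude stand-in for `samtools sort -n`

for label, recs in (("coordinate", by_coord), ("name", by_name)):
    pairs, warnings = pair_adjacent(recs)
    print(f"{label}-sorted: {len(pairs)} pairs, {len(warnings)} warnings")
```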
???:

nohup samtools sort -n S1_T_SRR1273943.bam -o ./S1_T_SRR1273943.sortedByName.bam >log.S1.bam.sortbyname 2>&1 &

nohup bedtools bamtofastq  -i S1_T_SRR1273943.sortedByName.bam -fq S1_T_1.fq -fq2 S1_T_2.fq > log_S1_sortedbyname 2>&1 &

Downstream analysis of the FASTQ files obtained from the name-sorted BAM still reported an error:

reads: 0 |ERROR: The mate1 read name did not match the mate2 read name. Resynchronization support needs to be implemented.
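Before rerunning the downstream step, it can help to find where the two FASTQ files go out of sync. The checker below is a hypothetical helper, not part of any tool mentioned here. It assumes plain 4-line FASTQ records and that mate names differ at most by a trailing `/1` or `/2` (samtools fastq appends these by default unless `-n` is given) or by a comment after whitespace such as Illumina's `1:N:0`.

```python
import io
from itertools import zip_longest


def fastq_names(handle):
    """Yield read names (header without '@') from a 4-lines-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        for _ in range(3):  # skip sequence, '+' line and quality
            handle.readline()
        yield header.rstrip("\n")[1:]


def base_name(name):
    """Drop any comment after whitespace and a trailing /1 or /2 mate suffix."""
    name = name.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def first_mismatch(fq1, fq2):
    """Return (index, name1, name2) for the first out-of-sync record, else None."""
    for i, (n1, n2) in enumerate(zip_longest(fastq_names(fq1), fastq_names(fq2))):
        # None means one file ran out of records before the other
        if n1 is None or n2 is None or base_name(n1) != base_name(n2):
            return i, n1, n2
    return None


r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@c/2\nTTTT\n+\nIIII\n")
print(first_mismatch(r1, r2))  # (1, 'b/1', 'c/2')
```

For real files, open them with `open(path)` (or `gzip.open(path, "rt")` for compressed input) instead of `io.StringIO`.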

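Temporary workaround: sort the BAM by read name and then extract the reads with samtools fastq. Open question: why does running bedtools bamtofastq on the name-sorted BAM produce FASTQ files that fail downstream, while samtools fastq has not (so far)?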

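samtools sort -n bam | samtools fastq -1 read_1.fq -2 read_2.fq -s singleton.fq -

(Without -o, samtools sort writes the sorted BAM to standard output, which the trailing "-" tells samtools fastq to read; with -o sorted.bam, nothing would reach the pipe.)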
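To make the steps of this pipeline concrete, here is a toy Python model of what it does conceptually with name-sorted records. It is not samtools' implementation: quality strings are omitted and the records are made up. It groups records by read name, routes them by the SAM FLAG bits 0x40 (first in pair) and 0x80 (second in pair), skips secondary and supplementary alignments (0x900, which matches samtools fastq's default `-F` filter), and reverse-complements reads stored on the reverse strand (0x10). Reads whose mate is missing go to the singleton list, like `-s`.

```python
from itertools import groupby

PAIRED_FIRST, PAIRED_SECOND = 0x40, 0x80   # SAM FLAG: first / second segment
REVERSE = 0x10                             # SEQ is stored reverse-complemented
SKIP = 0x900                               # secondary | supplementary (samtools fastq default -F)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def original_seq(flag, seq):
    """Undo the reverse complement BAM applies to reverse-strand reads."""
    return seq[::-1].translate(COMPLEMENT) if flag & REVERSE else seq


def split_mates(records):
    """records: name-sorted (qname, flag, seq). Returns (read1, read2, singletons)."""
    read1, read2, singletons = [], [], []
    kept = [r for r in records if not r[1] & SKIP]
    for qname, group in groupby(kept, key=lambda r: r[0]):
        group = list(group)
        firsts = [r for r in group if r[1] & PAIRED_FIRST and not r[1] & PAIRED_SECOND]
        seconds = [r for r in group if r[1] & PAIRED_SECOND and not r[1] & PAIRED_FIRST]
        if len(firsts) == 1 and len(seconds) == 1:
            read1.append((qname, original_seq(firsts[0][1], firsts[0][2])))
            read2.append((qname, original_seq(seconds[0][1], seconds[0][2])))
        else:  # mate missing (or ambiguous): treat as singleton
            singletons.extend((qname, original_seq(f, s)) for _, f, s in firsts + seconds)
    return read1, read2, singletons


records = sorted([
    ("r2", 73, "GGGA"),    # first in pair, mate record absent -> singleton
    ("r1", 147, "TTGC"),   # second in pair, reverse strand
    ("r1", 99, "ACGT"),    # first in pair
    ("r1", 355, "ACGT"),   # 99 | 0x100: secondary alignment, filtered out
])
read1, read2, singletons = split_mates(records)
print(read1, read2, singletons)
```

Filtering secondary/supplementary records matters in this model because an extra record with the same name would otherwise break the one-read1-one-read2 grouping.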

samtools fastq

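BAM files are generally sorted by coordinate. To find the paired reads, you either sort by read name (which needs extra memory and disk space), or, while extracting, record the ID of the current read and free that memory once the read with the matching ID (its mate) turns up.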
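The second approach can be sketched with a dictionary keyed by read name. This is a minimal illustration under assumptions: the input records and field layout are hypothetical, only primary alignments are present (at most one record per read name and mate), and any read still buffered at the end is reported as an orphan. Memory grows with the number of reads still waiting for their mate rather than with file size; `peak` tracks that number.

```python
def pair_by_buffer(records):
    """Pair mates from records in any order (e.g. coordinate-sorted) by
    buffering each read until its mate arrives.

    records: iterable of (qname, mate, seq) with mate in {1, 2}; assumes only
    primary alignments, i.e. at most one record per (qname, mate).
    Returns (pairs, orphans, peak) where peak is the largest buffer size seen.
    """
    pending = {}  # qname -> (mate, seq) still waiting for its partner
    pairs, peak = [], 0
    for qname, mate, seq in records:
        waiting = pending.pop(qname, None)  # popping frees the buffered read
        if waiting is None:
            pending[qname] = (mate, seq)
            peak = max(peak, len(pending))
        else:
            seqs = {mate: seq, waiting[0]: waiting[1]}
            pairs.append((qname, seqs[1], seqs[2]))
    return pairs, sorted(pending), peak


coord_sorted = [("a", 1, "AAAA"), ("b", 2, "CCCC"), ("a", 2, "GGGG"),
                ("c", 1, "TTTT"), ("b", 1, "ACAC")]
pairs, orphans, peak = pair_by_buffer(coord_sorted)
print(pairs)    # [('a', 'AAAA', 'GGGG'), ('b', 'ACAC', 'CCCC')]
print(orphans)  # ['c']
print(peak)     # 2
```

Compared with sorting by name first, this trades a full re-sort on disk for memory that depends on how far apart mates sit in the input order.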

How do read names in the BAM differ from those in the FASTQ (mate1 read name vs. mate2 read name)?
