The original BAM file and the sorted BAM file contain the same number of lines (records).
SEQanswers: BAM is compressed. Sorting helps to give a better compression ratio because similar sequences are grouped together.
Converting the BAM back to FASTQ produces warnings; a GitHub search turns up the same problem (results 1, 2, 3):
*****WARNING: Query 17 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query 13 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query 223 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
After samtools sort -n, the number of warning lines dropped from 1,333,109,095 to 11,212,985.
???:
nohup samtools sort -n S1_T_SRR1273943.bam -o ./S1_T_SRR1273943.sortedByName.bam >log.S1.bam.sortbyname 2>&1 &
nohup bedtools bamtofastq -i S1_T_SRR1273943.sortedByName.bam -fq S1_T_1.fq -fq2 S1_T_2.fq > log_S1_sortedbyname 2>&1 &
Downstream analysis of the FASTQ files obtained from the name-sorted BAM still fails:
reads: 0 |ERROR: The mate1 read name did not match the mate2 read name. Resynchronization support needs to be implemented.
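To see whether two FASTQ files really are out of sync, as the error above claims, the read names can be compared record by record. The following is only a minimal sketch: `check_fastq_sync` is a hypothetical helper name, and it assumes plain 4-line FASTQ records, uncompressed files, and read names that differ at most by a trailing /1 or /2 or by a comment after the first whitespace.

```shell
# check_fastq_sync (hypothetical helper): compare read names of two FASTQ
# files record by record and report the first mismatch.
# Assumes 4-line records; strips header comments and /1, /2 suffixes.
check_fastq_sync() {
    awk -v fq2="$2" '
        # reduce a FASTQ header line to the bare read name
        function key(h) { sub(/[ \t].*$/, "", h); sub(/\/[12]$/, "", h); return h }
        {
            # read the line at the same position from the second file
            status = (getline other < fq2)
            if (NR % 4 != 1) next              # only header lines carry names
            if (status <= 0) other = "<missing>"
            if (status <= 0 || key($0) != key(other)) {
                printf "record %d: %s vs %s\n", (NR + 3) / 4, $0, other
                bad = 1
                exit
            }
        }
        END {
            # the second file must not continue after the first one ends
            if (!bad && (getline other < fq2) > 0) {
                print "second file has extra records"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

A call such as `check_fastq_sync S1_T_1.fq S1_T_2.fq` returns a non-zero exit status and prints the first mismatching record if the mates are not in the same order.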
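Temporary workaround: sort by read name first, then extract the reads. Open question: why does the FASTQ produced by bedtools bamtofastq from the name-sorted BAM fail downstream, while the output of samtools fastq has not failed so far?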
samtools sort -n in.bam | samtools fastq -1 read_1.fq -2 read_2.fq -s singleton.fq -
samtools fastq
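BAM files are usually sorted by position. To find paired reads, there are two options: sort the file by read name (which needs extra memory and storage), or, while extracting, keep the ID of the current read in memory and release it once the read with the matching ID from the other end has been found.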
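The second strategy above can be sketched with awk on SAM text. This is an illustration under stated assumptions, not how samtools fastq is implemented: `pair_mates` is a hypothetical helper, it reads SAM records on stdin, and it prints the sequences exactly as stored, whereas a real converter would also reverse-complement reads with FLAG bit 0x10 set and write the qualities.

```shell
# pair_mates (hypothetical helper): read SAM text on stdin and print
# "name<TAB>mate1 sequence<TAB>mate2 sequence" as soon as both mates are seen.
# Simplification: sequences are printed as stored in the record, without
# reverse-complementing reverse-strand reads.
pair_mates() {
    awk -F'\t' -v OFS='\t' '
        # test a single bit of the SAM FLAG using portable arithmetic
        function bit(flag, value) { return int(flag / value) % 2 }

        /^@/ { next }                            # SAM header lines
        !bit($2, 1) { next }                     # read is not part of a pair
        bit($2, 256) || bit($2, 2048) { next }   # secondary / supplementary

        ($1 in pending) {
            # second mate found: emit the pair, then free the buffered entry
            if (bit($2, 64)) print $1, $10, pending[$1]
            else             print $1, pending[$1], $10
            delete pending[$1]
            next
        }
        { pending[$1] = $10 }                    # first mate seen: buffer it

        END {
            # names still buffered never met their mate in this stream
            for (name in pending) orphans++
            if (orphans) printf "%d reads without a mate\n", orphans > "/dev/stderr"
        }
    '
}
```

It could be fed with `samtools view in.bam | pair_mates > pairs.tsv`. Memory use depends on how far apart the two mates are in the file, since only reads still waiting for their mate stay buffered.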