下载bwa对比软件:https://github.com/lh3/bwa
得到bwa-master.zip,然后进入保存该文件的目录,用命令unzip bwa-master.zip对其解压
rm -rf bwa-master.zip将zip文件删除,进入到bwa-master中,输入命令make
输入./bwa 可以查看帮助信息
配置bwa的环境变量:
① nano ~/.bash_profile #设置一些环境变量,设置完后按ctrl+x 、yes和tab
然后输入以下代码设置环境变量:
export PATH=$PATH: /where/to/install/bin
source ~/.bash_profile #在命令行输入,意思为保存文件
如果遇到 bash: ./xx: Permission denied 解决方法,即输入以下代码:
sudo chmod 777 ~/.bash_profile
② vim /etc/profile #另一种设置环境变量的方式
export PATH=/home/zach/bwa:$PATH #输入环境变量,ctrl+x
source /etc/profile #保存
下载samtools的命令:
wget https://github.com/samtools/samtools/releases/download/1.10/samtools-1.10.tar.bz2
tar -jxvf samtools-1.10.tar.bz2 #解压安装samtools-1.10
cd samtools-1.10 #进入samtools-1.10目录
./configure --prefix=/where/to/install #等号后面更改为你想要安装samtools的路径。
cd ./samtools #进入目录./samtools
export PATH=/home/zach/samtools/to/install/bin:$PATH >> ~/.bashrc #设置samtools的环境变量
source ~/.bashrc #保存
for i in *.sam
samtools import genome.fa $i $i.bam
samtools sort $i.bam -o $i.bam.sorted.bam
samtools index $i.bam.sorted.bam
done
(以下的过程都发生在安装bwa的目录下,因为不知道怎么在数据的目录下运行命令,因此将数据和参考基因组都copy到bwa目录下运行)
1)建立索引
bwa软件的作用是将序列比对到参考基因组上,在比对之前,首先需要对参考基因组建立索引,命令如下:
bwa index genome.fa #建立索引,genome.fa为你的参考基因组文件名
得到:
(base) [root@localhost bwa-master]# bwa index genome.fa
[bwa_index] Pack FASTA... 0.34 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=59246464, availableWord=16168340
[BWTIncConstructFromPacked] 10 iterations done. 26669680 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 49268176 characters processed.
[bwt_gen] Finished constructing BWT in 25 iterations.
[bwa_index] 19.05 seconds elapse.
[bwa_index] Update BWT... 0.24 sec
[bwa_index] Pack forward-only FASTA... 0.20 sec
[bwa_index] Construct SA from BWT and Occ... 12.55 sec
[main] Version: 0.7.17-r1198-dirty
[main] CMD: bwa index genome.fa
[main] Real time: 32.425 sec; CPU: 32.388 sec
# 双端合并序列,使用-p参数比对
for i in *gz
do
bwa mem -p genome.fa $i > ${i}.aln.sam
done
2)两个文件分别处理,寻找输入reads文件的SA坐标(双端测序)
bwa aln genome.fa SUC1_1_1.fq.gz -l 30 -k 2 -t 4 -I > SUC1_1_1.fq.gz.sai
bwa aln genome.fa SUC1_1_2.fq.gz > SUC1_1_2.fq.sai #命令行输入
得到:
(base) [root@localhost bwa-master]# bwa aln genome.fa SUC1_1_2.fq.gz > SUC1_1_2.fq.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[main] Version: 0.7.17-r1198-dirty
[main] CMD: bwa aln genome.fa SUC1_1_2.fq.gz
[main] Real time: 0.050 sec; CPU: 0.053 sec
3)进行比对,不同的比对算法,命令不同
3)将sam进行排序,并转换为bam文件
samtools sort SUC1_1.sam --output-fmt BAM -o SUC1_1.sort.bam
4)统计所有位点的测序深度
samtools depth –a SUC1_1.sort.bam > SUC1_1.depth
参考: