Installing and using BWA:
1. For installation, see 【1】.
2. Usage:
hadoop@Mcnode1:~/cloud/adam/xubo/data/down-sratool/sra$ bwa aln ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta DRR047093.fastq >RAL357_1.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 6.11 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 9261 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa aln ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta DRR047093.fastq
[main] Real time: 6.259 sec; CPU: 6.196 sec
hadoop@Mcnode1:~/cloud/adam/xubo/data/down-sratool/sra$ bwa samse ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta RAL357_1.sai DRR047093.fastq > RAL357_1.sam
[bwa_aln_core] convert to sequence coordinate... 0.13 sec
[bwa_aln_core] refine gapped alignments... 0.19 sec
[bwa_aln_core] print alignments... 0.03 sec
[bwa_aln_core] 9261 sequences have been processed.
[main] Version: 0.7.12-r1039
[main] CMD: bwa samse ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta RAL357_1.sai DRR047093.fastq
[main] Real time: 1.121 sec; CPU: 0.381 sec
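The two commands above can be wrapped into one small script. This is only a sketch: `align_single_end` is a hypothetical helper name, and the `bwa index` step is an assumption, since the transcript does not show it (`bwa aln` needs an index next to the reference FASTA, and here it is built only if the `.bwt` file is missing).

```shell
#!/bin/sh
# Sketch of the single-end BWA workflow: index -> aln -> samse.
# align_single_end is a hypothetical helper, not part of BWA itself.
align_single_end() {
    ref=$1      # reference FASTA
    reads=$2    # single-end FASTQ
    prefix=$3   # prefix for the .sai and .sam output files

    # Assumption: build the index only if it is not there yet
    # (bwa index writes $ref.bwt and related files next to the FASTA).
    if [ ! -e "$ref.bwt" ]; then
        bwa index "$ref" || return 1
    fi

    # Step 1: find suffix-array coordinates for each read
    bwa aln "$ref" "$reads" > "$prefix.sai" || return 1

    # Step 2: convert the .sai file into single-end SAM alignments
    bwa samse "$ref" "$prefix.sai" "$reads" > "$prefix.sam" || return 1
}
```

With the files from this post, the call would look like `align_single_end ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta DRR047093.fastq RAL357_1`.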
Inspect the generated SAM file:
hadoop@Mcnode1:~/cloud/adam/xubo/data/down-sratool/sra$ more RAL357_1.sam
@SQ SN:YHet LN:347038
@SQ SN:dmel_mitochondrion_genome LN:19517
@SQ SN:2L LN:23011544
@SQ SN:X LN:22422827
@SQ SN:3L LN:24543557
@SQ SN:4 LN:1351857
@SQ SN:2R LN:21146708
@SQ SN:3R LN:27905053
@SQ SN:Uextra LN:29004656
@SQ SN:2RHet LN:3288761
@SQ SN:2LHet LN:368872
@SQ SN:3LHet LN:2555491
@SQ SN:3RHet LN:2517507
@SQ SN:U LN:10049037
@SQ SN:XHet LN:204112
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa samse ../../dmel-all-chromosome-r5.37/dmel-all-chromosome-r5.37.fasta RAL357_1.sai DRR047093.fastq
DRR047093.1 4 * 0 0 * * 0 0 CAAAGTGGCGTCGTCTTGAGCCCATCATCAATATCATCGTTTACATTAAGTAGAAAGTGTAACTAGACAAATGTTTTCATTTCCGCCTCGTTGTTGAACTCCCGTGGAGAA
CCCATGCTTCCCCTGATTTAACATCGGTATTGTATTCAATCCTTCTGCTCTCCCCGGCGAATGCATCGTTAATGGTTGGTTTCCGCGTAAACG I555IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBBBEHIIIIIIIIIIIIIIIIIIIIFIBD111;7///-;<557;;FFIIIIIIIII
IIIHH<<<@BIIIIIIIIIIIIIII>>>>HHIHHIIHIHHIHHHHHHIHIHIIIHIHHFCBBDDD<;1111779@DHIIIIIIIIIIIIIIIIIIB>>IIIIIIFFFII
DRR047093.2 4 * 0 0 * * 0 0 CAAAGTGGCGTCGTCTTGAGCCCGTCATCAATATCATCGTTTACATTAAGTAGAAAAGTGTAACTAGACAAAATGTTTTTCATTTCCGCCTCGTTGTTGAACTCCCGTGGA
GAACTCATGCTTCCCCTCGATTTAACTATCGGTATTGTATTCAATCCTTCTGCTCTCCCCGGCCGAATGCATCGTTAATGGTTGGTTTCCGTCGTAAACG IIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIFFIIIIIIIIIIIEECCIIIIIIIIIIIIECCCII99999HIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIFCGGDC<<66667/66>>@?:9399FIIAAAAFIIIFFFFEFFBBBAEFEDD:===;;0038?@@@@BBDCC=====;;CCBBBE::3:=FIFF==
DRR047093.3 0 3R 17056839 37 235M * 0 0 GAGAGATCCCGTGCCGTTAGCTTTAGATCCTCAGGAACCTGCGAGTAGTCAAAGTCCAGAACGATACTGGAGTCACCTTCGTTGTTATTGGCCGTCTCATAGG
TTTTGAGCAGCGCCTGGCGATCCACCTTGCCGTTGACCAGCAATGGAACGTGCTCCAGGATGACCACCTGCGGCGTCATGTAATCGGCTAGCTTGTCCTTGAGACGAGCCTCCATCTGCATCTCGGTGACCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBBBBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XT:A:UNM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0A234
DRR047093.4 16 3L 6707450 37 218M * 0 0 ACCGGGAATACTATATCGCGTGTCTATATAGTCTAGGTAAATATTGTGAGAGGCATAATGAAGATAATAATAATACAAAAACAATTTTTGTCTGAGTATACAATCGGTTTT
TGTGTGGTACTTTGCCTACTAAGTGCGGATGTATCTGAACTTTGCTTTCCCAGCTTTTCACTTCACTTAATTCGCT
hadoop@Mcnode1:~/cloud/adam/xubo/data/down-sratool/sra$ cat RAL357_1.sam | wc -l
9277
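The count of 9277 lines is consistent with the output above: 16 header lines (15 `@SQ` plus 1 `@PG`) and 9261 alignment records, one per processed read. The sketch below separates the two and tallies mapped versus unmapped reads from the FLAG column (column 2, where bit 4 marks an unmapped read, as in DRR047093.1 and DRR047093.2). `sam_summary` is a hypothetical helper name, and the input is assumed to be a regular tab-delimited SAM file.

```shell
#!/bin/sh
# sam_summary is a hypothetical helper, not part of BWA or samtools.
sam_summary() {
    sam=$1
    # Header lines start with '@'; every other line is one alignment record
    echo "header_lines=$(grep -c '^@' "$sam")"
    echo "records=$(grep -vc '^@' "$sam")"
    # Column 2 is the FLAG; if bit 4 is set, the read is unmapped
    awk -F '\t' '!/^@/ { if (int($2 / 4) % 2 == 1) u++; else m++ }
        END { printf "mapped=%d\nunmapped=%d\n", m, u }' "$sam"
}
```

It would be called as `sam_summary RAL357_1.sam`.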
Data sources:
curl -O ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.37_FB2011_05/fasta/dmel-all-chromosome-r5.37.fasta.gz
gunzip dmel-all-chromosome-r5.37.fasta.gz
fastq-dump -Z DRR047093
fastq-dump DRR047093
More data:
Beginning with a list of desired SRA data sets (e.g., a list of SRA Run accessions, “SRRs”), the exact download location for that data file can be determined as follows:
wget/FTP root: ftp://ftp-trace.ncbi.nih.gov
ascp root: anonftp@ftp.ncbi.nlm.nih.gov:
Remainder of path:
/sra/sra-instant/reads/ByRun/sra/{SRR|ERR|DRR}/<first 6 characters of accession>/<accession>/<accession>.sra
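Where {SRR|ERR|DRR} is the accession prefix (SRR, ERR, or DRR) and must match the prefix of the target .sra file.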
Source: http://www.ncbi.nlm.nih.gov/books/NBK158899/#SRA_download.downloading_sra_data_using
References:
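The path template above can be filled in mechanically from an accession. The following is a minimal sketch: `sra_path` is a hypothetical helper name, it only builds the string, and it does not check whether the server still serves this layout.

```shell
#!/bin/sh
# sra_path is a hypothetical helper that fills in the quoted path template.
sra_path() {
    acc=$1
    prefix=$(printf '%s' "$acc" | cut -c1-3)   # SRR, ERR or DRR
    first6=$(printf '%s' "$acc" | cut -c1-6)   # first 6 characters of the accession

    # Reject anything that is not a run accession with a known prefix
    case $prefix in
        SRR|ERR|DRR) ;;
        *) echo "unsupported accession: $acc" >&2; return 1 ;;
    esac

    printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix" "$first6" "$acc" "$acc"
}
```

For the run used in this post, `sra_path DRR047093` yields `/sra/sra-instant/reads/ByRun/sra/DRR/DRR047/DRR047093/DRR047093.sra`, which is then appended to the wget/FTP root or the ascp root.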
【1】 http://mingkang1217.blog.163.com/blog/static/2035227201101254921398/
【2】 http://ged.msu.edu/angus/tutorials-2011/bwa_tutorial.html