1. Edit Makefile.common:
Change
LIBBWA_LIBS = -lrt
to
LIBBWA_LIBS = -lrt -lz
Otherwise error 【5】 (listed in the appendix) is reported.
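One way to confirm that the rebuilt library really picked up zlib is to inspect its dynamic section. The sketch below is a minimal check, not part of SparkBWA: the function name has_zlib_dependency is hypothetical, and it assumes that binutils (readelf) is installed and that libbwa.so sits in the build directory.

```shell
# Hypothetical helper (not part of SparkBWA): reads `readelf -d` output on stdin
# and succeeds if a zlib dependency (libz.so) is listed as NEEDED.
has_zlib_dependency() {
    grep -q 'NEEDED.*\[libz\.so'
}

# Usage, assuming binutils is installed and libbwa.so is in the build directory:
#   readelf -d build/libbwa.so | has_zlib_dependency && echo "zlib is linked"
```

If the check fails after the Makefile change, rebuilding from a clean tree is a reasonable first thing to try.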
2. After running make, update java.library.path
Steps:
vi /etc/profile
Add:
export LD_LIBRARY_PATH=/home/hadoop/xubo/tools/SparkBWA/build:$LD_LIBRARY_PATH
Apply the change:
source /etc/profile
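On Linux the JVM typically derives its default java.library.path from LD_LIBRARY_PATH, so a quick check that the build directory is actually on that path can help rule out error 【6】 from the appendix. The following is a minimal sketch; path_contains is a hypothetical helper name, and BUILD_DIR simply repeats the directory used in the export line above.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated path list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Same directory as in the export line above.
BUILD_DIR=/home/hadoop/xubo/tools/SparkBWA/build

if path_contains "$BUILD_DIR" "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH includes $BUILD_DIR"
else
    echo "LD_LIBRARY_PATH is missing $BUILD_DIR" >&2
fi
```

Wrapping the list in colons before matching avoids false positives when one directory name is a prefix of another.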
3. Script for a local run:
spark-submit --class SparkBWA \
--master local \
SparkBWA.jar \
-algorithm mem -reads paired \
-index /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta \
-partitions 3 \
/xubo/alignment/sparkBWA/GRCH38chr1L3556522N10L50paired1.fastq /xubo/alignment/sparkBWA/GRCH38chr1L3556522N10L50paired2.fastq \
/xubo/alignment/output/sparkBWA/datatestLocalGRCH38chr1L3556522N10L50paired12
4. Run output:
hadoop@Master:~/xubo/tools/SparkBWA/build$ ./pairedGRCH38L1Local.sh
[Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-0.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-0.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2.
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 6 sequences (300 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 6 reads in 0.002 CPU sec, 0.002 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2
[main] Real time: 0.438 sec; CPU: 9.435 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Stage 3:> (0 + 1) / 3][Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-1.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-1.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2.
[Stage 3:===================> (1 + 1) / 3][M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 8 sequences (400 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 4, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 8 reads in 0.002 CPU sec, 0.001 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2
[main] Real time: 0.440 sec; CPU: 10.097 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-2.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-2.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2.
[Stage 3:=======================================> (2 + 1) / 3][M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 6 sequences (300 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 6 reads in 0.002 CPU sec, 0.002 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2
[main] Real time: 0.429 sec; CPU: 10.584 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
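With three partitions the log contains three "Return code from BWA" lines, one per BWA invocation. If the output is saved to a file, a small filter can count the invocations that did not return 0. This is only a sketch: count_bwa_failures and the log file name in the usage comment are hypothetical, and the pattern is based solely on the message format visible in the log above.

```shell
# Hypothetical helper: reads a saved SparkBWA log on stdin and prints how many
# BWA invocations reported a non-zero return code.
count_bwa_failures() {
    grep -oE 'Return code from BWA -?[0-9]+' \
        | awk '$NF != 0 { n++ } END { print n + 0 }'
}

# Usage, assuming the run output was redirected to a file named run.log:
#   count_bwa_failures < run.log
```

Using grep -o keeps the match working even when Spark's progress bar is printed on the same line as a JNI message.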
5. Input data:
pair1:
@chr1_114727112_114727587_0:0:0_0:0:0_0/1
AGTACTTGAACTGTGCTAGATCATACACCAAATTATCCTGCATTGTTAAG
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/1
ACTGAAGGTAGAGATGCAGGAATACAGTTACCTGTGCAACTATGACTCTA
+
22222222222222222222222222222222222222222222222222
@chr1_21862138_21862675_1:0:0_2:0:0_2/1
GCCCCAGCCATTAGGCCAAATTTACCAGAAGCCTTTCAGGGTTGCAATCC
+
22222222222222222222222222222222222222222222222222
@chr1_192732894_192733437_1:0:0_1:0:0_3/1
AATAGACAACACGAAGAACAGCTGTGAGCAATACAATTAGAAACTTTTTT
+
22222222222222222222222222222222222222222222222222
@chr1_246769496_246770014_0:0:0_2:0:0_4/1
TAATCTGTACGACAAACCCCCATGTCACTTTACCTCTATAACAAACCTGG
+
22222222222222222222222222222222222222222222222222
@chr1_89997487_89998004_0:0:0_2:0:0_5/1
CCTCAGCCTCCCTAGTAGCTGGGACTACAGGCACGCACCACCAGGCCCGG
+
22222222222222222222222222222222222222222222222222
@chr1_100741557_100742038_2:0:0_1:0:0_6/1
GTTCTCTTATATATTCTGAATAGACATTCTTTATGGAAAATACATTTAGC
+
22222222222222222222222222222222222222222222222222
@chr1_6197792_6198312_2:0:0_0:0:0_7/1
TTCAGTCAGTCGGAAAAACAAGATTAACATAACCAGAAACGTCCTAGGTA
+
22222222222222222222222222222222222222222222222222
@chr1_216218212_216218601_0:0:0_3:0:0_8/1
AATATAGTAAGATAACTTTAGTGCAACTTAAATTTCTTGGACCCAAAGTT
+
22222222222222222222222222222222222222222222222222
@chr1_2670698_2671257_2:0:0_0:0:0_9/1
ACCCACACGCCCATGTGAGCCTCTGACAGCCTGGAACAGCACGCGCAAGC
+
22222222222222222222222222222222222222222222222222
pair2:
@chr1_114727112_114727587_0:0:0_0:0:0_0/2
CAGAATGAAAACAATCTCAAGAACAAAAACCAATAAAAACAACTATAGTT
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/2
CATTGTAAGCACTCAACAAGTGTTAGCTACTCCCAGTTGGAAGCTAGAAT
+
22222222222222222222222222222222222222222222222222
@chr1_21862138_21862675_1:0:0_2:0:0_2/2
GATGGATGTAGTTTTTAATTTATCCAATTGCCTATTCATGGATGTTTAGG
+
22222222222222222222222222222222222222222222222222
@chr1_192732894_192733437_1:0:0_1:0:0_3/2
AATAAAATATCATAGGACTAGGAGGCTTAAACAACATTTATTCCTCGCAG
+
22222222222222222222222222222222222222222222222222
@chr1_246769496_246770014_0:0:0_2:0:0_4/2
ACTTGGATTAGTGCCTGGCACATGGTGTAAGCACTTACTAAGTTTCAACG
+
22222222222222222222222222222222222222222222222222
@chr1_89997487_89998004_0:0:0_2:0:0_5/2
ACCCAATTATCTGCAAAACTGAGCATATTTAAAACAAATAATTACCAATA
+
22222222222222222222222222222222222222222222222222
@chr1_100741557_100742038_2:0:0_1:0:0_6/2
AAATGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCA
+
22222222222222222222222222222222222222222222222222
@chr1_6197792_6198312_2:0:0_0:0:0_7/2
GGACAGGCACCTAGAATACTTTGCAACTCCCTTCATGAGAAGGTGGATGA
+
22222222222222222222222222222222222222222222222222
@chr1_216218212_216218601_0:0:0_3:0:0_8/2
TCATTTATCAGCCTTTTTAAGGGTGTTGAATGATTACCCAAGAGATCAGA
+
22222222222222222222222222222222222222222222222222
@chr1_2670698_2671257_2:0:0_0:0:0_9/2
TGTTACAGGCTGTCAGAGGCTCACCTGGGCATGTGGGTGCTGTTCCAGTC
+
22222222222222222222222222222222222222222222222222
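Paired-end alignment expects both FASTQ files to list the same reads in the same order, differing only in the /1 and /2 suffixes, as in the two listings above. A minimal sketch of that check follows; fastq_names and pairs_in_sync are hypothetical helper names, and the code assumes plain 4-line FASTQ records without wrapped sequences.

```shell
# Hypothetical helper: print the read names of a FASTQ file with the leading
# "@" and the trailing /1 or /2 removed. Assumes 4-line FASTQ records.
fastq_names() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/\/[12]$/, ""); print }' "$1"
}

# Succeeds when both files list the same read names in the same order.
pairs_in_sync() {
    [ "$(fastq_names "$1")" = "$(fastq_names "$2")" ]
}

# Usage, with the two local FASTQ files as arguments:
#   pairs_in_sync reads_1.fastq reads_2.fastq && echo "pairs are in sync"
```

The file names in the usage comment are placeholders; the files in this example live under /xubo/alignment/sparkBWA/ and would have to be available locally for this check.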
6. Partial output data:
@SQ SN:chr1 LN:248956422
@PG ID:bwa PN:bwa VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2
chr1_100741557_100742038_2:0:0_1:0:0_6 65 chr1 100741557 60 50M = 35207765 -65533793 GTTCTCTTATATATTCTGAATAGACATTCTTTATGGAAAATACATTTAGC 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:24A13T11 AS:i:40 XS:i:0
chr1_100741557_100742038_2:0:0_1:0:0_6 129 chr1 35207765 0 50M = 100741557 65533793 AAATGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCA 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:2T47 AS:i:47 XS:i:47
chr1_114727112_114727587_0:0:0_0:0:0_0 81 chr1 114727538 60 50M = 114727112 -476 CTTAACAATGCAGGATAATTTGGTGTATGATCTAGCACAGTTCAAGTACT 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:0
chr1_114727112_114727587_0:0:0_0:0:0_0 161 chr1 114727112 60 50M = 114727538 476 CAGAATGAAAACAATCTCAAGAACAAAAACCAATAAAAACAACTATAGTT 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:0
chr1_192732894_192733437_1:0:0_1:0:0_3 81 chr1 192733388 60 50M = 192732894 -544 AAAAAAGTTTCTAATTGTATTGCTCACAGCTGTTCTTCGTGTTGTCTATT 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:21T28 AS:i:45 XS:i:0
chr1_192732894_192733437_1:0:0_1:0:0_3 161 chr1 192732894 60 50M = 192733388 544 AATAAAATATCATAGGACTAGGAGGCTTAAACAACATTTATTCCTCGCAG 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:22T27 AS:i:45 XS:i:0
Part 2:
@SQ SN:chr1 LN:248956422
@PG ID:bwa PN:bwa VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2
chr1_216218212_216218601_0:0:0_3:0:0_8 97 chr1 216218212 60 50M = 216218552 387 AATATAGTAAGATAACTTTAGTGCAACTTAAATTTCTTGGACCCAAAGTT 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:0
chr1_216218212_216218601_0:0:0_3:0:0_8 145 chr1 216218552 60 47M3S = 216218212 -387 TCTGATCTCTTGGGTAATCATTCAACACCCTTAAAAAGGCTGATAAATGA 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:1G45 AS:i:45 XS:i:0
chr1_6197792_6198312_2:0:0_0:0:0_7 97 chr1 6197792 60 50M = 6198263 521 TTCAGTCAGTCGGAAAAACAAGATTAACATAACCAGAAACGTCCTAGGTA 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:29G13A6 AS:i:40 XS:i:0
chr1_6197792_6198312_2:0:0_0:0:0_7 145 chr1 6198263 60 50M = 6197792 -521 TCATCCACCTTCTCATGAAGGGAGTTGCAAAGTATTCTAGGTGCCTGTCC 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:0
chr1_59526946_59527392_0:0:0_1:0:0_1 81 chr1 59527343 60 50M = 59526946 -447 TAGAGTCATAGTTGCACAGGTAACTGTATTCCTGCATCTCTACCTTCAGT 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:20A29 AS:i:45 XS:i:0
chr1_59526946_59527392_0:0:0_1:0:0_1 161 chr1 59526946 60 50M = 59527343 447 CATTGTAAGCACTCAACAAGTGTTAGCTACTCCCAGTTGGAAGCTAGAAT 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:19
chr1_246769496_246770014_0:0:0_2:0:0_4 81 chr1 246769965 51 50M = 246769496 -519 CCAGGTTTGTTATAGAGGTAAAGTGACATGGGGGTTTGTCGTACAGATTA 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:0G13T35 AS:i:44 XS:i:28
chr1_246769496_246770014_0:0:0_2:0:0_4 161 chr1 246769496 60 50M = 246769965 519 ACTTGGATTAGTGCCTGGCACATGGTGTAAGCACTTACTAAGTTTCAACG 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:19
Part 3:
@SQ SN:chr1 LN:248956422
@PG ID:bwa PN:bwa VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2
chr1_89997487_89998004_0:0:0_2:0:0_5 97 chr1 89997487 22 50M = 89997955 518 CCTCAGCCTCCCTAGTAGCTGGGACTACAGGCACGCACCACCAGGCCCGG 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:43 XA:Z:chr1,-26514021,50M,2;
chr1_89997487_89998004_0:0:0_2:0:0_5 145 chr1 89997955 60 50M = 89997487 -518 TATTGGTAATTATTTGTTTTAAATATGCTCAGTTTTGCAGATAATTGGGT 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:22C3T23 AS:i:40 XS:i:19
chr1_2670698_2671257_2:0:0_0:0:0_9 97 chr1 2679088 0 50M = 2666592 -12448 ACCCACACGCCCATGTGAGCCTCTGACAGCCTGGAACAGCACGCGCAAGC 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:13G34C1 AS:i:43 XS:i:43
chr1_2670698_2671257_2:0:0_0:0:0_9 145 chr1 2666592 0 50M = 2679088 12448 GACTGGAACAGCACCCACATGCCCAGGTGAGCCTCTGACAGCCTGTAACA 22222222222222222222222222222222222222222222222222 NM:i:0 MD:Z:50 AS:i:50 XS:i:50
chr1_21862138_21862675_1:0:0_2:0:0_2 97 chr1 21862138 60 50M = 21862626 538 GCCCCAGCCATTAGGCCAAATTTACCAGAAGCCTTTCAGGGTTGCAATCC 22222222222222222222222222222222222222222222222222 NM:i:1 MD:Z:2A47 AS:i:47 XS:i:0
chr1_21862138_21862675_1:0:0_2:0:0_2 145 chr1 21862626 60 50M = 21862138 -538 CCTAAACATCCATGAATAGGCAATTGGATAAATTAAAAACTACATCCATC 22222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:20G18G10 AS:i:40 XS:i:0
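The second column of each SAM record is the FLAG field, a bit mask defined by the SAM specification. To read the values that appear above (65, 81, 97, 129, 145, 161), a small decoder can help. The sketch below is illustrative only: decode_flag and the short bit names are hypothetical choices, while the bit values themselves follow the SAM specification.

```shell
# Hypothetical helper: print the names of the SAM FLAG bits set in the
# decimal value $1, separated by commas.
decode_flag() {
    flag=$1
    out=""
    for entry in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                 16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                 256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${entry%%:*}     # numeric bit value before the colon
        name=${entry#*:}     # short label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

# Usage:
#   decode_flag 65     # paired,first_in_pair
#   decode_flag 145    # paired,reverse,second_in_pair
```

None of the flags in the records above has bit 2 (proper pair) set. This may be related to the mem_pestat "not enough pairs" messages in the run output, since each partition holds only a handful of reads.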
Appendix
Errors:
Error 【5】:
libbwa.so: undefined symbol: gzdopen
Error 【6】:
java.lang.UnsatisfiedLinkError: no bwa in java.library.path
References
【1】https://github.com/xubo245/AdamLearning
【2】https://github.com/bigdatagenomics/adam/
【3】https://github.com/xubo245/SparkLearning
【4】http://spark.apache.org
【5】http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job
【6】http://stackoverflow.com/questions/28840438/how-to-override-sparks-log4j-properties-per-driver
Publications:
【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】more: https://github.com/xubo245/Publications
Help
If you have any questions or suggestions, please open an issue in this project or send me an e-mail: xubo245@mail.ustc.edu.cn
Wechat: xu601450868
QQ: 601450868