1. Download:
https://github.com/lh3/wgsim
Either clone the repository with git or download it as a zip archive.
2. Install (compile in the source directory; -lz requires zlib):
gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm
3. Download data: the reference genome can be fetched with bwakit:
https://github.com/lh3/bwa/tree/master/bwakit
Download command:
bwa.kit/run-gen-ref hs38DH
4. Usage and default options:
hadoop@Master:~/cloud/spark-1.5.2/examples/src/main/resources$ wgsim
Program: wgsim (short read simulator)
Version: 0.3.2
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>
Options: -e FLOAT      base error rate [0.020]
         -d INT        outer distance between the two ends [500]
         -s INT        standard deviation [50]
         -N INT        number of read pairs [1000000]
         -1 INT        length of the first read [70]
         -2 INT        length of the second read [70]
         -r FLOAT      rate of mutations [0.0010]
         -R FLOAT      fraction of indels [0.15]
         -X FLOAT      probability an indel is extended [0.30]
         -S INT        seed for random generator [0, use the current time]
         -A FLOAT      discard if the fraction of ambiguous bases higher than FLOAT [0.05]
         -h            haplotype mode
5. Usage examples:
(1) Default paired-end:
wgsim hs38DH.fa PE/hs38DHPE1LallF1.fq PE/hs38DHPE1LallF2.fq
(2) Default single-end (the second read file is discarded by writing it to /dev/null):
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ wgsim hs38DH.fa hs38DHSELallF1V2.fq /dev/null
(3) -N: number of read pairs to generate
-N 10000
wgsim -N 10000 hs38DH.fa PE/hs38DHPE1L10000F1.fq PE/hs38DHPE1L10000F2.fq
Inspect the output:
Line count:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |wc -l
39740
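FASTQ stores each read as four lines, so 39740 lines correspond to 39740 / 4 = 9935 reads, slightly fewer than the 10000 requested.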
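The division by four can be wrapped in a small shell function. This is a minimal sketch: the name count_fastq_reads is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record, as in the wgsim output shown here.

```shell
# Hypothetical helper: print the number of records in an uncompressed FASTQ file.
# Assumes exactly four lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END {
        if (NR % 4 != 0) {
            # A truncated or wrapped file would give a misleading count, so fail loudly.
            print "line count " NR " is not a multiple of 4" > "/dev/stderr"
            exit 1
        }
        print NR / 4
    }' "$1"
}
```

For example, count_fastq_reads PE/hs38DHPE1L10000F1.fq prints the read count directly and returns a non-zero status if the file looks truncated.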
File contents:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |head -20
@chrUn_KN707606v1_decoy_29_523_2:0:0_1:0:0_0/1
ATGCCCAGCTGGTTTCTGATACTTCTAATCAAATGTCTTATCCCCCAAATTAGCCCTGGGAGTGAGAATA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707606v1_decoy_657_1222_1:0:0_1:0:0_1/1
GTGGTGCACACCTGTAGTGCCTGTTCCTTGGGAGGCTGAGGCCGGAGGATCCCTTGAGCCCAGGAGTTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707606v1_decoy_1052_1588_2:0:0_1:1:0_2/1
GTCCAAACACCACGTGACAAGCCCATTCTTCCATTTTCTCAGACCATAAACTGCACTGTCCTCTAACTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707607v1_decoy_1123_1686_1:0:0_2:0:0_0/1
GAGGATATTTTGTTTAGTCACTAGGATTTCTTAACATTCTGAAATTCTATTCACCTCTGATTTTGTCTAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707607v1_decoy_877_1369_0:0:0_0:0:0_1/1
TATAGTTAACATAACATGGTCTATCTTTAGATAATCTCCATGCACAGTAAGATAATATTTTTTCTAGGAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
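The read names in this output encode where each pair was sampled. The sketch below assumes the layout @<contig>_<start>_<end>_<e1>:<s1>:<i1>_<e2>:<s2>:<i2>_<counter>/<1|2>, with start and end taken as the 1-based outer coordinates of the fragment. That interpretation is an assumption and should be checked against the wgsim version in use. The function name wgsim_fragments is hypothetical.

```shell
# Hypothetical helper: read FASTQ on stdin and print "contig start end length"
# for every header line. The name layout is an assumption (see the text above).
# The contig itself may contain underscores, so the pattern is anchored at the
# end of the name and the greedy (.*) keeps the contig in one piece.
wgsim_fragments() {
    awk 'NR % 4 == 1' |
    sed -E -n 's/^@(.*)_([0-9]+)_([0-9]+)_[0-9]+:[0-9]+:[0-9]+_[0-9]+:[0-9]+:[0-9]+_[0-9a-f]+\/[12]$/\1 \2 \3/p' |
    awk '{ print $1, $2, $3, $3 - $2 + 1 }'
}
```

Applied to the first record above, this prints chrUn_KN707606v1_decoy 29 523 495. Under the assumed layout, a fragment length of 495 fits the defaults -d 500 and -s 50.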
(4) -1: length of the first read
-1 10 sets the length of the reads in the first FASTQ file to 10:
wgsim -N 10000 -1 10 hs38DH.fa SE/hs38DHSE1N10000L10F1.fq /dev/null
Inspect the output:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |wc -l
39740
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |head -20
@chrUn_KN707606v1_decoy_216_790_0:0:0_2:0:0_0/1
CATGTCTTTC
+
2222222222
@chrUn_KN707606v1_decoy_1191_1728_0:0:0_1:0:0_1/1
TTAACCTTAA
+
2222222222
@chrUn_KN707606v1_decoy_792_1284_1:0:0_0:0:0_2/1
CAGAACAAAA
+
2222222222
@chrUn_KN707607v1_decoy_1925_2441_0:0:0_1:0:0_0/1
TGCAGGTTTG
+
2222222222
@chrUn_KN707607v1_decoy_2305_2757_1:0:0_3:0:0_1/1
GGACAAGGGA
+
2222222222
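Looking at the first few records only shows that some reads have the requested length. To check the whole file, the sequence lengths can be tallied. This is a minimal sketch: the name read_length_histogram is hypothetical, and it assumes an uncompressed FASTQ file with four lines per record.

```shell
# Hypothetical helper: print "length count" for every distinct read length
# in an uncompressed four-line-per-record FASTQ file, shortest first.
read_length_histogram() {
    # The sequence is the second line of each four-line record.
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

For SE/hs38DHSE1N10000L10F1.fq the output should contain a single length, 10, if -1 10 took effect for every read.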
6. Other:
(1) Alignment:
Build the index with BWA (bwa index hs38DH.fa); the working directory then looks like this:
hadoop@Master:~/cloud/adam/xubo/data/wgsim/hs38DH$ ll -h
total 22M
drwxrwxr-x 4 hadoop hadoop 4.0K 4月 15 15:48 ./
drwxrwxr-x 7 hadoop hadoop 4.0K 4月 11 17:10 ../
-rw-rw-r-- 1 hadoop hadoop 8.0M 4月 11 17:08 hs38DH.fa
-rw-r--r-- 1 hadoop hadoop 477K 4月 11 17:08 hs38DH.fa.alt
-rw-rw-r-- 1 hadoop hadoop 15 4月 11 17:10 hs38DH.fa.amb
-rw-rw-r-- 1 hadoop hadoop 365K 4月 11 17:10 hs38DH.fa.ann
-rw-rw-r-- 1 hadoop hadoop 7.6M 4月 11 17:10 hs38DH.fa.bwt
-rw-rw-r-- 1 hadoop hadoop 1.9M 4月 11 17:10 hs38DH.fa.pac
-rw-rw-r-- 1 hadoop hadoop 3.8M 4月 11 17:10 hs38DH.fa.sa
drwxrwxr-x 2 hadoop hadoop 4.0K 4月 15 16:23 PE/
drwxrwxr-x 2 hadoop hadoop 4.0K 4月 15 15:48 SE/
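The listing contains the five index files that bwa index writes next to the reference: .amb, .ann, .bwt, .pac and .sa. Before aligning, it can be worth checking that none of them is missing or empty. This is a minimal sketch, and the function name bwa_index_complete is hypothetical.

```shell
# Hypothetical helper: succeed only if all five BWA index files exist
# next to the given FASTA path and are non-empty.
bwa_index_complete() {
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$1.$ext" ]; then
            echo "missing or empty index file: $1.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as bwa_index_complete hs38DH.fa returns 0 for the directory shown above and can guard a later alignment step.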
(2) Convert to ADAM format:
hadoop@Master:~/cloud$ adam-submit fasta2adam /xubo/adam/hs38DH/hs38DH.fa /xubo/adam/hs38DH/adam/hs38DH.adam
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/home/hadoop/cloud/spark-1.5.2//bin/spark-submit
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.