Main steps:
- Estimate genome size and read length from reads (unless --gsize provided)
- Reduce FASTQ files to a sensible depth (default --depth 100)
- Trim adapters from reads (with --trim only)
- Conservatively correct sequencing errors in reads
- Pre-overlap ("stitch") paired-end reads
- Assemble with SPAdes/SKESA/Megahit with modified kmer range and PE + long SE reads
- Correct minor assembly errors by mapping reads back to contigs
- Remove contigs that are too short, too low coverage, or pure homopolymers
- Produce final FASTA with nicer names and parseable annotations
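
The depth-reduction step comes down to simple arithmetic: current depth is total sequenced bases divided by genome size, and the fraction of reads to keep is the target depth divided by the current depth. The sketch below is only an illustration of that calculation, not Shovill's own code. The function name `subsample_fraction` and the example numbers are hypothetical, and treating a target depth of 0 as "do not subsample" is an assumption made here for illustration.

```python
def subsample_fraction(total_bases: int, genome_size: int, target_depth: float) -> float:
    """Return the fraction of reads to keep so that depth is about target_depth.

    Illustrative only: hypothetical helper, not part of Shovill.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if target_depth <= 0:
        # Assumption: a target depth of 0 means "keep all reads".
        return 1.0
    current_depth = total_bases / genome_size
    if current_depth <= target_depth:
        # Already at or below the target, so nothing is discarded.
        return 1.0
    return target_depth / current_depth


# Made-up example: 1.5 Gbp of reads over a 5 Mbp genome is 300x depth,
# so reaching 100x means keeping about one third of the reads.
fraction = subsample_fraction(1_500_000_000, 5_000_000, 100)
print(f"keep {fraction:.3f} of reads")
```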
Advantages: reads are error-corrected before assembly, the assembly itself is corrected afterwards, and contigs can be filtered out by length.
Command used: shovill --R1 aaa1.fq --R2 aaa2.fq --outdir outpath --cpus 16 --depth 0 --minlen 100 --ram 20
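
To make the contig-filtering idea concrete, here is a minimal sketch of dropping contigs that are shorter than a minimum length or that consist of a single repeated base. It is not how Shovill implements the step, and it ignores the coverage filter. The helper names `read_fasta`, `keep_contig` and `filter_contigs` are hypothetical, and the FASTA text is made-up test data.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def keep_contig(seq: str, min_len: int) -> bool:
    """Keep a contig only if it is long enough and not a pure homopolymer."""
    if len(seq) < min_len:
        return False
    # One distinct base (or none) means the contig is a homopolymer or empty.
    if len(set(seq.upper())) <= 1:
        return False
    return True


def filter_contigs(records, min_len=100):
    """Return the records that pass keep_contig, in their original order."""
    return [(name, seq) for name, seq in records if keep_contig(seq, min_len)]


# Made-up contigs: one normal, one homopolymer, one too short.
example = ">a\nACGT\nACGT\n>b\nAAAAAAAA\n>c\nAC\n"
kept = filter_contigs(read_fasta(example), min_len=4)
print([name for name, _ in kept])
```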