LoRDEC 0.5 - README file

1 Overview

Program for correcting sequencing errors in PacBio reads using highly accurate short reads (e.g. Illumina).

2 Reference

L. Salmela, and E. Rivals. LoRDEC: accurate and efficient long read error correction. Bioinformatics 30(24):3506-3514, 2014.

Access: http://bioinformatics.oxfordjournals.org/content/30/24/3506

3 System Requirements

LoRDEC has been tested on Linux. Compiling the program requires gcc version 4.5 or newer, Boost C++ libraries (e.g. libboost1.48-dev package or newer), and GATB Core library.

4 Installation

  1. Unpack LoRDEC-0.5.tar.gz.
  2. cd LoRDEC-0.5
  3. Download the GATB Core library from http://gatb-core.gforge.inria.fr/. LoRDEC has been tested with GATB Core 1.0.6. Either download the binary version or follow the instructions to build the GATB Core library from the sources.
    • On Linux systems, type: make installdep (this Makefile target installs GATB-core before LoRDEC is compiled)
  4. Modify the GATB variable in Makefile in the LoRDEC-0.5 directory to point to your installation of GATB Core library.
  5. Run make in directory LoRDEC-0.5.

5 Usage

5.1 Error correction

lordec-correct [parameters]

Required parameters:
-2, --shortreads=<short read FASTA/Q files or prebuilt DBG file without .h5 extension>
-i, --longreads=<long read FASTA/Q file>
-k, --kmerlen=<k-mer size>
-o, --correctedreadfile=<output file for corrected long reads>
-s, --solidthreshold=<solidity abundance threshold for k-mers>

Optional parameters:
-t, --trials=<number of target k-mers> Default: 5
-b, --branch=<maximum number of branches to explore> Default: 200
-e, --errorrate=<maximum error rate> Default: 0.4
-T, --threads=<number of threads> Default: use all cores
-S, --statfile=<path statistics file> Default: not generated

The input FASTA/Q files can be compressed. Several Illumina files can be specified as a comma-separated list (e.g. reads1.fa,reads2.fq,reads3.fq.gz).
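When the call is scripted, the comma-separated list can be assembled programmatically. The following is a minimal Python sketch and not part of LoRDEC: the helper name build_correct_cmd and the file names are hypothetical, and only the flags (-2, -i, -k, -s, -o, -T) are taken from the parameter list above.

```python
import shlex


def build_correct_cmd(short_reads, long_reads, k, solid, out, threads=None):
    """Return the argument list for a lordec-correct call (hypothetical helper)."""
    cmd = [
        "lordec-correct",
        # Several short read files are passed as one comma-separated value.
        "-2", ",".join(short_reads),
        "-k", str(k),
        "-s", str(solid),
        "-i", long_reads,
        "-o", out,
    ]
    if threads is not None:
        cmd += ["-T", str(threads)]
    return cmd


cmd = build_correct_cmd(
    ["reads1.fa", "reads2.fq", "reads3.fq.gz"],
    "pacbio.fasta",
    k=19,
    solid=3,
    out="pacbio-corrected.fasta",
)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

Once LoRDEC is installed, the same list could be handed to subprocess.run(cmd, check=True). Building the arguments as a list avoids quoting problems with unusual file names.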

LoRDEC writes the corrected reads to the given file in FASTA format. Regions that remain weak after correction are written in lower case, and solid regions are written in upper case.
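Because the case of each base encodes whether it lies in a solid or a weak region, a quick quality summary can be computed directly from the corrected FASTA file. The sketch below is an illustration only: the function names and the sample records are made up, and it assumes a plain (possibly multi-line) FASTA layout.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def solid_fraction(seq):
    """Fraction of bases written in upper case, i.e. in solid regions."""
    return sum(c.isupper() for c in seq) / len(seq) if seq else 0.0


# Made-up example records, not real LoRDEC output.
sample = """>read1
ACGTACGTac
gtACGT
>read2
acgtacgt
""".splitlines()

fractions = {name: solid_fraction(seq) for name, seq in read_fasta(sample)}
for name, frac in fractions.items():
    print(name, frac)
```

To apply this to a real file, pass an open file handle to read_fasta instead of the sample lines.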

5.2 Trimming corrected reads

To trim the weak regions from the beginning and end of the corrected reads:

lordec-trim -i <corrected reads file> -o <trimmed reads file>

To trim all weak regions and split the reads on inner weak regions:

lordec-trim-split -i <corrected reads file> -o <trimmed reads file>

The read names of the trimmed and split reads consist of two parts separated by an underscore. The first part is the name of the original read and the second part is a running index of the extracted solid regions from that read.
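The effect of the two trimming modes and the naming scheme can be pictured with a short Python sketch. This is not the code of lordec-trim or lordec-trim-split. It assumes that weak bases are the lower-case letters a, c, g, t, and n, and that the running index starts at 1 (the README does not state the starting value); all function names are hypothetical.

```python
import re

WEAK = "acgtn"  # assumption: weak bases appear as lower-case a, c, g, t, or n


def trim_ends(seq):
    """Drop weak bases from the beginning and end, keeping inner weak regions."""
    return seq.strip(WEAK)


def split_solid(name, seq, first_index=1):
    """Return (new_name, solid_region) pairs, one per upper-case run.

    first_index is an assumption; the actual starting index is not documented here.
    """
    return [
        (f"{name}_{i}", m.group(0))
        for i, m in enumerate(re.finditer(r"[A-Z]+", seq), start=first_index)
    ]


def original_name(split_name):
    """Recover the original read name by cutting at the last underscore."""
    return split_name.rsplit("_", 1)[0]


pieces = split_solid("read_7", "acGTacGGtt")
print(trim_ends("acGTacGGtt"))
print(pieces)
print(original_name(pieces[-1][0]))
```

Cutting at the last underscore rather than the first matters because the original read name may itself contain underscores.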

5.3 Statistics

To generate statistics on solid and weak k-mers:

lordec-stat -2 <Short read FASTA/Q file> -k <k-mer size> -s <solid k-mer threshold> -i <PacBio FASTA/Q file> -S <output stat file> [-T <number of threads>]

The format of the output statistics file is as follows. There is one line for each read with the following information:

  1. number of solid k-mers in the read
  2. number of k-mers in the read
  3. number of starting weak k-mers, i.e. length of the weak head (-1 if the read has no solid k-mers at all)
  4. number of weak k-mers in the tail, i.e. length of the weak tail
  5. list of the lengths of the runs of solid k-mers
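The five items above map naturally onto a small record type. The sketch below assumes that the fields on each line are whitespace-separated integers, which the README does not state, so the split may need adjusting for the real file. The names ReadStat, parse_stat_line, and inner_weak are hypothetical, and the sample line is made up.

```python
from typing import List, NamedTuple


class ReadStat(NamedTuple):
    solid: int             # field 1: number of solid k-mers
    total: int             # field 2: number of k-mers in the read
    weak_head: int         # field 3: weak head length (-1 if no solid k-mers)
    weak_tail: int         # field 4: weak tail length
    solid_runs: List[int]  # field 5: lengths of the runs of solid k-mers


def parse_stat_line(line):
    """Parse one line, assuming whitespace-separated integer fields."""
    fields = [int(x) for x in line.split()]
    return ReadStat(fields[0], fields[1], fields[2], fields[3], fields[4:])


def inner_weak(stat):
    """Weak k-mers that lie neither in the head nor in the tail.

    Only meaningful when the read has at least one solid k-mer
    (weak_head != -1).
    """
    return stat.total - stat.solid - stat.weak_head - stat.weak_tail


# Made-up line: 60 solid k-mers out of 100, head 10, tail 5, runs 40 and 20.
stat = parse_stat_line("60 100 10 5 40 20")
print(stat.solid / stat.total, inner_weak(stat))
```

Reads with a weak head of -1 should be handled separately before calling inner_weak.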

5.4 Statistics on paths

LoRDEC can generate statistics on the explored paths while correcting reads. To turn on the path statistics, run lordec-correct with the additional parameter -S, --statfile=<path statistics file>.

Be warned that the path statistics file can be huge when running LoRDEC on large data sets.

The format of the file is as follows. Lines of the form solid[i]=<position> give the position of the source solid k-mer. When LoRDEC runs with a single thread, the lines that follow describe the paths with that k-mer as source. If more threads are used, the lines are interleaved in no particular order. For each path, a line with 5 fields is written:

  1. expected path length as the difference between the source and target k-mer in the read
  2. status of path search:
    0: path was found and the source and target k-mers do not belong to the same run of solid k-mers
    1: path was found and the source and target k-mers belong to the same run of solid k-mers
    2: the expected path length is too long, search skipped
    3: the search branched too much, search stopped
    4: no path was found
  3. length of the found path
  4. edit distance between the path and the weak region in the read
  5. type of path searched for
    END2END: from one k-mer to another
    TAIL: head or tail of read
    GAPEXTEND: extension of gap up to half its length
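Since the file can be very large, it is usually summarised rather than read by eye. The sketch below tallies the status field over all path lines. It assumes that the 5 fields are whitespace-separated, which is not stated above, and the sample lines are invented for illustration (in particular, the values shown in fields 3 and 4 for failed searches are placeholders). The names STATUS and tally_status are hypothetical.

```python
from collections import Counter

# Status codes as described in the list above.
STATUS = {
    0: "found, different solid runs",
    1: "found, same solid run",
    2: "expected path too long, skipped",
    3: "too many branches, stopped",
    4: "no path found",
}


def tally_status(lines):
    """Count path lines per status, skipping the solid[i]=<position> lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("solid["):
            continue  # source k-mer position lines carry no path status
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) != 5:
            continue  # ignore anything that does not look like a path line
        code = int(fields[1])
        counts[STATUS.get(code, f"unknown ({code})")] += 1
    return counts


# Invented sample lines, not real LoRDEC output.
sample = [
    "solid[0]=12",
    "35 0 36 4 END2END",
    "120 4 0 0 END2END",
    "20 3 0 0 TAIL",
]
counts = tally_status(sample)
for label, n in counts.items():
    print(label, n)
```

Because the function only needs one line at a time, it can be fed an open file handle and will not load the whole file into memory. The tally does not depend on line order, so it also works on the interleaved output of a multi-threaded run.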

5.5 Build and save the de Bruijn Graph

To correct long reads or to generate k-mer statistics, LoRDEC builds a de Bruijn graph from the short reads file. The lordec-build-SR-graph program builds the graph and saves it to a file ahead of such analyses, so that the graph file can later be loaded instead of being recomputed from the short read file. This saves time if you reuse the graph several times. The graph is saved in Hierarchical Data Format version 5 (HDF5).

lordec-build-SR-graph [-T <number of threads>] -2 <FASTA file> -k <k-mer size> -s <solid k-mer threshold> -g <out graph file>

This command reads the <FASTA file> of short reads, then builds and saves their de Bruijn graph for k-mers of length <k-mer size> that occur at least <solid k-mer threshold> times.
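The build-once, reuse-later workflow can be sketched as two argument lists. Section 5.1 states that a prebuilt graph is passed to -2 without its .h5 extension, so the sketch strips that suffix. The helper graph_prefix and the file names are hypothetical, and reusing the same -k and -s values in both commands is an assumption made here, not something the README spells out.

```python
def graph_prefix(graph_file):
    """Name to pass to -2 for a prebuilt graph: the file name without .h5."""
    suffix = ".h5"
    if graph_file.endswith(suffix):
        return graph_file[: -len(suffix)]
    return graph_file


graph_file = "illumina-19-3.h5"

# Step 1: build and save the graph once.
build_cmd = [
    "lordec-build-SR-graph",
    "-2", "illumina.fasta",
    "-k", "19",
    "-s", "3",
    "-g", graph_file,
]

# Step 2: reuse the saved graph in place of the short read file.
correct_cmd = [
    "lordec-correct",
    "-2", graph_prefix(graph_file),
    "-k", "19",
    "-s", "3",
    "-i", "pacbio.fasta",
    "-o", "pacbio-corrected.fasta",
]

print(" ".join(build_cmd))
print(" ".join(correct_cmd))
```

Each list could be run with subprocess.run(..., check=True) once LoRDEC is installed.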

6 Examples

Below, we provide simple examples of command lines for running the programs of this package.

6.1 Error correction

lordec-correct -2 illumina.fasta -k 19 -s 3 -i pacbio.fasta -o pacbio-corrected.fasta

  • Error correction with several short read files

    • One PacBio file: pacbio-mini.fa
    • Two Illumina files: ill-test-5K-1.fa and ill-test-5K-2.fa
    • One file named meta-file.txt which contains two lines, one for each of the Illumina file names above

    • command for correcting the PacBio file using the two files of Illumina reads:

    lordec-correct -2 meta-file.txt -k 19 -s 3 -i pacbio-mini.fa -o my-corrected-pacbio-reads.fa &> mylog.log

    • the "&> mylog.log" at the end redirects both standard output and standard error to a log file, which keeps long messages off the screen.

6.2 Trimming

lordec-trim -i pacbio-corrected.fasta -o pacbio-corrected-trim.fasta

lordec-trim-split -i pacbio-corrected.fasta -o pacbio-corrected-trim-split.fasta

6.3 Statistics

lordec-stat -2 illumina.fasta -k 19 -s 3 -i pacbio-corrected.fasta -S pacbio-corrected-stats.txt

6.4 Graph building

lordec-build-SR-graph -2 illumina.fasta -k 19 -s 3 -g illumina-19-3.h5

7 Changes

7.1 Version 0.5

  1. LoRDEC works with the latest version of GATB-core (gatb-core-1.0.6-Linux): it was adapted to the latest GATB interface for building the graph and reading sequence files.
  2. The commands lordec-correct, lordec-stat, and lordec-build-SR-graph accept, as short read input, a file that contains a list of names of files containing reads.
  3. A target named "installdep" was added to the Makefile to install the latest GATB-core version before compiling LoRDEC.

7.2 Version 0.4.1

Fixed a bug which can cause overflow of stack allocated memory.

7.3 Version 0.4

Allowing multiple Illumina files: Multiple short read files can now be given as a comma-separated list.

By default, GATB 1.0.5 is used. If you wish to link against an older GATB version, use the compiler flag -DOLDGATB.

7.4 Version 0.3

Options have changed and they are now parsed with getopt.

Generating path statistics no longer requires recompiling.

Maximum read length increased to 500000.

Clarified usage for prebuilding DBG.

7.5 Version 0.2

The code is compatible with GATB Core 1.0.4.

Date: 2015-03-10 17:07:55 CET

Author: Leena Salmela (leena.salmela@cs.helsinki), Eric Rivals (rivals@lirmm.fr)
