Comprehensive evaluation of error correction methods for high-throughput sequencing data

4.2.1 Illumina Tools
The input read sets were corrected using 17 error correction tools that had either shown good accuracy in previous evaluations or been newly published when the evaluations were run. Of these, 14 are standalone error correction tools: BFC [5], BLESS [6], Blue [7], Coral [8], ECHO [9], HiTEC [11], Fiona, Lighter [12], Musket [13], Quake [14], QuorUM [15], RACER [16], Reptile [17], and Trowel [18]. The remaining three are components of the DNA assemblers ALLPATHS-LG [21], SGA [22], and SOAPdenovo [23].
For each error correction method, successive values were applied to the key parameters of the tool, and a corrected output read set was generated for each parameter value. The output read sets were assessed using SPECTACLE, and the one with the highest gain for substitutions, insertions, and deletions was chosen. The maximum k-mer length for Quake was limited to 18, beyond which the memory capacity of our server was exhausted.
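
The selection step described above can be sketched as follows. This is a minimal illustration, not SPECTACLE's actual interface: it assumes the commonly used definition of gain, (TP - FP) / (TP + FN), with counts pooled over substitutions, insertions, and deletions. The `Counts` class, the `gain` and `best_parameter` functions, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Pooled correction counts for one output read set (hypothetical layout)."""
    tp: int  # erroneous bases that were correctly fixed
    fp: int  # correct bases that were wrongly modified
    fn: int  # erroneous bases that were left uncorrected


def gain(c: Counts) -> float:
    """Assumed gain definition: (TP - FP) / (TP + FN)."""
    total_errors = c.tp + c.fn
    if total_errors == 0:
        return 0.0  # no errors in the input, so nothing can be gained
    return (c.tp - c.fp) / total_errors


def best_parameter(results: Dict[int, Counts]) -> int:
    """Return the parameter value whose output read set has the highest gain."""
    return max(results, key=lambda value: gain(results[value]))


# Made-up counts for three k-mer lengths, for illustration only.
sweep = {
    15: Counts(tp=800, fp=50, fn=200),
    17: Counts(tp=900, fp=30, fn=100),
    19: Counts(tp=850, fp=10, fn=150),
}
best_k = best_parameter(sweep)
print(best_k, round(gain(sweep[best_k]), 2))
```

Gain penalizes both missed errors and newly introduced ones, so a setting that corrects aggressively but damages correct bases can score lower than a more conservative one.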
ALLPATHS-LG, BFC, BLESS, Blue, Musket, Quake, QuorUM, RACER, Reptile, SGA, and SOAPec succeeded in generating outputs for all the input read sets. Coral, HiTEC, Fiona, and Trowel failed to correct errors in large genomes because of insufficient memory. ECHO had not finished after 70 hours on the I4 and I5 read sets. Lighter finished processing all the read sets but made no corrections to the read sets with 10X coverage.

4.2.2 TGS (PacBio and ONT) Tools
The widely used PacBio read error correction tools LoRDEC [29], LSC [30], PBcR [31], and Proovread [32] were evaluated using P1 and P2. No parameter tuning was needed for LSC, PBcR, and Proovread.
For LoRDEC, multiple output sets were generated by applying successive values for the k-mer length and the solid k-mer occurrence threshold, and the result that gave the highest percentage similarity was chosen.
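
The two-parameter search for LoRDEC can be sketched as a small grid search. This is an assumption-laden illustration: the `pick_best_setting` function, the parameter values, and the similarity table are all hypothetical, and in practice each score would come from running the tool and assessing its output.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def pick_best_setting(
    k_values: Iterable[int],
    thresholds: Iterable[int],
    similarity: Callable[[int, int], float],
) -> Tuple[int, int]:
    """Try every (k-mer length, solid k-mer threshold) pair and
    return the pair with the highest percentage similarity."""
    return max(product(k_values, thresholds), key=lambda kt: similarity(*kt))


# Made-up percentage similarities standing in for real assessment results.
toy_similarity = {
    (17, 2): 91.2, (17, 3): 92.0,
    (19, 2): 93.5, (19, 3): 92.8,
    (21, 2): 90.1, (21, 3): 89.7,
}

best = pick_best_setting(
    [17, 19, 21],
    [2, 3],
    lambda k, s: toy_similarity[(k, s)],
)
print(best)
```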

LSC could not be assessed using P2 because it had not finished after 70 hours.
Because ONT is a relatively new technology, error correction methods for ONT reads are only beginning to be explored and studied in detail.

Two of the most recent ONT read error correction tools, NaS [33] and NanoCorr [34], were evaluated using O1 and O2.
