Lucene vs. Zoie

Reposted from: http://www.cnblogs.com/nanpo/archive/2012/10/19/2731713.html


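A while ago I used the performance-test code in Zoie's perf package to compare the real-time search components of Lucene and Zoie. The results surprised me: judging by the data, Lucene is better suited than Zoie to typical real-time search scenarios.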

Zoie's perf package measures four things: search latency, indexing latency, indexing event rate, and indexing event size. Figure 1 shows the results for Zoie, and Figure 2 shows the results for Lucene NRT.



Figure 1: Zoie test results (Zoie Perf Console, 2012-10-09 17-32-50)

Figure 2: Lucene NRT test results (Zoie Perf Console, 2012-10-09 17-34-29)





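The data make it clear that Lucene wins on search response time, while Zoie performs better at indexing. In a reply in the comments under his blog post "Lucene's near-real-time search is fast!", Mike McCandless explained the difference between NRT and Zoie: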

The biggest difference is that Zoie aims for immediate consistency (reopen after every index change & next query), which I think very few apps really require, given how fast NRT is.

Also, NRTCachingDir (caching small segments in RAM) achieves the biggest (in my opinion) benefit of Zoie, but with substantially less added complexity. Reducing complexity is important because it means less risk of bugs; for example, Zoie had some scary corruption bugs, which took quite some time to track down; see https://issues.apache.org/jira/browse/LUCENE-2729

The other part of Zoie I remember is deferring resolving deletions to Lucene docIDs, and instead using a bloom filter to post-filter collected documents. While I understand the motivation for this ("immediate consistency") I think it's the wrong tradeoff since it necessarily slows down all searching (checking a bloom filter is more costly than Lucene's checking a bit set), not to mention the added RAM required for the bloom filter.

Ie, it's better to spend more time during reopen to resolve the deletions, so that searches don't slow down.
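To make the deletion trade-off in that comment concrete, here is a conceptual Java sketch. It is not Lucene or Zoie source code, and every name in it (`DeletionFilterSketch`, `BloomFilter`, `countLiveWithBitSet`, `countLiveWithBloom`) is hypothetical. It contrasts the two approaches the comment describes: deletions already resolved to docIDs and checked with one bit lookup per hit, versus deletions kept as application-level UIDs and post-filtered through a Bloom filter. Because a Bloom filter can report false positives, the sketch assumes a positive result is confirmed against an exact set; Zoie's real data structures may differ.

```java
import java.util.BitSet;
import java.util.Set;

/**
 * Conceptual sketch only (hypothetical names, not Lucene or Zoie code):
 * two ways of hiding deleted documents from a list of search hits.
 */
public class DeletionFilterSketch {

    /** NRT-style: deletions already resolved to docIDs at reopen; one bit test per hit. */
    static int countLiveWithBitSet(int[] hitDocIds, BitSet deletedDocIds) {
        int live = 0;
        for (int docId : hitDocIds) {
            if (!deletedDocIds.get(docId)) {
                live++;
            }
        }
        return live;
    }

    /** Minimal Bloom filter over application-level UIDs, using double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int numHashes;

        BloomFilter(int size, int numHashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.numHashes = numHashes;
        }

        // i-th probe position: (h1 + i * h2) mod size; h2 is forced odd.
        private int index(long uid, int i) {
            int h1 = Long.hashCode(uid);
            int h2 = Long.hashCode(uid * 0x9E3779B97F4A7C15L) | 1;
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(long uid) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(uid, i));
            }
        }

        /** False means "definitely not added"; true may be a false positive. */
        boolean mightContain(long uid) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(uid, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Post-filter style: deletions kept as UIDs; each collected hit is checked
     * against the Bloom filter, and a "maybe" is confirmed with an exact set
     * lookup to rule out false positives (an assumption of this sketch).
     */
    static int countLiveWithBloom(long[] hitUids, BloomFilter bloom, Set<Long> deletedUids) {
        int live = 0;
        for (long uid : hitUids) {
            boolean deleted = bloom.mightContain(uid) && deletedUids.contains(uid);
            if (!deleted) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Hypothetical data: docID i stores UID 1000 + i; docs 2 and 5 are deleted.
        int[] hitDocIds = {0, 1, 2, 3, 4, 5};
        BitSet deletedDocIds = new BitSet();
        deletedDocIds.set(2);
        deletedDocIds.set(5);
        System.out.println(countLiveWithBitSet(hitDocIds, deletedDocIds)); // 4

        long[] hitUids = {1000L, 1001L, 1002L, 1003L, 1004L, 1005L};
        Set<Long> deletedUids = Set.of(1002L, 1005L);
        BloomFilter bloom = new BloomFilter(1024, 3);
        for (long uid : deletedUids) {
            bloom.add(uid);
        }
        System.out.println(countLiveWithBloom(hitUids, bloom, deletedUids)); // 4
    }
}
```

Both paths hide the same documents, but in this sketch the bitset path costs one lookup per hit, while the post-filter path costs `numHashes` probes per hit (plus an exact lookup on every "maybe") and keeps an extra structure in RAM. That is roughly the per-search cost the comment argues is better paid once, at reopen time.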

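In short, Zoie's strong consistency and its deferred resolution of deletions make its search response time longer than Lucene's. Zoie's special design also adds complexity to the code and makes bugs hard to track down, and for users the documentation is scarce and reading the code takes a lot of time and effort. I suspect this is one reason it never became popular. Search scenarios with data updated as frequently as LinkedIn's are rare; in the more common case Lucene NRT is entirely up to the job, so I honestly think CNTV and NetEase could do without Zoie…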

 

