[Repost] Distributing Lucene Indexing and Searching

[url]http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html[/url]

A clever way to do this is to take advantage of Lucene's index file structure. Indexes are directories of files. As the index changes through additions and deletions most files in the index stay the same. So you can efficiently synchronize multiple copies of an index by only copying the files that change.
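
To make the idea concrete, here is a minimal Java sketch of the kind of file-level comparison that a tool such as rsync does for you. It is not part of the original setup: the class name IndexSyncPlan, the rule that "same name and same length means unchanged", and the sample file names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: decides which index files a replica must fetch or drop. */
class IndexSyncPlan {
    final List<String> toCopy = new ArrayList<>();
    final List<String> toDelete = new ArrayList<>();

    /**
     * Both maps go from file name to file length in bytes.
     * Assumption: a file with the same name and length is unchanged.
     */
    static IndexSyncPlan between(Map<String, Long> master, Map<String, Long> replica) {
        IndexSyncPlan plan = new IndexSyncPlan();
        // Files that are new on the master, or whose length differs, must be copied.
        for (Map.Entry<String, Long> e : new TreeMap<>(master).entrySet()) {
            Long replicaLength = replica.get(e.getKey());
            if (!e.getValue().equals(replicaLength)) {
                plan.toCopy.add(e.getKey());
            }
        }
        // Files that no longer exist on the master can be removed from the replica.
        for (String name : new TreeMap<>(replica).keySet()) {
            if (!master.containsKey(name)) {
                plan.toDelete.add(name);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up listings: one large segment is shared, one was merged away.
        Map<String, Long> replica = Map.of("_1.cfs", 5000L, "_2.cfs", 800L, "segments", 40L);
        Map<String, Long> master = Map.of("_1.cfs", 5000L, "_3.cfs", 1200L, "segments", 44L);
        IndexSyncPlan plan = between(master, replica);
        System.out.println("copy:   " + plan.toCopy);
        System.out.println("delete: " + plan.toDelete);
    }
}
```

In this made-up example the large shared segment is never transferred; only the new segment and the small segments file are copied, which is why the approach stays cheap as the index grows.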

The way I did this for Technorati was to:


1. On the index master, periodically checkpoint the index. Every minute or so the IndexWriter is closed and a 'cp -lr index index.DATE' command is executed from Java, where DATE is the current date and time. This efficiently makes a copy of the index while it's in a consistent state by constructing a tree of hard links. If Lucene re-writes any files (e.g., the segments file), a new inode is created and the copy is unchanged.

2. From a crontab on each search slave, periodically poll for new checkpoints. When a new index.DATE is found, use 'cp -lr index index.DATE' to prepare a copy, then use 'rsync -W --delete master:index.DATE index.DATE' to get the incremental index changes. Then atomically install the updated index with a symbolic link (ln -fsn index.DATE index).

3. In Java on the slave, re-open 'index' when its version changes. This is best done in a separate thread that periodically checks the index version. When it changes, the new version is opened and a few typical queries are run against it to pre-load Lucene's caches. Then, in a synchronized block, the Searcher variable used in production is updated.

4. In a crontab on the master, periodically remove the oldest checkpoint indexes.
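
Step 3 can be sketched roughly as follows. This is a generic illustration under stated assumptions, not Technorati's code: SearcherHolder, refreshIfChanged and startPolling are hypothetical names, the type parameter S stands in for whatever Searcher class the application uses, and the version lookup, opener and warm-up queries are passed in as functions so the example does not depend on any particular Lucene release.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical holder that swaps in a warmed searcher when the index version changes. */
class SearcherHolder<S> {
    private final LongSupplier currentVersion; // reads the version of the installed index
    private final LongFunction<S> opener;      // opens a searcher on that version
    private final Consumer<S> warmer;          // runs a few typical queries
    private long openedVersion = -1;           // only touched by the polling thread
    private S live;                            // guarded by the lock on this object

    SearcherHolder(LongSupplier currentVersion, LongFunction<S> opener, Consumer<S> warmer) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        this.warmer = warmer;
    }

    /** Called by request threads to obtain the searcher currently in production. */
    synchronized S get() {
        return live;
    }

    /** One polling step, meant for a single thread. Returns true if a new searcher was installed. */
    boolean refreshIfChanged() {
        long version = currentVersion.getAsLong();
        if (version == openedVersion) {
            return false;
        }
        S fresh = opener.apply(version); // slow work happens outside the lock
        warmer.accept(fresh);            // pre-load caches before going live
        synchronized (this) {
            live = fresh;                // the swap itself is a quick assignment
        }
        openedVersion = version;
        return true;
    }

    /** Starts a daemon thread that checks for a new version at the given interval. */
    Thread startPolling(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    refreshIfChanged();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop polling when interrupted
            }
        }, "index-reloader");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1); // stands in for the on-disk index version
        SearcherHolder<String> holder = new SearcherHolder<>(
                version::get,
                v -> "searcher-for-version-" + v,
                s -> System.out.println("warming " + s));
        holder.refreshIfChanged();
        System.out.println(holder.get());
        version.set(2); // pretend a new checkpoint was installed
        holder.refreshIfChanged();
        System.out.println(holder.get());
    }
}
```

Opening and warming happen before the lock is taken, so searches keep using the old searcher until the new one is ready. Closing the old searcher once in-flight queries finish is deliberately left out of this sketch.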

Technorati's Lucene index is updated this way every minute. A mergeFactor of 2 is used on the master in order to minimize the number of segments in production. The master has a hot spare.
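
The hard-link checkpoint from step 1 is what makes a one-minute cycle affordable. The following is a minimal sketch of that behaviour in plain Java rather than via 'cp -lr'. The class name HardLinkCheckpoint, the flat (non-nested) index directory, the sample DATE suffix and the temp-file rename used to imitate a re-write are assumptions for illustration only, and the code needs a filesystem that supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical demo: a checkpoint made of hard links survives a later re-write. */
class HardLinkCheckpoint {

    /** Hard-links every regular file of a flat index directory into a new checkpoint directory. */
    static void checkpoint(Path index, Path checkpoint) throws IOException {
        Files.createDirectory(checkpoint);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(index)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // No data is copied; both names now refer to the same inode.
                    Files.createLink(checkpoint.resolve(file.getFileName()), file);
                }
            }
        }
    }

    /** Imitates a re-write: a new file (new inode) takes over the old name. */
    static void rewrite(Path file, String content) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".new");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    private static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Returns the checkpoint's content and the live content, separated by a space. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("lucene-checkpoint-demo");
            Path index = Files.createDirectory(root.resolve("index"));
            Files.write(index.resolve("segments"), "v1".getBytes(StandardCharsets.UTF_8));

            Path snapshot = root.resolve("index.20050101-120000"); // sample DATE suffix
            checkpoint(index, snapshot);
            rewrite(index.resolve("segments"), "v2");

            return read(snapshot.resolve("segments")) + " " + read(index.resolve("segments"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On a filesystem with hard-link support, demo() returns "v1 v2": the checkpoint still sees the old segments file while the live index sees the new one, and no file data was duplicated to get there.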