Solr scalability improvements

http://yonik.wordpress.com/

With the number of CPU cores constantly increasing, major work has been done in Lucene/Solr to improve scalability under multi-threaded load.
Read-only IndexReaders

One bottleneck was synchronization around the check for deleted docs in a Lucene IndexReader. Since another thread could delete a document at any time, the IndexReader.isDeleted() call was synchronized. It’s a very quick call, simply checking whether a bit is set in a BitVector, but it can be called millions of times in the process of satisfying a single query. The Read-only IndexReader feature removes this synchronization by prohibiting deletion through the reader.
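
To make the difference concrete, here is a minimal sketch of the two styles of deleted-doc check. The class names are hypothetical stand-ins and not Lucene's own classes, and java.util.BitSet is assumed in place of Lucene's BitVector.

```java
import java.util.BitSet;

/** Hypothetical stand-ins, not Lucene classes. */
class DeletedDocsSketch {

    /** Writable view: deletions may arrive from any thread, so every check takes the lock. */
    static class WritableDeletedDocs {
        private final BitSet deleted = new BitSet();

        synchronized void delete(int docId) {
            deleted.set(docId);
        }

        synchronized boolean isDeleted(int docId) {
            return deleted.get(docId);
        }
    }

    /** Read-only view: the bits are copied once and never modified, so no lock is needed. */
    static class ReadOnlyDeletedDocs {
        private final BitSet deleted;

        ReadOnlyDeletedDocs(BitSet snapshot) {
            // Private copy: later changes to the caller's BitSet are not visible here.
            this.deleted = (BitSet) snapshot.clone();
        }

        boolean isDeleted(int docId) {
            return deleted.get(docId);
        }

        void delete(int docId) {
            throw new UnsupportedOperationException("read-only reader cannot delete documents");
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        ReadOnlyDeletedDocs reader = new ReadOnlyDeletedDocs(bits);
        System.out.println(reader.isDeleted(3)); // prints true
        System.out.println(reader.isDeleted(4)); // prints false
    }
}
```

The read-only variant gives up the ability to delete in exchange for a check that many threads can run at once without contending for a monitor.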
Use of NIO to read index files

The standard method for Lucene to read index files is via Java’s RandomAccessFile. Reading a part of the file involves two calls, a seek() to position the file pointer followed by a read() to get the data. For multiple threads to share the same RandomAccessFile instance, this obviously involves synchronization to avoid one thread changing the file pointer before another thread gets to read at the file position it set. If the data to be read isn’t in the operating system cache, it’s even worse news… the synchronization causes all other reads to block while the data is retrieved from disk, even if some of those reads could have been quickly satisfied.
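
The seek-then-read pattern can be sketched roughly as follows. SharedFileReader and readAt are hypothetical names chosen for illustration; this is not Lucene's actual directory code, only the shape of the locking it implies.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical helper: one RandomAccessFile shared by many threads. */
class SharedFileReader implements AutoCloseable {
    private final RandomAccessFile file;

    SharedFileReader(File path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    byte[] readAt(long position, int length) throws IOException {
        byte[] buffer = new byte[length];
        // seek() and read must happen as one unit, otherwise another thread
        // could move the shared file pointer between the two calls.
        synchronized (file) {
            file.seek(position);
            file.readFully(buffer); // may block on disk while holding the lock
        }
        return buffer;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("seek-read", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (SharedFileReader reader = new SharedFileReader(tmp)) {
            System.out.println(new String(reader.readAt(4, 3), StandardCharsets.US_ASCII)); // prints 456
        }
    }
}
```

Every caller of readAt() queues behind the same lock, including callers whose data is already in the operating system cache.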

The preferred solution would be to have a method on RandomAccessFile that accepted an offset to read from. This could easily be implemented by the JVM via a pread() system call. But since Sun has not provided this functionality, we need to use something else. NIO’s FileChannel does have the type of method we are looking for: FileChannel.read(ByteBuffer dst, long position)
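
As a rough illustration of the positional read, the sketch below reads a byte range through FileChannel.read(ByteBuffer dst, long position) without touching any shared file pointer. The PositionalRead class and its readAt helper are hypothetical names, and the loop is only one reasonable way to handle short reads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper around FileChannel's positional read. */
class PositionalRead {

    static byte[] readAt(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(length);
        long offset = position;
        // A single call may return fewer bytes than requested, so loop until full.
        while (dst.hasRemaining()) {
            int n = channel.read(dst, offset); // does not change the channel's own position
            if (n < 0) {
                throw new EOFException("end of file before " + length + " bytes were read");
            }
            offset += n;
        }
        return dst.array();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                byte[] bytes = readAt(channel, 4, 3);
                System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints 456
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because each call carries its own offset, the caller needs no lock around the channel for reads like this.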

Solr now reads index files with the non-synchronizing NIO method (via Lucene’s NIOFSDirectory) by default on non-Windows platforms. Windows systems default to the older method, which turns out to be faster there: a long-standing “bug” in Java still synchronizes internally even when using FileChannel.read().
Non-blocking caches

Solr’s standard LRU cache implementation uses a synchronized LinkedHashMap. A single cache could be checked hundreds or thousands of times during the course of a single request that involves faceting. A non-blocking ConcurrentLRUCache was developed as an alternative implementation, and is now the default for Solr’s filter cache. One user indicated that this has doubled their query throughput under ideal circumstances.
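
For readers unfamiliar with the synchronized approach, here is a minimal sketch of an LRU cache built on an access-ordered LinkedHashMap. It is an assumption-laden illustration using only the JDK, not Solr's actual cache class, and newLruCache is a hypothetical helper name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a lock-per-operation LRU cache, not Solr's implementation. */
class SynchronizedLruSketch {

    static <K, V> Map<K, V> newLruCache(final int maxSize) {
        // accessOrder=true moves an entry to the end of the list on every get(),
        // so even lookups modify the map and must hold the lock.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newLruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Since every get() and put() passes through a single lock, a request that consults the cache thousands of times competes with every other request for that lock, which is the contention a non-blocking design aims to avoid.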
Where to find this scalability goodness?

Solr 1.3 has read-only IndexReaders, but for the other scalability improvements, including the improved faceting, you’ll have to grab a nightly Solr build.