When to commit

The question I asked myself recently seems to be one of those that should have a quick and painless answer: when should you send the commit command to Solr (or Lucene)? Despite the simplicity of the question, the answer is not clear-cut, at least in my opinion.

To answer it, you have to look at the different ways data gets indexed and at how quickly you want that data to be available on the slave servers. Looking at the typical implementations I have had the pleasure of working with, we can distinguish the following categories:

Data can be made available only after a total index update

This is the simplest situation, both in theory and in practice: we send the commit command only when we run out of documents to index.

The data may be available in batches, without waiting for a full update of the index

Here we have three possibilities:

  1. If it does not matter whether the data becomes available in batches or not, we can send the commit command after sending the last document.
  2. If we want to publish the data in batches, our application can send a commit command from time to time.
  3. If we do not want to send commit commands from the indexing application, we can tell Solr to do it for us by setting up the autocommit mechanism.
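
The second option can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive client: it relies on Solr's XML update messages (`<add>` and `<commit/>`), and the names `add_message`, `index_in_batches`, `post` and `batch_size` are hypothetical. `post` stands in for whatever sends a message body to the update handler, so the sketch runs without a live Solr.

```python
from xml.sax.saxutils import escape, quoteattr


def add_message(doc):
    """Build an XML <add> message for one document (a dict of field -> value)."""
    fields = "".join(
        "<field name={}>{}</field>".format(quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>{}</doc></add>".format(fields)


def index_in_batches(docs, post, batch_size=1000):
    """Send every document, committing after each full batch and once at the end.

    `post` is a hypothetical callable that delivers one message to Solr.
    """
    pending = 0
    for doc in docs:
        post(add_message(doc))
        pending += 1
        if pending == batch_size:
            post("<commit/>")
            pending = 0
    if pending:
        # Commit the last, incomplete batch so no document is left uncommitted.
        post("<commit/>")


# Demo with a fake sender that only records the messages.
sent = []
index_in_batches([{"id": i} for i in range(5)], sent.append, batch_size=2)
```

With a `batch_size` larger than the number of documents, the same function degenerates to the first option: a single commit after the last document.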

Data must be indexed as fast as possible

If your data should be indexed as fast as possible, the commit command should be sent only after all the data has been sent. Commit is quite expensive in terms of performance and, in this case, should therefore be used only at the end of the indexing process.

Data must be published as soon as possible

This is probably the most difficult of the described cases. It all depends on how quickly we want the data to be available on the slave servers. For example, in a CMS, when a user saves an edited page, we want the updated content to be available right away: in that case we need a commit after every document and fast replication. When you add items to an online store, on the other hand, you can afford some delay in commit and replication. Examples like these could be multiplied indefinitely. Either way, remember to set up your warming queries properly so that Solr is prepared for the usual query load.
Those interested in very frequent index updates should watch what is happening in Lucene and Solr around NRT (near real time) search.
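
One way to express "some delay is acceptable" without sending explicit commits is the `commitWithin` attribute of the XML `<add>` message, which asks Solr to commit the added documents within the given number of milliseconds. This goes slightly beyond what is described above, so treat it as an assumption about your setup. The sketch below only builds the messages; the helper name `add_with_deadline` and the two deadlines are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def add_with_deadline(doc, commit_within_ms):
    """Build an <add> message that asks Solr to commit within the given time."""
    fields = "".join(
        "<field name={}>{}</field>".format(quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return '<add commitWithin="{}"><doc>{}</doc></add>'.format(
        int(commit_within_ms), fields
    )


# A CMS page should show up almost immediately; a shop item can wait a minute.
# Both deadlines are illustrative values, not recommendations.
cms_page = add_with_deadline({"id": "page-1", "title": "Home & News"}, 1000)
shop_item = add_with_deadline({"id": "sku-42"}, 60000)
```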

Optimization

It is also worth remembering to optimize the index. If we send the commit command only once, at the end of indexing, it is worth considering whether to send optimize instead of commit. Our slaves will then get an optimized version of the index along with the newest data. Note, however, that optimizing the index takes longer than a commit.
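
The choice between the two final commands can be sketched like this. It assumes the XML update messages `<commit/>` and `<optimize/>`; the names `index_once`, `post` and `optimize_at_end` are hypothetical, and `post` again stands in for the code that actually delivers a message to Solr.

```python
def index_once(add_messages, post, optimize_at_end=False):
    """Send all <add> messages, then finish with one commit or one optimize."""
    for message in add_messages:
        post(message)
    # Optimize takes longer than commit, so it is opt-in here.
    post("<optimize/>" if optimize_at_end else "<commit/>")


# Demo with a fake sender that only records the messages.
adds = ['<add><doc><field name="id">1</field></doc></add>']
committed, optimized = [], []
index_once(adds, committed.append)
index_once(adds, optimized.append, optimize_at_end=True)
```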

Dangers

It is also worth remembering that postponing commits indefinitely risks losing data that has not yet been physically written to the index files. Of course, nothing happens to the data if Solr is shut down properly, but in case of a machine failure we can lose the data that we have indexed since the last commit.

To sum up

As you can see, there is no single answer to the question of when to send the commit command, because it depends on the situation and individual needs. Note, however, that the operations Lucene/Solr performs after receiving a commit command are costly in terms of system resources. Do not send this command too frequently; otherwise, instead of indexing data, Lucene/Solr may spend most of its time processing commits.

 

Reposted from: http://solr.pl/en/2011/06/27/when-to-commit/
