Solr: Sharding and replication

Solr allows you to create multiple search indexes, each of which is represented by a Solr core. It is possible to partition your content across multiple Solr indexes (called sharding), as well as to create multiple copies of any partition of the data (called replication).

 

Choosing to shard

Sharding can be useful if you have too many documents to comfortably handle on a single server.
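To make the idea of partitioning concrete, here is a minimal sketch of how documents could be assigned to shards by hashing their IDs. The function names and the hash-modulo scheme are assumptions chosen for illustration; they are not necessarily how Solr itself routes documents.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash.

    Assumption: a simple hash-modulo scheme, used here only to illustrate
    partitioning. hashlib is used instead of hash() so the result is the
    same across Python processes.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(docs, num_shards):
    """Split a list of documents (dicts with an "id" key) into shards."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards


if __name__ == "__main__":
    docs = [{"id": f"doc-{i}"} for i in range(10)]
    for index, shard in enumerate(partition(docs, 3)):
        print(index, [d["id"] for d in shard])
```

Each document lands in exactly one shard, and the same ID always maps to the same shard as long as the number of shards does not change.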

The number of shards has nothing to do with fault tolerance. Sharding exists strictly to help you scale as your collection of documents grows.

In general, there are five primary factors you need to consider when deciding on how many shards you need:

  1. Total number of documents
  2. Document size
  3. Required indexing throughput
  4. Query complexity
  5. Expected growth

 

Choosing to replicate

If your Solr cluster can handle 100 queries per second but your application needs to support 150 queries per second, you have a problem. Rather than breaking your index into additional partitions (adding shards), you would want to create multiple identical copies of your index and load balance traffic across each of the copies.
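The arithmetic and the load-balancing idea in this paragraph can be sketched as follows. This is a rough illustration that assumes every copy of the index handles the same query rate and that throughput scales linearly with the number of copies; the class and function names are hypothetical.

```python
import itertools
import math


def replicas_needed(required_qps: float, qps_per_copy: float) -> int:
    """Number of identical index copies needed to serve the required load.

    Assumption: throughput scales linearly with the number of copies.
    """
    return math.ceil(required_qps / qps_per_copy)


class RoundRobinBalancer:
    """Hand out replica URLs in turn so queries are spread evenly."""

    def __init__(self, urls):
        self._cycle = itertools.cycle(list(urls))

    def next_url(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    # 150 queries per second needed, 100 per copy, so 2 copies.
    print(replicas_needed(150, 100))

    # Hypothetical replica URLs.
    balancer = RoundRobinBalancer([
        "http://slaveserver1:8983/solr/core1",
        "http://slaveserver2:8983/solr/core1",
    ])
    for _ in range(4):
        print(balancer.next_url())
```

In practice a dedicated load balancer usually plays this role; the class above only shows the rotation logic.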



Master server’s solrconfig.xml (http://masterserver:8983/solr/core1)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

 

 

Slave server’s solrconfig.xml (http://slaveserver:8983/solr/core1)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="enable">true</str>
    <str name="masterUrl">http://masterserver:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:15</str>
  </lst>
</requestHandler>
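With this configuration, the slave polls the master at the interval given by pollInterval, which is written as HH:mm:ss. The sketch below converts that value to seconds and builds the URL for the replication handler's indexversion command, which reports the master's current index version. The helper names are hypothetical, and the code only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def poll_interval_seconds(poll_interval: str) -> int:
    """Convert a pollInterval value in HH:mm:ss form to seconds."""
    hours, minutes, seconds = (int(part) for part in poll_interval.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def index_version_url(master_url: str) -> str:
    """Build the URL that asks a replication handler for its index version.

    master_url is the same value as masterUrl in the slave's solrconfig.xml.
    """
    return f"{master_url}?{urlencode({'command': 'indexversion', 'wt': 'json'})}"


if __name__ == "__main__":
    print(poll_interval_seconds("00:00:15"))
    print(index_version_url("http://masterserver:8983/solr/core1/replication"))
```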

 

Combining sharding and replication

At this point you know how to scale Solr to handle either more content (by sharding) or more query load (by replicating). If you are lucky enough to have both a large dataset and a large number of users trying to query your data, however, you may need to set up a cluster that uses both sharding and replication. If you often have a large amount of indexing going on, you may also want to move your indexing and query operations onto separate servers.
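As a rough sketch of what a client has to do in such a manually managed cluster, the code below picks one replica for each shard and builds a distributed query using Solr's shards request parameter, which takes a comma-separated list of host:port/path entries. The cluster layout, host names, and helper function are assumptions made for illustration.

```python
import random
from urllib.parse import urlencode

# Hypothetical layout: each shard (core1, core2) has two identical replicas.
CLUSTER = {
    "shard1": ["slave1:8983/solr/core1", "slave2:8983/solr/core1"],
    "shard2": ["slave3:8983/solr/core2", "slave4:8983/solr/core2"],
}


def build_distributed_query(cluster, query, chooser=random.choice):
    """Pick one replica per shard and build a distributed search URL.

    The first chosen replica also acts as the node that receives the
    request and aggregates results from the other shards.
    """
    chosen = [chooser(replicas) for replicas in cluster.values()]
    params = urlencode({"q": query, "shards": ",".join(chosen)})
    return f"http://{chosen[0]}/select?{params}"


if __name__ == "__main__":
    print(build_distributed_query(CLUSTER, "solr"))
```

Every client, or a layer in front of Solr, has to keep a mapping like CLUSTER up to date by hand, which is part of the maintenance burden of this kind of setup.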

As you can tell from figure 12.5, setting up a Solr cluster to handle both sharding and replication can quickly become a maintenance nightmare. Balancing query load across multiple manually defined Solr cores, while ensuring that replication is configured and enabled between each Solr core on the slave servers and its associated core on the master server, becomes complex quickly. A failure in one node can also cause multiple nodes in the cluster to fail. If the single master server in figure 12.5 fails, for example, the entire cluster will stop receiving updates. Likewise, if one slave fails, any other slave running a distributed search that depends on the failed slave will also fail its queries.


Thankfully, SolrCloud was created to take over management of these kinds of complexities for you.
