Solr in Action Study Notes, Chapter 13: SolrCloud

weixin_30556161

Original link: http://www.cnblogs.com/cjrzh/p/4681497.html

13.1 Getting started with SolrCloud

13.1.1 Starting Solr in cloud mode

Set up a cluster on a single machine, with each port simulating one Solr node.

cd $SOLR_INSTALL/
cp -r example/ shard1/

13.1.2 Motivation behind the SolrCloud architecture

■ Scalability
■ High availability
■ Consistency
■ Simplicity
■ Elasticity

----------------------------------------

■ Scalability

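* Replication improves fault tolerance and lets queries run in parallel.

The goal is to be linearly scalable, but every added resource brings extra management overhead, so in practice we can only approach that goal.

A single Solr index can hold at most about 2.1 billion documents (Lucene document IDs are 32-bit ints); the fix is to split the index into shards.

Large documents and many fields need more memory and faster disk I/O. Fix: add RAM and faster disks.

Indexing throughput: when thousands of documents must be indexed per second, the fix is distributed indexing.

Query volume: use replication to serve queries in parallel.

Query complexity (facets, sorting, etc.): use sharding and replication.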
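A rough sketch of the sharding arithmetic behind the per-index document limit. The function name `min_shards` and the `docs_per_shard` parameter are my own illustration, not from the book; in practice a much lower per-shard target than the hard limit is usually chosen for performance reasons.

```python
# Lucene addresses documents with a signed 32-bit int, so a single index
# (one shard) tops out at roughly 2**31 - 1 documents.
MAX_DOCS_PER_INDEX = 2**31 - 1


def min_shards(total_docs, docs_per_shard=MAX_DOCS_PER_INDEX):
    """Smallest number of shards that keeps every shard at or under docs_per_shard."""
    if total_docs <= 0:
        return 1
    # Integer ceiling division avoids float rounding on large counts.
    return -(-total_docs // docs_per_shard)


# 5 billion documents cannot fit in two indexes of ~2.1 billion each.
print(min_shards(5_000_000_000))
# With a hypothetical target of 250 million documents per shard:
print(min_shards(5_000_000_000, docs_per_shard=250_000_000))
```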

-----------------------------------------------------------------

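■ High availability

Look at the problem from a business angle: How much you can spend

Failover: when a node fails, a healthy node takes over its requests.

Data redundancy: when a node fails, its data does not have to be copied to a healthy machine first.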

1 Unexpected outages that affect a subset of the nodes in your cluster due to issues such as hardware faults and loss of network connectivity
2 Planned outages due to upgrades and system maintenance tasks
3 Degraded service due to heavy system load
4 Disasters that take your entire cluster/data center offline

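Solr provides high availability within a single data center; multiple data centers are not yet supported.

Two service architectures: 1. every node handles both indexing and queries; 2. master nodes handle indexing and slave nodes handle queries.

Minimize downtime during upgrades: rolling restart

Another kind of outage is overload: queries return too slowly, which users will not tolerate!

Fix: a reliable management system, plus the ability to add nodes quickly.

Advanced topic: hardware-level optimizations such as RAID.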

--------------------------------------------

■ Consistency

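By the CAP theorem, a distributed system cannot keep both availability and consistency once the network partitions; SolrCloud chooses consistency.

An update must succeed on all replicas, otherwise the whole operation fails. Solr does not allow queries against different replicas to return different versions of a document.

Solr currently has zero tolerance for inconsistency.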

-----------------------------------------------

■ Simplicity

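* Once the cluster is up, operating it is no more complex than operating a single machine.

* Recovering a failed node is simple: it syncs automatically.

ZooKeeper can be treated as a black box: after the initial setup it needs little attention.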

■ Elasticity

The ability to scale the system: split a shard into smaller shards, and add replicas.

---------------------------------------------------------------

13.2 Core concepts

13.2.1 Collections vs. cores

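A collection serves a whole index under one schema and can be made up of multiple cores; each core hosts one replica of a shard.

Shards are disjoint slices of the index, and a replica is a copy of a shard. A shard can have multiple replicas, one of which is the leader.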

13.2.2 ZooKeeper

■ Centralized configuration storage and distribution
■ Detection and notification when the cluster state changes
■ Shard-leader election

Mature, stable, and widely used.

ZOOKEEPER DATA MODEL

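Data is organized in a hierarchy similar to a file system. Each node in the hierarchy is called a znode and carries basic metadata; a znode can store at most 1 MB of data. ZooKeeper is not meant to be a data storage system and holds only small pieces of metadata.

A central concept is the ephemeral znode, which is kept active by a client connection. If the client loses its connection, the ephemeral znode is deleted automatically.

When a Solr node joins the cluster, a znode is created for it in ZooKeeper; if that node loses contact, ZooKeeper also notifies the other nodes.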

ZNODE WATCHER

Any client application can register as a watcher on a znode; when the znode changes, ZooKeeper notifies the watcher.
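To make ephemeral znodes and watchers concrete, here is a toy in-memory model. It is only a sketch for illustration: `ToyZooKeeper` and its methods are invented names and not the real ZooKeeper client API, the znode path and session id are made-up examples, and unlike real ZooKeeper watches (which fire once and must be re-registered) the callbacks here stay registered.

```python
class ToyZooKeeper:
    """In-memory stand-in for illustration only; not the real ZooKeeper API."""

    def __init__(self):
        self.znodes = {}    # path -> (data, owning session id or None)
        self.watchers = {}  # path -> list of callbacks

    def create(self, path, data=b"", ephemeral_owner=None):
        # A znode with an owner is "ephemeral": it lives only as long as
        # the owning client session does.
        self.znodes[path] = (data, ephemeral_owner)
        self._notify(path, "created")

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def close_session(self, session_id):
        # When a client session ends, every ephemeral znode it owns is removed
        # and the watchers of those znodes are notified.
        owned = [p for p, (_, owner) in self.znodes.items() if owner == session_id]
        for path in owned:
            del self.znodes[path]
            self._notify(path, "deleted")

    def _notify(self, path, event):
        for callback in self.watchers.get(path, []):
            callback(path, event)


zk = ToyZooKeeper()
events = []
node_path = "/live_nodes/solr1:8983_solr"  # example path, chosen for illustration
zk.watch(node_path, lambda path, event: events.append((path, event)))
zk.create(node_path, ephemeral_owner="session-1")  # Solr node joins the cluster
zk.close_session("session-1")                      # Solr node loses its connection
print(events)
```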

PRODUCTION CONFIGURATION

For production, set up a standalone ZooKeeper ensemble made up of 3 nodes.

The zkHost parameter passes the ZooKeeper server addresses and ports to Solr.
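A small sketch of what a zkHost value looks like and why an ensemble of 3 is the usual starting point. The host names `zk1` to `zk3` and the `/solr` chroot are hypothetical, and the helper functions are mine; the majority rule is standard ZooKeeper behavior: the ensemble stays available only while more than half of its servers are up.

```python
def zk_host(servers, chroot=""):
    """Build a zkHost connection string: comma-separated host:port pairs plus an optional chroot."""
    return ",".join(f"{host}:{port}" for host, port in servers) + chroot


def quorum_size(ensemble_size):
    """Number of ZooKeeper servers that must be up for the ensemble to work (a majority)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    return ensemble_size - quorum_size(ensemble_size)


# Hypothetical three-node ensemble.
servers = [("zk1", 2181), ("zk2", 2181), ("zk3", 2181)]
print(zk_host(servers, chroot="/solr"))
print(quorum_size(3), tolerated_failures(3))
```

Note that 4 servers tolerate no more failures than 3 do, which is why odd ensemble sizes are preferred.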

ZOOKEEPER CLIENT TIMEOUT

The timeout ZooKeeper uses when monitoring the state of a Solr node; the default is 15 seconds.

CENTRALIZED CONFIGURATION STORAGE AND DISTRIBUTION

Both solrconfig.xml and schema.xml are uploaded to ZooKeeper!

13.2.3 Choosing the number of shards and replicas

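Determined by factors such as the number of documents, document size, indexing and query throughput, query complexity, and index growth. Chapter 12 (taking Solr to production) covers this.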

13.2.4 Cluster-state management

Node states such as active, inactive, and so on.

13.2.5 Shard-leader election

The shard leader accepts update requests and distributes them to the replicas to keep them in sync. Specifically,

■ Accepts update requests for the shard
■ Increments the value of the _version_ field on the updated document and enforces optimistic locking
■ Writes the document to its update log
■ Sends the update (in parallel) to all replicas and blocks until a response is received
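The four leader responsibilities above can be sketched as a toy model. This is an assumption-laden illustration, not Solr's code: `ToyShardLeader`, `ToyReplica`, and `VersionConflict` are invented names, the version is a simple counter, the replicas are called one after another instead of in parallel, and the optimistic-locking rule is simplified to "the version the client expects must equal the stored one".

```python
class VersionConflict(Exception):
    """Raised when the client's expected _version_ does not match the stored one."""


class ToyReplica:
    def __init__(self):
        self.docs = {}

    def apply(self, doc):
        self.docs[doc["id"]] = dict(doc)
        return True  # acknowledge the update


class ToyShardLeader:
    def __init__(self, replicas):
        self.replicas = replicas
        self.docs = {}
        self.update_log = []
        self.clock = 0  # stand-in version source; always increasing

    def update(self, doc, expected_version=None):
        # 1. Accept the update request for the shard.
        current = self.docs.get(doc["id"])
        # 2. Enforce optimistic locking, then assign a new _version_.
        if expected_version is not None:
            current_version = current["_version_"] if current else None
            if current_version != expected_version:
                raise VersionConflict(
                    f"expected {expected_version}, found {current_version}")
        self.clock += 1
        stored = dict(doc, _version_=self.clock)
        # 3. Write the document to the update log.
        self.update_log.append(stored)
        self.docs[doc["id"]] = stored
        # 4. Send to all replicas and wait for every response
        #    (sequential here; a real leader sends in parallel).
        acks = [replica.apply(stored) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("update failed on at least one replica")
        return stored["_version_"]


replicas = [ToyReplica(), ToyReplica()]
leader = ToyShardLeader(replicas)
v1 = leader.update({"id": "1", "title": "first draft"})
v2 = leader.update({"id": "1", "title": "second draft"}, expected_version=v1)
try:
    leader.update({"id": "1", "title": "stale write"}, expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
print(v1, v2, conflict)
```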

The shard leader has no extra responsibilities at query time.

13.2.6 Important SolrCloud configuration settings

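solr.xml contains a <solrcloud> element.

host: the address (and port) that the node registers with ZooKeeper. In production it is better to use a host name, which is more readable and easier to change (just update DNS).

See pp. 425-426 of the book for details.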

***********************************************************

13.3 Distributed indexing

From the client's point of view, indexing does not change. On the server side, indexing changes substantially.

13.3.1 Document shard assignment

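Document router: decides which shard a document is assigned to.

Solr provides two strategies: compositeId (default) and implicit (not discussed here; routing has to be programmed on the client side, i.e. custom routing).

Each shard is assigned a range of the 32-bit hash space; the space is divided evenly among the shards.

The router computes a hash from the unique document ID and assigns the document to the shard whose range contains that hash.

The computation needs to be fast and to spread documents evenly across shards.

The MurmurHash algorithm is used.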
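A minimal sketch of range-based routing as described above. Two things are assumptions on my part: the hash space is modeled as the signed 32-bit range from -2**31 to 2**31 - 1, and `zlib.crc32` is swapped in as a stand-in hash because MurmurHash is not in the Python standard library, so the shard a given ID lands on here will not match what Solr computes. The function names are hypothetical.

```python
import zlib

MIN_HASH = -2**31
MAX_HASH = 2**31 - 1


def partition_hash_range(num_shards):
    """Split the 32-bit hash space into num_shards contiguous, near-equal ranges."""
    step = 2**32 // num_shards
    ranges = []
    start = MIN_HASH
    for i in range(num_shards):
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def stand_in_hash(doc_id):
    """Stand-in for MurmurHash: CRC32 of the ID, mapped to a signed 32-bit value."""
    unsigned = zlib.crc32(doc_id.encode("utf-8"))
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


def route(doc_id, ranges):
    """Return the index of the shard whose hash range contains the document's hash."""
    h = stand_in_hash(doc_id)
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("hash outside every range")


ranges = partition_hash_range(2)
print(ranges)
print(route("doc-42", ranges))
```

Because the route depends only on the document ID and the ranges, every node computes the same answer without coordinating with the others.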

13.3.2 Adding documents

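SolrJ provides a new SolrServer implementation, CloudSolrServer, which makes indexing more robust.

CloudSolrServer reads the cluster state from ZooKeeper, so it knows the shard leaders. Since an update request must be routed to the leader first, CloudSolrServer can send it to the leader directly and save time.

For the detailed steps, skim pp. 430-431.

For a batch of documents, CloudSolrServer automatically groups them so that each one is indexed on the correct shard at high throughput.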
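The batching idea can be sketched as grouping documents by target shard before sending, so that each shard leader receives one request. This is not SolrJ code: `group_by_shard` is a hypothetical helper, and `parity_router` is a deliberately trivial stand-in for the real document router.

```python
from collections import defaultdict


def group_by_shard(docs, shard_of):
    """Group a batch of documents by the shard that shard_of(doc_id) returns."""
    batches = defaultdict(list)
    for doc in docs:
        batches[shard_of(doc["id"])].append(doc)
    return dict(batches)


def parity_router(doc_id):
    # Stand-in routing rule for the demo only: odd numeric IDs go to shard1,
    # even ones to shard2. A real router would hash the ID into a range.
    return "shard1" if int(doc_id) % 2 else "shard2"


batch = [{"id": str(n), "text": f"document {n}"} for n in range(1, 6)]
batches = group_by_shard(batch, parity_router)
for shard, docs in sorted(batches.items()):
    # One request per shard leader instead of one per document.
    print(shard, [d["id"] for d in docs])
```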

13.3.3 NRT

In practice this is soft commit; skipped.

13.3.4 Node recovery

■ Peer sync: If the outage was short-lived and the recovering node missed only a few updates, it will recover by pulling updates from the shard leader’s update log. The upper limit on missed updates is currently hardcoded to 100. If the number of missed updates exceeds this limit, the recovering node pulls a full index snapshot from the shard leader.

■ Snapshot replication: If a node is offline for an extended period of time such that it becomes too far out of sync with the shard leader, it uses Solr’s HTTP-based replication, based on the snapshot of the index.
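The choice between the two recovery modes comes down to one threshold check. A minimal sketch, using the limit of 100 quoted above; the function and constant names are mine, not Solr's.

```python
# Limit on missed updates quoted in the notes above (hardcoded in Solr at the time).
MAX_MISSED_UPDATES = 100


def recovery_strategy(missed_updates):
    """Pick how a recovering node catches up with its shard leader."""
    if missed_updates <= MAX_MISSED_UPDATES:
        # Few updates missed: replay them from the leader's update log.
        return "peer_sync"
    # Too far behind: pull a full index snapshot from the leader.
    return "snapshot_replication"


for missed in (0, 100, 101, 50_000):
    print(missed, recovery_strategy(missed))
```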

-----------------------------------------------------------------------------

13.4 Distributed search

Reposted from: https://www.cnblogs.com/cjrzh/p/4681497.html
