Introduction to SolrCloud

weixin_34248258

Original link: https://my.oschina.net/tigerlene/blog/1491977

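Preface

In the previous post, we built a web log collection system with Flume + Solr + log4j that collects logs and stores them in Solr. This post introduces some of Solr's basic features and how to use them, so that we can make better use of Solr.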

###SolrCloud

Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. Called SolrCloud, these capabilities provide distributed indexing and search capabilities, supporting the following features:

Central configuration for the entire cluster
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration.

SolrCloud is flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas. Instead, Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas. Queries and updates can be sent to any server. Solr will use the information in the ZooKeeper database to figure out which servers need to handle the request.

SolrCloud uses ZooKeeper to manage the Solr cluster, including its configuration files and schemas.

Logical:
A Cluster can host multiple Collections of Solr Documents.
A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
The number of Shards that a Collection has determines:
The theoretical limit to the number of Documents that Collection can reasonably contain.
The amount of parallelization that is possible for an individual search request.

SolrCloud logical organization: one Cluster -> multiple Collections,
one Collection -> multiple Shards (the data is partitioned across multiple machines and searched in parallel).

Physical:
A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
Each Node can host multiple Cores.
Each Core in a Cluster is a physical Replica for a logical Shard.
Every Replica uses the same configuration specified for the Collection that it is a part of.
The number of Replicas that each Shard has determines:
The level of redundancy built into the Collection and how fault tolerant the Cluster can be in the event that some Nodes become unavailable.
The theoretical limit in the number of concurrent search requests that can be processed under heavy load.

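SolrCloud physical organization: one Cluster -> multiple Nodes, where each Node is one running instance of the Solr server process,
one Node -> multiple Cores,
one Core -> a physical Replica for a logical Shard. In other words, a logical Collection has a single configuration that all of its Replicas share, and each Shard can be replicated.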

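So the number of shards and the number of replicas both affect how well Solr can parallelize searches: shards bound the parallelism within a single query, while replicas bound the number of concurrent queries and the level of redundancy. This needs careful thought when creating a Collection.
SolrCloud also has a UI that shows all of this more directly: Solr Admin UI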
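
As a rough illustration of how the two numbers interact, the sketch below builds a Collections API CREATE request and computes how many cores it would ask the cluster to host. The host, collection name, and config name are hypothetical placeholders; `numShards`, `replicationFactor`, and `collection.configName` are parameters of the CREATE action. The request is only assembled here, not sent.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE URL (nothing is sent over the network)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def total_cores(num_shards, replication_factor):
    """Each shard gets replication_factor replicas, and each replica is one core."""
    return num_shards * replication_factor


# Hypothetical values, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "weblogs", 3, 2, "weblogs_conf")
print(url)
print(total_cores(3, 2))
```

With 3 shards and 2 replicas per shard, the cluster has to host 6 cores, so the node count should be planned with that product in mind.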

####The solrctl tool

Now that we understand SolrCloud, we can manage it with solrctl, the command-line tool that Cloudera provides. See: Managing Solr Using solrctl
solrctl_reference
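
A typical solrctl workflow is to generate a config directory, upload it, and create a collection from it. The sketch below only assembles those command lines so they can be reviewed before running; the config name, local path, and collection name are hypothetical, and the exact flags should be checked against the solrctl reference for the installed version.

```python
def solrctl_commands(config_name, local_dir, collection, num_shards):
    """Return the solrctl invocations as argument lists, in the order they would run."""
    return [
        # Generate a template config directory on the local filesystem.
        ["solrctl", "instancedir", "--generate", local_dir],
        # Upload the (edited) config directory to ZooKeeper under config_name.
        ["solrctl", "instancedir", "--create", config_name, local_dir],
        # Create the collection with the given number of shards and config.
        ["solrctl", "collection", "--create", collection,
         "-s", str(num_shards), "-c", config_name],
    ]


# Hypothetical names, for illustration only.
commands = solrctl_commands("weblogs_conf", "/tmp/weblogs_conf", "weblogs", 3)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a host where solrctl is installed.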

###Configuring schema.xml

fields example:

  <field name="level" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="create_time" type="date" indexed="true" stored="true"/>
  <field name="thread" type="string" indexed="true" stored="true"/>
  <field name="class" type="text_general" indexed="true" stored="true"/>
  <field name="message" type="text_general" indexed="true" stored="true"/>

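A field is what Solr stores for each document. Besides the indexed and stored attributes shown here, the most important attribute is type, because it determines how the field is analyzed and therefore what a search can match. See: Understanding Analyzers, Tokenizers, and Filters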
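
To make the field definitions concrete, here is a minimal sketch that builds a document matching those five fields and serializes it as a JSON update payload. The helper name and sample values are hypothetical. It assumes the `date` type expects UTC timestamps in the `YYYY-MM-DDThh:mm:ssZ` form, and it ignores any other fields (such as a unique key) that the full schema may require.

```python
import json
from datetime import datetime, timezone


def build_log_doc(level, created, thread, cls, message):
    """Map one log event onto the fields declared in the schema example."""
    return {
        "level": [level],  # multiValued="true", so send a list
        "create_time": created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "thread": thread,  # type "string": stored and matched verbatim
        "class": cls,
        "message": message,
    }


# Hypothetical log event, for illustration only.
doc = build_log_doc(
    "ERROR",
    datetime(2017, 7, 30, 10, 0, 0, tzinfo=timezone.utc),
    "main",
    "com.example.Foo",
    "Connection timeout",
)
payload = json.dumps([doc])
print(payload)
```

The payload is a JSON array of documents, which is the shape Solr's JSON update handler accepts.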

fieldType example:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

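Once a fieldType is chosen, Solr runs its analysis chain (tokenization and filtering) on that field at index time, so that searches return the expected results. If the initial choice turns out to be wrong, schema.xml can be modified later, though documents that are already indexed must be reindexed for the change to apply to them.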
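
The index-time chain of text_general (tokenizer, stop filter, lowercase filter) can be approximated in a few lines. This is a simplified sketch, not Solr's implementation: a `\w+` regex stands in for StandardTokenizerFactory, and the stopword set is an assumption because the contents of stopwords.txt are not shown.

```python
import re


def analyze(text, stopwords):
    """Approximate the text_general index analyzer: tokenize, drop stopwords, lowercase."""
    # Stand-in for StandardTokenizerFactory: split into runs of word characters.
    tokens = re.findall(r"\w+", text)
    # StopFilterFactory with ignoreCase="true": compare case-insensitively.
    stop = {w.lower() for w in stopwords}
    kept = [t for t in tokens if t.lower() not in stop]
    # LowerCaseFilterFactory: normalize case so "ERROR" and "error" match.
    return [t.lower() for t in kept]


# Assumed stopwords, for illustration only.
terms = analyze("The Error in Thread main", {"the", "in"})
print(terms)  # ['error', 'thread', 'main']
```

Because the query analyzer also lowercases and removes stopwords, a query for "ERROR" produces the same term as the indexed "Error".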

###Search

See the Search doc. Many other blogs already cover the search syntax well, so it is not repeated here.
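
For orientation only, a query against the fields defined earlier could be assembled like this. The host and the query string are hypothetical; `q`, `rows`, and `wt` are standard parameters of the `/select` handler. The URL is built but not requested.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def select_url(base_url, collection, query, rows=10):
    """Build a /select URL; urlencode takes care of escaping spaces and colons."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host and query, for illustration only.
search_url = select_url(
    "http://localhost:8983/solr", "collection1", "level:ERROR AND message:timeout"
)
print(search_url)
```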

###SolrCloud replica

SOLR No active slice servicing hash code

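I hit this exception because I initially created three shards, but one of them failed to start. Solr picks the target shard by hashing, and here the hash value mapped to a range that no active shard was serving.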

org.apache.solr.common.SolrException: No active slice servicing hash code 7b50d0a2 in DocCollection(collection1)={
"shards":{
"shard1":{
  "range":"80000000-d554ffff",
  "state":"active",
  "replicas":{
    "core_node1":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.131:8983_solr",
      "base_url":"http://XX.XXX.XXX.131:8983/solr",
      "leader":"true"},
    "core_node7":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.131:9983_solr",
      "base_url":"http://XX.XXX.XXX.131:9983/solr"}}},
"shard2":{
  "range":"d5550000-2aa9ffff",
  "state":"active",
  "replicas":{
    "core_node5":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.133:8983_solr",
      "base_url":"http://XX.XXX.XXX.133:8983/solr"},
    "core_node8":{
      "state":"active",
      "core":"collection1",
      "node_name":"XX.XXX.XXX.132:8983_solr",
      "base_url":"http://XX.XXX.XXX.132:8983/solr",
      "leader":"true"}}},
"shard3": { ......}
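
The ranges in the exception can be checked by hand. The sketch below assumes that range bounds and hash codes are printed as hexadecimal forms of signed 32-bit integers and that a range includes both endpoints. It uses only the two ranges visible above and shows that neither contains 7b50d0a2, so the hash must belong to the remaining shard.

```python
def to_signed32(hex_str):
    """Interpret a hex string as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_range(range_str):
    """Split 'low-high' into a pair of signed 32-bit bounds."""
    low, high = range_str.split("-")
    return to_signed32(low), to_signed32(high)


def find_shard(hash_hex, shard_ranges):
    """Return the name of the shard whose range contains the hash, or None."""
    h = to_signed32(hash_hex)
    for name, range_str in shard_ranges.items():
        low, high = parse_range(range_str)
        if low <= h <= high:
            return name
    return None


# Only the ranges that appear in the exception message above.
known_ranges = {"shard1": "80000000-d554ffff", "shard2": "d5550000-2aa9ffff"}
owner = find_shard("7b50d0a2", known_ranges)
print(owner)  # None: neither shard1 nor shard2 covers this hash
```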

Reference: SOLR No active slice servicing hash code

On handling shards

###clusterstate.json

https://stackoverflow.com/questions/22143018/query-solr-cluster-for-state-of-nodes
http://codegouge.blogspot.com/2013/08/manually-editing-solrs-clusterstatejson.html
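
Once the cluster state JSON is in hand (for example, read from the clusterstate.json node in ZooKeeper), a small check can flag shards that cannot serve requests. This is a sketch under assumptions: the function name and the sample data are made up for illustration, and the layout follows the shape of the state printed in the exception above, where "leader" is the string "true".

```python
def shards_without_active_leader(collection_state):
    """List shards that are not active or have no active leader replica."""
    problems = []
    for shard_name, shard in collection_state.get("shards", {}).items():
        replicas = shard.get("replicas", {}).values()
        has_leader = any(
            r.get("state") == "active" and r.get("leader") == "true"
            for r in replicas
        )
        if shard.get("state") != "active" or not has_leader:
            problems.append(shard_name)
    return sorted(problems)


# Made-up sample state, not taken from a real cluster.
sample_state = {
    "shards": {
        "shardA": {
            "state": "active",
            "replicas": {"core_node1": {"state": "active", "leader": "true"}},
        },
        "shardB": {
            "state": "active",
            "replicas": {"core_node2": {"state": "down"}},
        },
    }
}
print(shards_without_active_leader(sample_state))  # ['shardB']
```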

Reposted from: https://my.oschina.net/tigerlene/blog/1491977