SolrCloud
SolrCloud is a distributed search engine that combines Solr with ZooKeeper. It offers centralized configuration, automatic failover, and automatic load balancing of queries.
-
Key SolrCloud concepts
-
collection
In SolrCloud, a collection can be viewed as the index over all documents. If the index on a ZooKeeper node or server grows too bloated, or queries take too long, the collection can be split into multiple shards.
-
shards
A shard is a slice of the document index; SolrCloud automatically distributes index segments according to the number of shards you configure. Each shard consists of one leader and several replicas, with the leader chosen by an election algorithm. When indexing, if a document is submitted to a replica, the replica forwards the request to the leader of its shard, and the leader then routes the document out to the replicas. If the leader goes down, SolrCloud elects a new leader from the remaining replicas. In that sense a replica is a backup of the shard, which makes the cluster more robust. This resembles the master/slave setup of Solr before 4.0, except that in SolrCloud replicas also serve search requests. When a shard's document count reaches a threshold, the shard can split: a client may hand a document to any replica, the replica passes it to its leader, the leader forwards it to the leader of the newly split shard, and that leader distributes it to its own replicas.
-
Setting up SolrCloud 5.3.1
-
1. Download the JDK, ZooKeeper, and Solr packages:
jdk-8u45-linux-x64.tar.gz
zookeeper-3.4.6.tar.gz
solr-5.3.1.tgz
-
2. Install the JDK and configure the environment variables
[root@localhost soft]# ls
jdk-8u45-linux-x64.tar.gz solr-5.3.1 solr-5.3.1.tgz zookeeper-3.4.6.tar.gz
[root@localhost soft]# tar xf jdk-8u45-linux-x64.tar.gz
[root@localhost soft]# ls
jdk1.8.0_45 jdk-8u45-linux-x64.tar.gz solr-5.3.1 solr-5.3.1.tgz zookeeper-3.4.6.tar.gz
[root@localhost soft]#
[root@localhost soft]# vim /etc/profile
Add the environment variables:
export JAVA_HOME=/home/workspace/soft/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin
[root@localhost soft]# source /etc/profile //make the new variables take effect
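You can sanity-check the two profile lines in a live shell before committing them to /etc/profile; the JDK path below assumes the archive was unpacked under /home/workspace/soft as shown above:

```shell
# The same two lines that go into /etc/profile, evaluated in the
# current shell so the expansion can be inspected first.
export JAVA_HOME=/home/workspace/soft/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin

# Confirm the bin directory landed on PATH (java -version should
# then resolve once the JDK is actually unpacked there).
echo "$PATH" | grep -o "$JAVA_HOME/bin"
```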
3. Install the ZooKeeper cluster
[root@localhost soft]# tar xf zookeeper-3.4.6.tar.gz
[root@localhost soft]# mv zookeeper-3.4.6 /home/workspace/zookeeper/
[root@localhost workspace]# ls
soft solrcloud zookeeper
[root@localhost workspace]# cd zookeeper/
[root@localhost zookeeper]# ls
data zookeeper-3.4.6
[root@localhost zookeeper]# cd zookeeper-3.4.6/
[root@localhost zookeeper-3.4.6]# ls
bin contrib ivy.xml README_packaging.txt zookeeper-3.4.6.jar
build.xml dist-maven lib README.txt zookeeper-3.4.6.jar.asc
CHANGES.txt docs LICENSE.txt recipes zookeeper-3.4.6.jar.md5
conf ivysettings.xml NOTICE.txt src zookeeper-3.4.6.jar.sha1
[root@localhost zookeeper-3.4.6]# cd conf
[root@localhost conf]# ls
configuration.xsl log4j.properties zoo.cfg zoo_sample.cfg
[root@localhost conf]# cp zoo_sample.cfg zoo.cfg
Then edit zoo.cfg:
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/workspace/zookeeper/data //path where ZooKeeper stores its data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
//add the three servers of the ensemble
server.3=192.168.219.131:2888:3888
server.2=192.168.219.130:2888:3888
server.1=192.168.219.132:2888:3888
The server entries above must be configured identically on all three machines.
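Note that zoo_sample.cfg also carries tickTime, initLimit, and syncLimit, which the excerpt above omits but the ensemble needs; a minimal complete zoo.cfg for these three machines (using the zoo_sample.cfg defaults for those three settings) looks like:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/workspace/zookeeper/data
clientPort=2181
server.1=192.168.219.132:2888:3888
server.2=192.168.219.130:2888:3888
server.3=192.168.219.131:2888:3888
```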
Next, on each server, create a myid file in the dataDir configured above and write that server's id into it. For example, for server.3 you would run the following in the data directory of server 3:
[root@localhost data]# echo "3" >myid
Write the matching myid on each of the three servers.
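The per-server myid step can be sketched as a loop; the zkdemo/ paths below are illustrative stand-ins for each host's real dataDir (/home/workspace/zookeeper/data):

```shell
# Illustrative only: on a real cluster you run one echo per host,
# writing that host's own id into its own dataDir.
for id in 1 2 3; do
  dir="zkdemo/server$id/data"   # stand-in for /home/workspace/zookeeper/data
  mkdir -p "$dir"
  echo "$id" > "$dir/myid"
done

# The id must match the server.N entry for that host in zoo.cfg.
cat zkdemo/server3/data/myid
```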
Then start ZooKeeper on each server:
[root@localhost zookeeper-3.4.6]# cd bin
[root@localhost bin]# ./zkServer.sh start
Check whether the cluster came up correctly:
[root@localhost zookeeper-3.4.6]# bin/zkServer.sh status
JMX enabled by default
Using config: /home/workspace/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[root@localhost zookeeper-3.4.6]#
If the status looks like the above, the ZooKeeper cluster is up. If Mode shows standalone instead, the node is running in the default single-server mode rather than as part of the cluster, and the configuration above should be re-checked. A "no route to host" error is usually caused by the firewall; disable it with chkconfig iptables off (and service iptables stop to turn it off for the running session as well).
4. Install the SolrCloud service
Under the workspace directory, create a solrcloud directory with data and solr subdirectories:
[root@localhost workspace]# mkdir -p /home/workspace/solrcloud/{data,solr} //-p creates any missing parent directories
//then unpack Solr under the soft directory and install the service
[root@localhost soft]# tar xf solr-5.3.1.tgz
[root@localhost soft]# ls
jdk1.8.0_45 jdk-8u45-linux-x64.tar.gz solr-5.3.1 solr-5.3.1.tgz zookeeper-3.4.6.tar.gz
[root@localhost soft]# cd solr-5.3.1/bin
[root@localhost bin]# ./install_solr_service.sh /home/workspace/soft/solr-5.3.1.tgz -d /home/workspace/solrcloud/data/ -i /home/workspace/solrcloud/solr/ -s solrcloud -u root -p 8080
//-d puts the service's writable data in that directory, -i is where the Solr distribution is extracted, -s is the service name, -u is the user the service runs as, and -p sets the port (default 8983)
[root@localhost workspace]# cd solrcloud/
[root@localhost solrcloud]# ls
data solr
[root@localhost solrcloud]# cd solr
[root@localhost solr]# cd ..
[root@localhost solrcloud]# ls -al
total 16
drwxr-xr-x. 4 root root 4096 Oct 30 08:09 .
drwxr-xr-x. 5 root root 4096 Oct 30 08:09 ..
drwxr-xr-x. 4 root root 4096 Oct 30 09:26 data
drwxr-xr-x. 3 root root 4096 Oct 30 08:13 solr
[root@localhost solrcloud]# cd solr
[root@localhost solr]# ls -al
total 12
drwxr-xr-x. 3 root root 4096 Oct 30 08:13 .
drwxr-xr-x. 4 root root 4096 Oct 30 08:09 ..
drwxr-xr-x. 9 root root 4096 Oct 30 08:13 solr-5.3.1
lrwxrwxrwx. 1 root root 42 Oct 30 08:13 solrcloud -> /home/workspace/solrcloud/solr//solr-5.3.1
[root@localhost solr]# ls ../data
data log4j.properties logs solr-8080.pid solr.in.sh
[root@localhost solr]# cd ..
[root@localhost solrcloud]# ls
data solr
[root@localhost solrcloud]# cd data
[root@localhost data]# ls
data log4j.properties logs solr-8080.pid solr.in.sh
[root@localhost data]# vim solr.in.sh
//set the ZooKeeper connection property
# Set the ZooKeeper connection string if using an external ZooKeeper ensemble
# e.g. host1:2181,host2:2181/chroot
# Leave empty if not using SolrCloud
#ZK_HOST=""
ZK_HOST="192.168.219.132:2181,192.168.219.130:2181,192.168.219.131:2181"
Set this property on every server.
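Editing three copies of solr.in.sh by hand is error-prone; one alternative is to patch the line with sed. The demo below runs against a scratch copy so the substitution can be verified before touching the real file under solrcloud/data:

```shell
# Scratch copy standing in for /home/workspace/solrcloud/data/solr.in.sh
printf '#ZK_HOST=""\n' > solr.in.sh.demo

# Replace the commented-out default with the real ensemble string.
sed -i 's|^#ZK_HOST=""|ZK_HOST="192.168.219.132:2181,192.168.219.130:2181,192.168.219.131:2181"|' solr.in.sh.demo

grep '^ZK_HOST' solr.in.sh.demo
```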
Then restart the service:
[root@localhost data]# service solrcloud restart
Finally, upload the configuration to ZooKeeper, link it to a collection name, and create the collection:
[root@localhost solr-5.3.1]# ./server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname demo-conf -confdir server/solr/configsets/basic_configs/conf/
[root@localhost solr-5.3.1]# ./server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection demo -confname demo-conf
[root@localhost solr-5.3.1]# curl 'http://192.168.219.132:8080/solr/admin/collections?action=CREATE&name=demo&numShards=3&replicationFactor=1'
One constraint applies here: numShards * replicationFactor must not exceed liveSolrNodes * maxShardsPerNode, where maxShardsPerNode defaults to 1.
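That capacity check can be scripted before calling the Collections API; the values below mirror the three-node setup in this walkthrough and are otherwise illustrative:

```shell
# Capacity rule for CREATE: numShards * replicationFactor must not
# exceed liveSolrNodes * maxShardsPerNode (maxShardsPerNode defaults to 1).
num_shards=3
replication_factor=1
live_nodes=3              # the three machines in this walkthrough
max_shards_per_node=1     # the default

if [ $((num_shards * replication_factor)) -le $((live_nodes * max_shards_per_node)) ]; then
  echo "ok: $((num_shards * replication_factor)) cores fit on $live_nodes nodes"
else
  echo "error: too many cores for the cluster" >&2
fi
```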
With the steps above you get basic search working; for more advanced search you still need to upload documents, define an analyzer, and configure the schema.
Finally, here is my directory layout: