Environment overrides include file
SOLR_PID_DIR=/var/s
SOLR_HOME=/var/solr/data
/opt/solr/server/resources/log4j.properties
to
/var/solr/log4j.properties 自定义日志格式
LOG4J_PROPS=/var/solr/log4j.properties
SOLR_LOGS_DIR=/var/solr/logs
bin/solr
script registers the
bin/oom_solr.sh
script to be called by the JVM if an OutOfMemoryError occurs. The
oom_solr.sh
script will issue a
kill -9
to the Solr process that experiences the
OutOfMemoryError
. This behavior is recommended when running in SolrCloud mode so that ZooKeeper is immediately notified that a node has experienced a non-recoverable error. Take a moment to inspect the contents of the
/opt/solr/bin/oom_solr.sh
script so that you are familiar with the actions the script will perform if it is invoked by the JVM.
SolrCloud
SOLR_MODE=solrcloud
ZK_HOST=zk1,zk2,zk3
ZooKeeper chroot
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.port=18983 \
-Dcom.sun.management.jmxremote.rmi.port=18983
Setting Up a Single ZooKeeper
Create the instance
Configure the instance
tickTime=2000
dataDir=/var/lib/zookeeper
端口clientPort=2181
Run the instance
Point Solr at the instance
Shut down ZooKeeper
zkServer.sh stop
.
Setting up a ZooKeeper Ensemble
dataDir=/var/lib/zookeeperdata/
1
clientPort=
2181
initLimit=
5
syncLimit=
2
server.
1
=localhost:
2888
:
3888
server.
2
=localhost:
2889
:
3889
server.
3
=localhost:
2890
:
3890
<ZOOKEEPER_HOME>/conf/zoo2.cfg
tickTime=
2000
dataDir=c:/sw/zookeeperdata/
2
clientPort=
2182
initLimit=
5
syncLimit=
2
server.
1
=localhost:
2888
:
3888
server.
2
=localhost:
2889
:
3889
server.
3
=localhost:
2890
:
3890
<ZOOKEEPER_HOME>/conf/zoo3.cfg
:
tickTime=
2000
dataDir=c:/sw/zookeeperdata/
3
clientPort=
2183
initLimit=
5
syncLimit=
2
server.
1
=localhost:
2888
:
3888
server.
2
=localhost:
2889
:
3889
server.
3
=localhost:
2890
:
3890
tickTime :基本事件单元,以毫秒为单位。它用来指示心跳,最小的 session 过期时间为两倍的 tickTime. 。
dataDir :存储内存中数据库快照的位置,如果不设置参数,更新事务日志将被存储到默认位置。
clientPort :监听客户端连接的端口
d <ZOOKEEPER_HOME>
bin/zkServer.sh start zoo.cfg
bin/zkServer.sh start zoo2.cfg
bin/zkServer.sh start zoo3.cfg
localhost:
2181
,localhost:
2182
,localhost:
2183
-noprompt
2181,
118.244.192.244:
2181,
118.244.192.245:
2181
1)单机模式
首先,从Apache官方网站下载一个ZooKeeper 的最近稳定版本。
http://hadoop.apache.org/zookeeper/releases.html
作为国内用户来说,选择最近的的源文件服务器所在地,能够节省不少的时间。
http://labs.renren.com/apache-mirror//hadoop/zookeeper/
ZooKeeper 要求 JAVA 的环境才能运行,并且需要 JAVA6 以上的版本,可以从 SUN 官网上下载,并对JAVA 环境变量进行设置。除此之外,为了今后操作的方便,我们需要对 ZooKeeper 的环境变量进行配置,方法如下,在 /etc/profile 文件中加入如下的内容:
#Set ZooKeeper Enviroment
export ZOOKEEPER_HOME=/root/hadoop-0.20.2/zookeeper-3.3.1
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
ZooKeeper 服务器包含在单个 JAR 文件中,安装此服务需要用户创建一个配置文档,并对其进行设置。我们在 ZooKeeper-*.*.* 目录(我们以当前 ZooKeeper 的最新版 3.3.1 为例,故此下面的“ ZooKeeper-*.*.* ”都将写为“ ZooKeeper-3.3.1” )的 conf 文件夹下创建一个 zoo.cfg 文件,它包含如下的内容:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
在这个文件中,我们需要指定 dataDir 的值,它指向了一个目录,这个目录在开始的时候需要为空。下面是每个参数的含义:
tickTime :基本事件单元,以毫秒为单位。它用来指示心跳,最小的 session 过期时间为两倍的 tickTime. 。
dataDir :存储内存中数据库快照的位置,如果不设置参数,更新事务日志将被存储到默认位置。
clientPort :监听客户端连接的端口
使用单机模式时用户需要注意:这种配置方式下没有 ZooKeeper 副本,所以如果 ZooKeeper 服务器出现故障, ZooKeeper 服务将会停止。
以下代码清单 A 是我们的根据自身情况所设置的 zookeeper 配置文档: zoo.cfg
代码清单 A : zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# the directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/data
# the port at which the clients will connect
clientPort=2181
2)集群模式
为了获得可靠的 ZooKeeper 服务,用户应该在一个集群上部署 ZooKeeper 。只要集群上大多数的ZooKeeper 服务启动了,那么总的 ZooKeeper 服务将是可用的。另外,最好使用奇数台机器。 如果 zookeeper拥有 5 台机器,那么它就能处理 2 台机器的故障了。
之后的操作和单机模式的安装类似,我们同样需要对 JAVA 环境进行设置,下载最新的 ZooKeeper 稳定版本并配置相应的环境变量。不同之处在于每台机器上 conf/zoo.cfg 配置文件的参数设置,参考下面的配置:
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
我们在拥有三台机器的 Hadoop 集群上测试使用 ZooKeeper 服务,下面代码清单 B 是我们根据自身情况所设置的 ZooKeeper 配置文档:
代码清单 B : zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/d1
# the port at which the clients will connect
clientPort=2181
server.1=IP1:2887:3887
server.2=IP2:2888:3888
server.3=IP3:2889:3889
清单中的 IP 分别对应的配置分布式 ZooKeeper 的 IP 地址。当然,也可以通过机器名访问 zookeeper ,但是需要在ubuntu 的 hosts 环境中进行设置。读者可以查阅 Ubuntu 以及 Linux 的相关资料进行设置。
3)集群伪分布
简单来说,集群伪分布模式就是在单机下模拟集群的ZooKeeper服务。
那么,如何对配置 ZooKeeper 的集群伪分布模式呢?其实很简单,在 zookeeper 配置文档中, clientPort参数用来设置客户端连接 zookeeper 的端口。 server.1=IP1:2887:3887 中, IP1 指示的是组成 ZooKeeper 服务的机器 IP 地址, 2887 为用来进行 leader 选举的端口, 3887 为组成 ZooKeeper 服务的机器之间通信的端口。集群伪分布模式我们使用每个配置文档模拟一台机器,也就是说,需要在单台机器上运行多个 zookeeper 实例。但是,我们必须要保证各个配置文档的 clientPort 不能冲突。
下面是我们所配置的集群伪分布模式,通过 zoo1.cfg , zoo2.cfg , zoo3.cfg 模拟了三台机器的 ZooKeeper集群。详见代码清单 C :
代码清单C : zoo1.cfg :
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_1
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
zoo2.cfg :
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_2
# the port at which the clients will connect
clientPort=2182
#the location of the log file
dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
zoo3.cfg :
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_2
# the port at which the clients will connect
clientPort=2183
#the location of the log file
dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
从上述三个代码清单中可以看到,除了 clientPort 不同之外, dataDir 也不同。另外,不要忘记在 dataDir 所对应的目录中创建 myid 文件来指定对应的 zookeeper 服务器实例。