[ZooKeeper] Basic Configuration

The zoo.cfg configuration file

Some configuration parameters do not have a default and must be set for every deployment. These are:

clientPort

The TCP port that clients use to connect to this server. By default, the server will
listen on all of its interfaces for connections on this port unless clientPortAddress
is set. The client port can be set to any number and different servers can listen
on different ports. The default port is 2181.

The port ZooKeeper listens on for client connections; the default is 2181.

dataDir and dataLogDir

dataDir is the directory where the fuzzy snapshots of the in-memory database will
be stored. If this server is part of an ensemble, the myid file will also be in this directory.
The dataDir does not need to reside on a dedicated device. The snapshots are
written using a background thread that does not lock the database, and the writes
to storage are not synced until the snapshot is complete.
Unless the dataLogDir option is set, the transaction log is also stored in this
directory. The transaction log is very sensitive to other activity on the same device as
this directory. The server tries to do sequential writes to the transaction log because
the data must be synced to storage before the server can acknowledge a transaction.
Other activity on the device, notably snapshots, can severely affect write throughput
by causing disk heads to thrash during syncing. So, best practice is to use a
dedicated log device and set dataLogDir to point to a directory on that device.

If dataLogDir is not set, both the snapshots and the transaction log are stored in the dataDir directory; if dataLogDir is set, they are stored separately. The official recommendation is to store them separately.

tickTime

The length of a tick, measured in milliseconds. The tick is the basic unit of
measurement for time used by ZooKeeper, and it determines the bucket size for session
timeouts.
The timeouts used by the ZooKeeper ensemble are specified in units of tickTime.
This means, in effect, that the tickTime sets the lower bound on timeouts because
the minimum timeout is a single tick. The minimum client session timeout is two
ticks.
The default tickTime is 3,000 milliseconds. Lowering the tickTime allows for
quicker timeouts but also results in more overhead in terms of network traffic
(heartbeats) and CPU time (session bucket processing).

tickTime is the smallest unit of time in ZooKeeper; session expiry and the other timeouts are derived from it. Default: 3000 ms.
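To tie the basic parameters together, here is a minimal sketch that renders them as zoo.cfg-style key=value lines. The helper name render_zoo_cfg and both directory paths are hypothetical and only for illustration; the port and tickTime values are the defaults quoted above.

```python
def render_zoo_cfg(settings):
    """Render a dict of settings as zoo.cfg-style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


basic = {
    "clientPort": 2181,                      # default client port
    "dataDir": "/var/lib/zookeeper",         # hypothetical snapshot directory
    "dataLogDir": "/mnt/zk-txnlog",          # hypothetical directory on a dedicated device
    "tickTime": 3000,                        # length of a tick in milliseconds
}

print(render_zoo_cfg(basic))
```

Keeping dataLogDir on its own device follows the best practice described for the transaction log above.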


This section covers some of the more advanced configuration settings that apply to both standalone and ensemble configurations. They do not need to be set for ZooKeeper to function properly, but some (such as dataLogDir) really should be set.

preAllocSize

The number of kilobytes to preallocate in the transaction log files
(zookeeper.preAllocSize).
When writing to the transaction log, the server will allocate blocks of preAllocSize
kilobytes at a time. This amortizes the file system overhead of allocating space
on the disk and updating metadata. More importantly, it minimizes the number of
seeks that need to be done.
By default, preAllocSize is 64 megabytes. One reason to lower this number is if
the transaction log never grows that large. Because a new transaction log is started
after each snapshot, if the number of transactions is small between each snapshot
and the transactions themselves are small, 64 megabytes may be too big. For
example, if we take a snapshot every 1,000 transactions, and the average transaction
size is 100 bytes, a 100-kilobyte preAllocSize would be much more appropriate.
The default preAllocSize is appropriate for the default snapCount and transactions
that average more than 512 bytes in size.

preAllocSize is the amount of space preallocated when writing the transaction log. The right value depends on how large the transaction log actually grows, which in turn depends on how often snapshots are taken.
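The sizing argument above is simple arithmetic, sketched below. The function name is hypothetical, and the estimate ignores any per-record overhead in the log file, so treat it as a rough lower bound.

```python
def log_bytes_between_snapshots(snap_count, avg_txn_bytes):
    """Approximate bytes written to one transaction log between two snapshots."""
    return snap_count * avg_txn_bytes


# Example from the text: a snapshot every 1,000 transactions of 100 bytes each.
small = log_bytes_between_snapshots(1_000, 100)

# Default snapCount (100000) with transactions averaging 512 bytes.
default = log_bytes_between_snapshots(100_000, 512)

prealloc_default_bytes = 64 * 1024 * 1024  # the 64 MB default preallocation

print(small)                              # about 100 KB, far below 64 MB
print(default, prealloc_default_bytes)    # same order of magnitude
```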

snapCount

The number of transactions between snapshots (zookeeper.snapCount).
When a ZooKeeper server restarts, it needs to restore its state. Two big factors in
the time it takes to restore the state are the time it takes to read in a snapshot, and
the time it takes to apply transactions that occurred after the snapshot was started.
Snapshotting often will minimize the number of transactions that must be applied
after the snapshot is read in. However, snapshotting does have an effect on the
server’s performance, even though snapshots are written in a background thread.
By default the snapCount is 100000. Because snapshotting does affect performance,
it would be nice if all of the servers in an ensemble were not snapshotting at the
same time. As long as a quorum of servers is not snapshotting at once, the processing
time should not be affected. For this reason, the actual number of transactions in
each snapshot is a random number close to snapCount.
Note also that if snapCount is reached but a previous snapshot is still being taken,
a new snapshot will not start and the server will wait another snapCount
transactions before starting a new snapshot.

snapCount is the number of transactions between snapshots; the default is 100000. Tune it to your workload.
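The randomization can be pictured with a small sketch. The text only says the count is "a random number close to snapCount", so the distribution used here (uniform over the upper half of snapCount) is an assumption for illustration, and the function name is hypothetical.

```python
import random


def next_snapshot_threshold(snap_count, rng=random):
    """Pick a randomized transaction count at which to start the next snapshot.

    Assumption: the jitter is uniform, giving a value in [snap_count/2, snap_count).
    Requires snap_count >= 2 so that the random range is not empty.
    """
    half = snap_count // 2
    return half + rng.randrange(half)


# Three servers with the same snapCount will usually pick different thresholds,
# so they are unlikely to start snapshotting at the same moment.
thresholds = [next_snapshot_threshold(100_000) for _ in range(3)]
print(thresholds)
```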

autopurge.snapRetainCount

The number of snapshots and corresponding transaction logs to retain when
purging data.
ZooKeeper snapshots and transaction logs are periodically garbage collected. The
autopurge.snapRetainCount governs the number of snapshots to retain while
garbage collecting. Obviously, not all of the snapshots can be deleted because that
would make it impossible to recover a server; the minimum for
autopurge.snapRetainCount is 3, which is also the default.

Snapshots and transaction logs are periodically garbage collected; autopurge.snapRetainCount is the number of snapshots (and their corresponding transaction logs) to keep when that happens. Default: 3.

autopurge.purgeInterval

The number of hours to wait between garbage collecting (purging) old snapshots
and logs. If set to a nonzero number, autopurge.purgeInterval specifies the
period of time between garbage collection cycles. If set to zero, the default, garbage
collection will not be run automatically but should be run manually using the
zkCleanup.sh script in the ZooKeeper distribution.

autopurge.purgeInterval is the interval, in hours, between garbage collections of snapshots and transaction logs. If it is nonzero, garbage collection runs periodically; otherwise, run the zkCleanup.sh script manually.
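The retention rule can be sketched as follows. This is not ZooKeeper's purge code; the function and the snapshot names are hypothetical, and the only facts taken from the text are that a fixed number of the newest snapshots is kept and that this number is never below 3.

```python
def snapshots_to_purge(snapshot_names, retain_count=3):
    """Given snapshot names ordered oldest to newest, return the ones to delete."""
    retain = max(3, retain_count)      # the documented minimum is 3
    return snapshot_names[:-retain]    # everything except the newest `retain`


names = ["snap-001", "snap-002", "snap-003", "snap-004", "snap-005"]
print(snapshots_to_purge(names))                  # the two oldest
print(snapshots_to_purge(names, retain_count=1))  # still keeps three
```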

fsync.warningthresholdms

The duration in milliseconds of a sync to storage that will trigger a warning
(fsync.warningthresholdms).
A ZooKeeper server will sync a change to storage before it acknowledges the change.
If the sync system call takes too long, system performance can be severely impacted.
The server tracks the duration of this call and will issue a warning if it is longer than
fsync.warningthresholdms. By default, it’s 1,000 milliseconds.

fsync.warningthresholdms is the sync duration above which the server logs a warning.

weight.x=n

Used along with group options, this assigns a weight n to a server when forming
quorums. The value n is the weight of a server when voting. A few parts of
ZooKeeper require voting, such as leader election and the atomic broadcast protocol.
By default, the weight of a server is 1. If the configuration defines groups but not
weights, a weight of 1 will be assigned to all servers.

weight.x=n sets the weight of server x when voting (for example, in leader election); the default is 1.

traceFile

Keeps a trace of ZooKeeper operations by logging them in trace files named
traceFile.year.month.day. Tracing is not done unless this option is set
(requestTraceFile).
This option is used to get a detailed view of the operations going through
ZooKeeper. However, to do the logging, the ZooKeeper server must serialize the
operations and write them to disk. This causes CPU and disk contention. If you use this
option, be sure to avoid putting the trace file on the log device. Also realize that,
unfortunately, tracing does perturb the system and thus may make it hard to recreate
problems that happen when tracing is off. Just to make it interesting, the
traceFile Java system property has no zookeeper prefix and the property name
does not match the name of the configuration variable, so be careful.

traceFile records a trace of ZooKeeper operations in files named traceFile.year.month.day. Enabling it consumes CPU and disk resources, so use it with care.


These options place limits on communication between servers and clients. Timeouts are also covered in this section:

globalOutstandingLimit

The maximum number of outstanding requests in ZooKeeper
(zookeeper.globalOutstandingLimit).
ZooKeeper clients can submit requests faster than ZooKeeper servers can process
them. This will lead to requests being queued at the ZooKeeper servers and
eventually (as in, in a few seconds) cause the servers to run out of memory. To prevent
this, ZooKeeper servers will start throttling client requests once the
globalOutstandingLimit has been reached. But globalOutstandingLimit is not a hard limit;
each client must be able to have at least one outstanding request, or connections
will start timing out. So, after the globalOutstandingLimit is reached, the servers
will read from client connections only if they do not have any pending requests.
To determine the limit of a particular server out of the global limit, we simply divide
the value of this parameter by the number of servers. There is currently no smart
way implemented to figure out the global number of outstanding operations and
enforce the limit accordingly. Consequently, this limit is more of an upper bound
on the number of outstanding requests. As a matter of fact, having the load perfectly
balanced across servers is typically not achievable, so some servers that are running
a bit slower or that are a bit more loaded may end up throttling even if the global
limit has not been reached.
The default limit is 1,000 requests. You will probably not need to modify this
parameter. If you have many clients that are sending very large requests you may need
to lower the value, but we have never seen the need to change it in practice.

globalOutstandingLimit is the maximum number of outstanding client requests; the default is 1000. In production it almost never needs to be changed. Clients can submit requests much faster than the servers can process them, and once they get far enough ahead the requests queue up and can exhaust server memory, which is why the limit exists. In the words of the documentation, “we have never seen the need to change it in practice”.
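The per-server share described above is a plain division. A minimal sketch, assuming integer division (the text does not say how a remainder is handled) and using hypothetical function names:

```python
def per_server_outstanding_limit(global_limit, num_servers):
    """Share of the global outstanding-request limit enforced by each server."""
    return global_limit // num_servers


def should_throttle(outstanding_on_server, global_limit, num_servers):
    """True once this server has reached its share of the global limit."""
    return outstanding_on_server >= per_server_outstanding_limit(global_limit, num_servers)


print(per_server_outstanding_limit(1000, 5))   # each of five servers gets 200
print(should_throttle(200, 1000, 5))           # a loaded server throttles on its own
```

Because each server only checks its own share, an unevenly loaded server can start throttling before the ensemble as a whole has reached the global limit.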

maxClientCnxns

The maximum number of concurrent socket connections allowed from each IP address.
ZooKeeper uses flow control and limits to avoid overload conditions. The resources
used in setting up a connection are much higher than the resources needed for
normal operations. We have seen examples of errant clients that spun while creating
many ZooKeeper connections per second, leading to a denial of service. To remedy
the problem, we added this option, which will deny new connections from a given
IP address if that address has maxClientCnxns active. The default is 60 concurrent
connections.
Note that the connection count is maintained at each server. If we have an ensemble
of five servers and the default is 60 concurrent connections, a rogue client will
randomly connect to the five different servers and normally be able to establish
close to 300 connections from a single IP address before triggering this limit on one
of the servers.

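maxClientCnxns limits the number of connections that a single client (IP address) can open to one server. Note that the limit applies per server, not across the whole ensemble. The option exists because a client that creates a large number of connections per second can make a server unavailable. The default is 60.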
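The per-server nature of the limit is easy to see in a toy model. The class below is an illustrative counter, not ZooKeeper's connection handling, and the client IP address is made up.

```python
from collections import Counter


class PerIpLimiter:
    """Illustrative per-server count of active connections per client IP."""

    def __init__(self, max_client_cnxns=60):
        self.max_client_cnxns = max_client_cnxns
        self.active = Counter()

    def try_accept(self, ip):
        # Deny a new connection once this IP already holds the maximum.
        if self.active[ip] >= self.max_client_cnxns:
            return False
        self.active[ip] += 1
        return True


# Each server keeps its own count, so a rogue client that spreads its
# connections over five servers can hold 5 * 60 of them in total.
servers = [PerIpLimiter() for _ in range(5)]
accepted = sum(s.try_accept("10.0.0.7") for s in servers for _ in range(100))
print(accepted)
```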

clientPortAddress

Limits client connections to those received on the given address.
By default, a ZooKeeper server will listen on all its interfaces for client connections.
However, some servers are set up with multiple network interfaces, generally one
interface on an internal network and another on a public network. If you do not
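want a server to allow client connections from the public network, set the
clientPortAddress to the address of the interface on the private network.

By default a ZooKeeper server accepts client connections on all of its interfaces; set this parameter if you need to restrict client connections to a particular address.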

minSessionTimeout

The minimum session timeout in milliseconds. When clients make a connection,
they request a specific timeout, but the actual timeout they get will not be less than
minSessionTimeout.
ZooKeeper developers would love to be able to detect client failures immediately
and accurately. Unfortunately, systems cannot do this under real conditions. Instead, they
use heartbeats and timeouts. The timeouts to use depend on the responsiveness of
the ZooKeeper client and server machines and, more importantly, the latency and
reliability of the network between them. The timeout must be equal to at least the
network round trip time between the client and server, but occasionally packets
will be dropped, and when that happens the time it takes to receive a response is
increased by the retransmission timeout as well as the latency of receiving the
retransmitted packet.
By default, minSessionTimeout is two times the tickTime. Setting this timeout too
low will result in incorrect detection of client failures. Setting this timeout too high
will delay the detection of client failures.

minSessionTimeout is the minimum session timeout; the default is 2 × tickTime. If it is too low, client sessions are wrongly judged to have failed; if it is too high, detecting failed sessions is delayed.

maxSessionTimeout

The maximum session timeout in milliseconds. When clients make a connection,
they request a specific timeout, but the actual timeout they get will not be greater
than maxSessionTimeout.
Although this setting does not affect the performance of the system, it does limit
the amount of time for which a client can consume system resources. By default,
maxSessionTimeout is 20 times the tickTime.

maxSessionTimeout is the maximum session timeout; the default is 20 × tickTime. It does not affect system performance, but it bounds how long a client can hold on to system resources.
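Taken together, minSessionTimeout and maxSessionTimeout clamp whatever timeout the client asks for. A minimal sketch of that rule, using the defaults quoted above (2 and 20 ticks of a 3000 ms tickTime); the function name is hypothetical:

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=3000,
                              min_timeout_ms=None, max_timeout_ms=None):
    """Clamp a client's requested session timeout to the server's bounds."""
    lower = 2 * tick_time_ms if min_timeout_ms is None else min_timeout_ms
    upper = 20 * tick_time_ms if max_timeout_ms is None else max_timeout_ms
    return max(lower, min(requested_ms, upper))


print(negotiate_session_timeout(1_000))     # raised to the 6,000 ms minimum
print(negotiate_session_timeout(30_000))    # within bounds, granted as requested
print(negotiate_session_timeout(120_000))   # capped at the 60,000 ms maximum
```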


When an ensemble of servers provides the ZooKeeper service, we need to configure each server to have the correct timing and server list so that the servers can connect to each other and detect failures. These parameters must be the same on all the ZooKeeper servers in the ensemble:

initLimit

The timeout, specified in number of ticks, for a follower to initially connect to a
leader.
When a follower makes an initial connection to a leader, there can be quite a bit of
data to transfer, especially if the follower has fallen far behind. initLimit should
be set based on the transfer speed of the network between leader and follower and
the amount of data to be transferred. If the amount of data stored by ZooKeeper is
particularly large (i.e., if there are a large number of znodes or large data sets) or
the network is particularly slow, initLimit should be increased. Because this value
is so specific to the environment, there is no default for it. You should choose a value
that will conservatively allow the largest expected snapshot to be transferred.
Because you may have more than one transfer happening at a time, you may want to
set initLimit to twice that expected time. If you set the initLimit too high, it will
take longer for initial connections to faulty servers to fail, which can increase
recovery time. For this reason it is a good idea to benchmark how long it takes for a
follower to connect to a leader on your network with the amount of data you plan
on using to find your expected time.

initLimit is the time allowed for a follower to connect to the leader and complete its initial sync; choose it based on your own environment.
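As a starting point before benchmarking, the guidance above can be turned into a rough calculation. The sketch below assumes the transfer time is simply snapshot size divided by sustained bandwidth, which ignores everything else a follower does while connecting; the function name and the example figures are hypothetical.

```python
import math


def estimate_init_limit(snapshot_bytes, bytes_per_second,
                        tick_time_ms=3000, safety_factor=2):
    """Estimate initLimit in ticks from the largest expected snapshot."""
    transfer_ms = snapshot_bytes / bytes_per_second * 1000
    # Double the expected time (safety_factor=2) to allow for concurrent transfers.
    return max(1, math.ceil(safety_factor * transfer_ms / tick_time_ms))


# A 1 GB snapshot over a link sustaining 50 MB/s takes about 20 s,
# so twice that is 40 s, or 14 ticks of 3000 ms.
print(estimate_init_limit(1_000_000_000, 50_000_000))
```

A measured connection time from a real follower is a better input than this estimate, as the text recommends.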

syncLimit

The timeout, specified in number of ticks, for a follower to sync with a leader.
A follower will always be slightly behind the leader, but if the follower falls too far
behind (due to server load or network problems, for example) it needs to be
dropped. If the leader hasn’t been able to sync with a follower for more than
syncLimit ticks, it will drop the follower. Just like initLimit, syncLimit does not have
a default and must be set. Unlike initLimit, syncLimit does not depend on the
amount of data stored by ZooKeeper; instead, it depends on network latency and
throughput. On high-latency networks it will take longer to send data and get
responses back, so naturally the syncLimit will need to be increased. Even if the
latency is relatively low, you may need to increase the syncLimit because any
relatively large transaction may take a while to transmit to a follower.

syncLimit is the timeout for communication between the leader and a follower; it depends on network conditions.

leaderServes

A “yes” or “no” flag indicating whether or not a leader will service clients
(zookeeper.leaderServes).
The ZooKeeper server that is serving as leader has a lot of work to do. It talks with
all the followers and executes all changes. This means the load on the leader is
greater than that on the follower. If the leader becomes overloaded, the entire system
may suffer.
This flag, if set to “no,” can remove the burden of servicing client connections from
the leader and allow it to dedicate all its resources to processing the change
operations sent to it by followers. This will increase the throughput of operations that
change system state. On the other hand, if the leader doesn’t handle any of the client
connections itself directly, the followers will have more clients because clients that
would have connected to the leader will be spread among the followers. This is
particularly problematic if the number of servers in an ensemble is low. By default,
leaderServes is set to “yes.”

By default, the leader accepts client connections and serves reads and writes as usual. If you want the leader to concentrate on coordinating the ensemble, set this parameter to no, which can noticeably improve write throughput.

server.x=[hostname]:n:n[:observer]

Sets the configuration for server x.
ZooKeeper servers need to know how to communicate with each other. A
configuration entry of this form in the configuration file specifies the configuration for a
given server x, where x is the ID of the server (an integer). When a server starts up,
it gets its number from the myid file in the data directory. It then uses this number
to find the server.x entry. It will configure itself using the data in this entry. If it
needs to contact another server, y, it will use the information in the server.y entry
to contact the server.
The hostname is the name of the server on the network. There are two TCP port
numbers (the n fields). The first port is used to send transactions, and the second is
for leader election. The ports typically used are 2888:3888. If observer is in the final
field, the server entry represents an observer.
Note that it is quite important that all servers use the same server.x configuration;
otherwise, the ensemble won’t work properly because servers might not be able to
establish connections properly.

Ensemble configuration: x must match the contents of the server's myid file. The first port is used to send transactions and the second is used for leader election; 2888:3888 is the usual choice.
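The entry format can be made concrete with a small parser. This is a sketch of the server.x=[hostname]:n:n[:observer] form described above, not ZooKeeper's own parser; the host names are hypothetical, and addresses that themselves contain colons (such as IPv6 literals) are not handled.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int        # x, must match the server's myid file
    hostname: str
    quorum_port: int      # first port: used to send transactions
    election_port: int    # second port: used for leader election
    observer: bool


def parse_server_line(line):
    """Parse one 'server.x=hostname:n:n[:observer]' line."""
    key, value = line.strip().split("=", 1)
    prefix, server_id = key.split(".", 1)
    if prefix != "server":
        raise ValueError(f"not a server entry: {line!r}")
    parts = value.split(":")
    if len(parts) == 4 and parts[3] != "observer":
        raise ValueError(f"unexpected final field: {parts[3]!r}")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected hostname:port:port[:observer], got {value!r}")
    return ServerEntry(int(server_id), parts[0], int(parts[1]), int(parts[2]),
                       len(parts) == 4)


print(parse_server_line("server.1=zk1.example.com:2888:3888"))
print(parse_server_line("server.4=zk4.example.com:2888:3888:observer"))
```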

cnxTimeout

The timeout value for opening a connection during leader election
(zookeeper.cnxTimeout).
The ZooKeeper servers connect to each other during leader election. This value
determines how long a server will wait for a connection to complete before trying
again. The default value of 5 seconds is very generous and probably will not need to be adjusted.

The timeout for connections between servers during leader election; the default is 5 s.

electionAlg

The election algorithm.
We have included this configuration option for completeness. It selects among
different leader election algorithms, but all have been deprecated except for the one
that is the default. You shouldn’t need to use this option.

The leader election algorithm. It normally does not need to be set; ZooKeeper uses its default.

 

This section contains the options that are used for authentication and authorization. For information on configuration options for Kerberos

 

zookeeper.DigestAuthenticationProvider.superDigest (Java system property only)

This system property specifies the digest for the “super” user’s password. (This
feature is disabled by default.) A client that authenticates as super bypasses all ACL
checking. The value of this system property will have the form super:encoded_digest.
To generate the encoded digest, use the
org.apache.zookeeper.server.auth.DigestAuthenticationProvider utility. The following
example command line generates an encoded digest for the password asdf:
java -cp $ZK_CLASSPATH \
org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:asdf
It prints the input followed by the encoded digest:
super:asdf->super:T+4Qoey4ZZ8Fnni1Yl2GZtbH2W4=
To start a server using this digest, you can use the following command:
export SERVER_JVMFLAGS
SERVER_JVMFLAGS=-Dzookeeper.DigestAuthenticationProvider.superDigest=super:T+4Qoey4ZZ8Fnni1Yl2GZtbH2W4=
./bin/zkServer.sh start
Now, when connecting with zkCli, you can issue the following:
[zk: localhost:2181(CONNECTED) 0] addauth digest super:asdf
[zk: localhost:2181(CONNECTED) 1]
At this point you are authenticated as the super user and will not be restricted by any
ACLs.

Authentication settings: clients can be made to authenticate when they connect to a ZooKeeper server.
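If a JVM is not at hand, the digest can probably be reproduced in a few lines of Python. This sketch assumes the encoded digest is the Base64 encoding of the SHA-1 hash of the whole id:password string, which is my understanding of what DigestAuthenticationProvider computes; check the result against the Java utility before relying on it. The function name is hypothetical.

```python
import base64
import hashlib


def generate_digest(id_password):
    """Return 'id:base64(sha1(id:password))' for an 'id:password' string.

    Assumption: this mirrors the Java utility's output format.
    """
    user = id_password.split(":", 1)[0]
    sha1 = hashlib.sha1(id_password.encode("utf-8")).digest()
    return f"{user}:{base64.b64encode(sha1).decode('ascii')}"


print(generate_digest("super:asdf"))
```

Whichever tool produces it, the digest is a password equivalent for a user that bypasses all ACLs, so keep it out of shared shell history and world-readable start scripts.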

 

The following options can be useful, but be careful when you use them. They really are for very special situations. The majority of administrators who think they need them probably don’t (use the parameters below with care; most of them are listed for reference only):

forceSync

A “yes” or “no” option that controls whether data should be synced to storage
(zookeeper.forceSync).
By default, and when forceSync is set to yes, transactions will not be acknowledged
until they have been synced to storage. The sync system call is expensive and is the
cause of one of the biggest delays in transaction processing. If forceSync is set to
no, transactions will be acknowledged as soon as they have been written to the
operating system, which usually caches them in memory before writing them to
disk. Setting forceSync to no will yield an increase in performance at the cost of
recoverability in the case of a server crash or power outage.

forceSync determines whether a transaction must be fully persisted to disk when it is committed to the transaction log.

jute.maxbuffer (Java system property only)

The maximum size, in bytes, of a request or response. This option can be set only
as a Java system property. There is no zookeeper. prefix on it.
ZooKeeper has some built-in sanity checks, one of which is the amount of data that
can be transferred for a given znode. ZooKeeper is designed to store configuration
data, which generally consists of small amounts of metadata information (on the
order of hundreds of bytes). By default, if a request or response has more than 1
megabyte of data, it is rejected as insane. You may want to use this property to make
the sanity check smaller or, if you really are insane, increase it.

The maximum size, in bytes, of a request or response; by default, anything larger than 1 MB is rejected.

skipACL

Skips all ACL checks (zookeeper.skipACL).
There is some overhead associated with ACL checking. This option can be used to
turn off all ACL checking. It will increase performance, but will leave the data
completely open to any client that can connect to a ZooKeeper server.

skipACL controls whether all ACL checks are skipped.

readonlymode.enabled (Java system property only)

Setting this value to true enables read-only-mode server support. Clients that
request read-only-mode support will be able to connect to a server to read (possibly
stale) information even if that server is partitioned from the quorum. To enable
read-only mode, a client needs to set canBeReadOnly to true.

This feature enables a client to read (but not write) the state of ZooKeeper in the
presence of a network partition. In such cases, clients that have been partitioned
away can still make progress and don’t need to wait until the partition heals. It is
very important to note that a ZooKeeper server that is disconnected from the rest
of the ensemble might end up serving stale state in read-only mode.

Read-only mode setting.


