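Background:
A NiFi cluster needs ZooKeeper (ZK). Since NiFi ships with an embedded ZK, I decided to use the embedded ZK for cluster management.
Initial configuration: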
1. /etc/hosts
192.168.1.17 node-3
192.168.56.101 node-2
192.168.56.102 node-1
2. Passwordless SSH login
ssh-keygen -t rsa
Append the public key to authorized_keys on each node
3. zookeeper.properties
clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=20
tickTime=10000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=node-1:2888:3888
server.2=node-2:2888:3888
server.3=node-3:2888:3888
4. Add the ZooKeeper id (myid; on each node the value must match that node's N in server.N)
mkdir ./state/zookeeper -p
echo 1 > ./state/zookeeper/myid
5. nifi.properties
Use the embedded ZooKeeper
nifi.state.management.embedded.zookeeper.start=true
Connect string
nifi.zookeeper.connect.string=node-1:2181,node-2:2181,node-3:2181
HTTP rather than HTTPS
nifi.cluster.protocol.is.secure=false
Cluster settings
nifi.cluster.is.node=true
nifi.cluster.node.address=node-1
nifi.cluster.node.protocol.port=9999
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
Remote input (site-to-site) settings
nifi.remote.input.host=node-1
nifi.remote.input.secure=false
nifi.remote.input.socket.port=9998
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
Host name shown in the web UI
nifi.web.http.host=node-1
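The number written to myid in step 4 has to be the N of that node's server.N line in step 3, so each node gets a different value (1 on node-1, 2 on node-2, 3 on node-3). The following is only a minimal Python sketch of that mapping; the helper names parse_servers and myid_for are illustrative and not part of NiFi or ZooKeeper.

```python
import re

# Matches lines such as "server.1=node-1:2888:3888".
SERVER_LINE = re.compile(r"server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_servers(properties_text):
    """Return {id: (host, quorum_port, election_port)} for every server.N line."""
    servers = {}
    for raw in properties_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            server_id, host, quorum, election = match.groups()
            servers[int(server_id)] = (host, int(quorum), int(election))
    return servers


def myid_for(properties_text, hostname):
    """Return the value that belongs in the myid file on the given host."""
    for server_id, (host, _, _) in parse_servers(properties_text).items():
        if host == hostname:
            return server_id
    raise ValueError(f"{hostname} has no server.N entry")


# The server lines from step 3.
ZK_PROPERTIES = """
clientPort=2181
server.1=node-1:2888:3888
server.2=node-2:2888:3888
server.3=node-3:2888:3888
"""

if __name__ == "__main__":
    for node in ("node-1", "node-2", "node-3"):
        print(node, "-> myid", myid_for(ZK_PROPERTIES, node))
```

On each node, the printed value is what `echo <id> > ./state/zookeeper/myid` should write.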
Problems encountered
2019-10-31 17:12:29,794 ERROR [LearnerHandler-/192.168.0.9:58442] o.a.z.server.quorum.LearnerHandler Unexpected exception causing shutdown while sock still open
java.io.EOFException: null
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readString(BinaryInputArchive.java:79)
at org.apache.zookeeper.data.Id.deserialize(Id.java:55)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:92)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:309)
2019-10-31 14:44:26,984 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-31 14:44:28,193 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
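This problem was caused by a misconfigured connect string. After I replaced the hostnames in the connect string with IP addresses, the errors went away.
While troubleshooting, I also tried deploying a standalone ZooKeeper cluster. After several configuration changes it came up successfully, but NiFi still reported the errors above when connecting to the standalone ZK cluster. After more trial and error I found that the port in nifi.zookeeper.connect.string was wrong. I read through some references, so here is a recap of the ZooKeeper port configuration:
ZooKeeper uses three ports: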
clientPort=2181
server.1=node-1:2888:3888
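- clientPort is the port ZooKeeper exposes to clients. Other applications that access the ZooKeeper cluster use only this port.
- The first port in server.1=node-1:2888:3888 (2888) is the port the other nodes in the ZK ensemble use to communicate with the leader.
- The second port in server.1=node-1:2888:3888 (3888) is the port used for leader election.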
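
Given the three ports above, every entry in nifi.zookeeper.connect.string should point at clientPort, never at 2888 or 3888. The following is a rough Python sketch of a sanity check for that; check_connect_string is a hypothetical helper written for this note, not a NiFi or ZooKeeper API, and it only inspects the string without connecting to anything.

```python
def check_connect_string(connect_string, client_port=2181, peer_ports=(2888, 3888)):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep:
            # This sketch asks for an explicit port on every entry.
            problems.append(f"{entry}: no explicit port")
            continue
        if not port.isdigit():
            problems.append(f"{entry}: port is not a number")
            continue
        port = int(port)
        if port in peer_ports:
            problems.append(f"{entry}: {port} is a quorum/election port, not clientPort")
        elif port != client_port:
            problems.append(f"{entry}: expected clientPort {client_port}")
    return problems


if __name__ == "__main__":
    good = "node-1:2181,node-2:2181,node-3:2181"
    bad = "node-1:2888,node-2:2181,node-3"
    print(check_connect_string(good))
    for problem in check_connect_string(bad):
        print(problem)
```

The port values passed in are the ones from zookeeper.properties in this note; adjust them if clientPort or the server.N ports differ.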