Building flume + zookeeper + kafka
zookeeper: a distributed, open-source coordination service for distributed applications
kafka: an open-source stream-processing platform
storm: Twitter's open-source distributed real-time big-data processing framework, often called the real-time Hadoop
flume: a log collection system
I. Download the required software
apache-flume-1.6.0-bin.tar.gz wget http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
zookeeper-3.4.14.tar.gz wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
apache-storm-1.2.3.tar.gz wget https://mirrors.tuna.tsinghua.edu.cn/apache/storm/apache-storm-1.2.3/apache-storm-1.2.3.tar.gz
kafka_2.11-1.0.1.tgz wget http://archive.apache.org/dist/kafka/1.0.1/kafka_2.11-1.0.1.tgz
1. Environment: CentOS 7
Three hosts:
192.168.35.6 master
192.168.35.7 slave1
192.168.35.8 slave2
2. Run on every node
[root@master ~]# cat >> /etc/hosts <<'EOF'
192.168.35.6 master
192.168.35.7 slave1
192.168.35.8 slave2
EOF
3. Disable the firewall and SELinux
# disable the firewall
[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
# disable SELinux
[root@master ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
Reboot for the change to take effect, then verify:
[root@master ~]# getenforce
Disabled
4. Install the JDK
[root@master ]# rz jdk-8u131-linux-x64.rpm
[root@master ]# rpm -ivh jdk-8u131-linux-x64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:jdk1.8.0_131-2000:1.8.0_131-fcs ################################# [100%]
[root@master ]# java -version
II. Building the ZooKeeper cluster
0. What ZooKeeper does
The basic workflow of a ZooKeeper ensemble:
1. Elect a leader.
2. Synchronize data across the nodes.
3. Many leader-election algorithms exist, but the criteria a valid election must satisfy are the same.
4. The leader must hold the highest transaction ID (zxid), i.e. the most up-to-date state.
5. A majority of the machines in the cluster must respond to and accept the elected leader.
1. On every node
#1. Download ZooKeeper
wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
#2. Unpack
[root@master ~]# tar xf zookeeper-3.4.14.tar.gz -C /usr/local/
#3. Copy the sample configuration and edit it
[root@master ~]# cd /usr/local/zookeeper-3.4.14/
[root@master zookeeper-3.4.14]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@master zookeeper-3.4.14]# vim conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper/data
# the directory where transaction logs are stored
dataLogDir=/opt/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=24
server.1=192.168.35.6:2888:3888
server.2=192.168.35.7:2888:3888
server.3=192.168.35.8:2888:3888
# Configure environment variables (append to PATH rather than overwriting it,
# so the additions made in later sections are not lost)
[root@master ~]# vim /etc/profile
export PATH=$PATH:/usr/local/zookeeper-3.4.14/bin
[root@master ~]# source /etc/profile
#4. Create the data and log directories (on all three nodes)
[root@master conf]# mkdir -p /opt/zookeeper/{logs,data}
#5. Set the myid (server ID); it differs on every node
On 192.168.35.6:
[root@master conf]# echo "1" > /opt/zookeeper/data/myid
On 192.168.35.7:
[root@slave1 conf]# echo "2" > /opt/zookeeper/data/myid
On 192.168.35.8:
[root@slave2 conf]# echo "3" > /opt/zookeeper/data/myid
Each node's myid must match its server.N entry in zoo.cfg.
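The three echo commands above can be collapsed into one helper so the wrong myid is harder to paste onto the wrong host. A minimal sketch, assuming the IP addressing used in this article; the function name is illustrative:

```shell
# Map a node IP to its ZooKeeper server ID, matching the server.N lines in zoo.cfg
myid_for_ip() {
  case "$1" in
    192.168.35.6) echo 1 ;;
    192.168.35.7) echo 2 ;;
    192.168.35.8) echo 3 ;;
    *) echo "unknown node IP: $1" >&2; return 1 ;;
  esac
}

# On each node:
#   myid_for_ip "$(hostname -I | awk '{print $1}')" > /opt/zookeeper/data/myid
```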
#6. Start ZooKeeper on every node
[root@master local]# /usr/local/zookeeper-3.4.14/bin/zkServer.sh start
#7. Check each node's status (after all nodes have been started)
[root@master local]# /usr/local/zookeeper-3.4.14/bin/zkServer.sh status
2. Test connecting to ZooKeeper
Once one node reports Mode: leader and the other two report Mode: follower, the ZooKeeper cluster is up.
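This check can also be scripted. A minimal sketch, assuming `nc` (netcat) is installed and the four-letter-word commands are enabled (they are by default in 3.4.14): ask each node for its `stat` output and extract the Mode line.

```shell
# zk_mode: read `stat` output on stdin and print the node's role (leader/follower/standalone)
zk_mode() {
  grep '^Mode:' | awk '{print $2}'
}

# On a live cluster you would run:
#   for h in master slave1 slave2; do
#     printf '%s: %s\n' "$h" "$(echo stat | nc "$h" 2181 | zk_mode)"
#   done
```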
3. Troubleshooting
[root@master bin]# /usr/local/zookeeper-3.4.14/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
# For an error like the one above, there are three likely causes (also make sure nothing else occupies port 2181):
1. A mistake in conf/zoo.cfg
2. The firewall was not disabled
3. The status command only reports a healthy cluster once all nodes are running; start every node first, then check again
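Cause 1 can be checked mechanically. A sketch that verifies zoo.cfg contains the keys this cluster relies on; the key list is an assumption based on the configuration shown earlier, and the helper name is illustrative:

```shell
# Print any required key missing from the given zoo.cfg; return non-zero if any is absent
check_zoo_cfg() {
  local cfg=$1 missing=0 key
  for key in tickTime dataDir clientPort server.1 server.2 server.3; do
    grep -q "^${key}[[:space:]]*=" "$cfg" || { echo "missing: ${key}"; missing=1; }
  done
  return $missing
}

# Usage: check_zoo_cfg /usr/local/zookeeper-3.4.14/conf/zoo.cfg
```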
4. Mitigating the ZooKeeper unauthenticated-access vulnerability
https://www.bbsmax.com/A/A2dmmgEqde/
https://www.jianshu.com/p/86a7f506d1d2
III. Building the Kafka cluster
0. Kafka features
Kafka is a high-throughput distributed publish-subscribe messaging system with the following features:
- Message persistence through an O(1) disk data structure, which stays stable over long periods even with terabytes of stored messages.
- High throughput: even on very ordinary hardware, Kafka can handle millions of messages per second.
- Messages can be partitioned across Kafka brokers and consumed by clusters of consumer machines.
- Support for parallel data loading into Hadoop.
1. On every node
#1. Download Kafka
wget http://archive.apache.org/dist/kafka/1.0.1/kafka_2.11-1.0.1.tgz
#2. Unpack
[root@master install]# tar xf kafka_2.11-1.0.1.tgz -C /usr/local/
#3. Configure environment variables
[root@master bin]# vim /etc/profile
#set kafka environment
export PATH=$PATH:/usr/local/kafka_2.11-1.0.1/bin
[root@master bin]# source /etc/profile
#4. On the master host, edit server.properties
[root@master bin]# vim /usr/local/kafka_2.11-1.0.1/config/server.properties
21 broker.id=1
31 listeners=PLAINTEXT://192.168.35.6:9092
123 zookeeper.connect=192.168.35.6:2181,192.168.35.7:2181,192.168.35.8:2181
On the slave1 host, edit server.properties
[root@slave1 bin]# vim /usr/local/kafka_2.11-1.0.1/config/server.properties
21 broker.id=2
31 listeners=PLAINTEXT://192.168.35.7:9092
123 zookeeper.connect=192.168.35.6:2181,192.168.35.7:2181,192.168.35.8:2181
On the slave2 host, edit server.properties
[root@slave2 bin]# vim /usr/local/kafka_2.11-1.0.1/config/server.properties
21 broker.id=3
31 listeners=PLAINTEXT://192.168.35.8:9092
123 zookeeper.connect=192.168.35.6:2181,192.168.35.7:2181,192.168.35.8:2181
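Instead of editing server.properties by hand on each host, the three per-node changes above can be applied with sed. A sketch; the function name and id/ip arguments are illustrative, and it assumes GNU sed plus the stock config file in which `listeners` is initially commented out:

```shell
# Apply the per-broker settings: broker.id, listeners, and the shared zookeeper.connect
configure_kafka() {
  local file=$1 id=$2 ip=$3
  sed -i \
    -e "s|^broker.id=.*|broker.id=${id}|" \
    -e "s|^#\{0,1\}listeners=.*|listeners=PLAINTEXT://${ip}:9092|" \
    -e "s|^zookeeper.connect=.*|zookeeper.connect=192.168.35.6:2181,192.168.35.7:2181,192.168.35.8:2181|" \
    "$file"
}

# e.g. on slave1:
#   configure_kafka /usr/local/kafka_2.11-1.0.1/config/server.properties 2 192.168.35.7
```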
#5. Start Kafka (make sure ZooKeeper is already running)
[root@master install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-server-start.sh -daemon /usr/local/kafka_2.11-1.0.1/config/server.properties
#6. Check that port 9092 is listening on each node
[root@master install]# netstat -lntp |grep 9092
tcp6 0 0 192.168.35.6:9092 :::* LISTEN 26590/java
[root@slave1 install]# netstat -lntp | grep 9092
tcp6 0 0 192.168.35.7:9092 :::* LISTEN 23408/java
[root@slave2 install]# netstat -lntp |grep 9092
tcp6 0 0 192.168.35.8:9092 :::* LISTEN 28902/java
2. Testing
# Create a topic in the Kafka cluster:
[root@master install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-topics.sh --create --zookeeper 192.168.35.6:2181 --replication-factor 3 --partitions 1 --topic test
Created topic "test".
This output confirms the topic was created successfully.
Explanation:
--replication-factor 3 # keep 3 replicas
--partitions 1 # create 1 partition
--topic test # the topic name is test
# List the topics you created:
[root@master install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-topics.sh --list --zookeeper 192.168.35.6:2181
__consumer_offsets
test
# On 192.168.35.6, start a producer (publisher)
[root@master install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-console-producer.sh --broker-list 192.168.35.6:9092 --topic test
# On 192.168.35.7 and 192.168.35.8, start a consumer (subscriber) each
[root@slave1 install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-console-consumer.sh --bootstrap-server 192.168.35.7:9092 --topic test --from-beginning
[root@slave2 install]# /usr/local/kafka_2.11-1.0.1/bin/kafka-console-consumer.sh --bootstrap-server 192.168.35.8:9092 --topic test --from-beginning
3. Messages typed into the producer now appear on both consumer machines.
Kafka's state can also be inspected from ZooKeeper:
[root@master install]# /usr/local/zookeeper-3.4.14/bin/zkCli.sh -server 192.168.35.6:2181
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.35.6:2181(CONNECTED) 1] ls /
[cluster, controller_epoch, controller, brokers, zookeeper, admin, isr_change_notification, consumers, log_dir_event_notification, latest_producer_id_block, config]
In the listing above, only the zookeeper node is native to ZooKeeper; all the others were created by Kafka.
One particularly useful path is the broker registration:
get /brokers/ids/1
4. Deleting a topic
[root@master bin]# /usr/local/kafka_2.11-1.0.1/bin/kafka-topics.sh --delete --zookeeper 192.168.35.6:2181 --topic order
5. Describing a topic
[root@master bin]# /usr/local/kafka_2.11-1.0.1/bin/kafka-topics.sh --topic test --describe --zookeeper 192.168.35.6:2181
Topic:test PartitionCount:1 ReplicationFactor:3 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
- PartitionCount: the number of partitions
- ReplicationFactor: the number of replicas
- Leader: the node responsible for reads and writes of the given partition; any node can become the leader
- Replicas: the list of nodes holding a copy of this partition, whether or not they are the leader and whether or not they are alive
- Isr: the set of in-sync replicas, i.e. the subset of Replicas that is currently alive and caught up with the leader
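The Replicas/Isr relationship can also be checked in a script. A minimal sketch that reads one partition line of `--describe` output and reports whether every replica is in sync; it assumes the two lists are printed in the same order, as in the output above:

```shell
# fully_synced: read one "Topic: ... Replicas: ... Isr: ..." line on stdin, print yes/no
fully_synced() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "Replicas:") r = $(i + 1)
      if ($i == "Isr:")      s = $(i + 1)
    }
    if (r == s) print "yes"; else print "no"
  }'
}
```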
IV. Setting up Storm
0. About Storm
Storm is Twitter's open-source distributed real-time big-data processing framework. First open-sourced on GitHub, it moved to the Apache community as of version 0.9.1 and is often called the real-time Hadoop. As more and more scenarios cannot tolerate the high latency of Hadoop MapReduce, such as site analytics, recommendation systems, alerting, and finance (high-frequency trading, stocks), real-time big-data processing (stream computing) is ever more widely used, and Storm is among the leading and most popular stream-computing technologies.
1. On every node
#1. Download
[root@master ]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/storm/apache-storm-1.2.3/apache-storm-1.2.3.tar.gz
#2. Unpack
[root@master ]# tar xf apache-storm-1.2.3.tar.gz -C /usr/local/
#3. Edit the configuration (it is identical on master, slave1 and slave2)
master:
[root@master local]# vim /usr/local/apache-storm-1.2.3/conf/storm.yaml
17 ########### These MUST be filled in for a storm configuration
18 storm.zookeeper.servers:
19 - "master"
20 - "slave1"
21 - "slave2"
22
23 nimbus.seeds: ["master", "slave1", "slave2"]
24 storm.local.dir: "/opt/storm/data/storm"
25 supervisor.slots.ports:
26 - 6700
27 - 6701
28 - 6702
29 - 6703
slave1 and slave2 use exactly the same storm.yaml configuration as master.
[root@master local]# mkdir -p /opt/storm/data/storm
#4. Configure environment variables on every host
[root@master ~]# vim /etc/profile
# apache storm
export PATH=$PATH:/usr/local/apache-storm-1.2.3/bin
[root@master ]# source /etc/profile
#5. Start the daemons
Start the nimbus process on the master host:
[root@master local]# /usr/local/apache-storm-1.2.3/bin/storm nimbus &
[1] 40350
Start the supervisor process on the other two machines:
[root@slave1 ~]# /usr/local/apache-storm-1.2.3/bin/storm supervisor &
[1] 34364
After each daemon has started, press Ctrl+C to get the prompt back; the background job keeps running.
Also start on the master host:
[root@master local]# /usr/local/apache-storm-1.2.3/bin/storm ui &
[2] 40559
[root@master local]# /usr/local/apache-storm-1.2.3/bin/storm logviewer &
[3] 40635
Storm's deployment status can now be viewed in the web UI.
2. Visit http://ip:8080/ (the Storm UI).
Storm setup is complete.
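Once everything is started, a quick way to confirm the right daemons are up on each host is to filter `jps` output. A sketch; the expected process names (nimbus, supervisor, core for the UI, logviewer) are assumptions based on what Storm 1.2.x typically registers:

```shell
# expected_running: stdin is `jps` output; args are daemon names; prints any missing name
expected_running() {
  local out name
  out=$(cat)
  for name in "$@"; do
    echo "$out" | grep -qw "$name" || echo "missing $name"
  done
}

# On master:  jps | expected_running nimbus core logviewer
# On slaves:  jps | expected_running supervisor
```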
V. Setting up Flume
1. On every node
#1. Download
wget http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
#2. Unpack
[root@master install]# tar xf apache-flume-1.6.0-bin.tar.gz -C /usr/local/
#3. Edit the configuration
[root@master install]# cd /usr/local/apache-flume-1.6.0-bin/
[root@master apache-flume-1.6.0-bin]# cp conf/flume-env.sh.template conf/flume-env.sh
22 export JAVA_HOME=/usr/java/jdk1.8.0_131   # point this at the local JDK installation
#4. Check the version
[root@master bin]# cd /usr/local/apache-flume-1.6.0-bin/bin
[root@master bin]# ./flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
#5. Configure environment variables
[root@master bin]# vim /etc/profile
# apache flume
export PATH=$PATH:/usr/local/apache-flume-1.6.0-bin/bin
[root@master bin]# source /etc/profile
#6. Configure a Flume agent startup file that feeds Kafka (reference):
https://blog.csdn.net/asdf57847225/article/details/78517669
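Alongside the reference above, here is a hedged sketch of what such a startup file can look like: a Flume 1.6 agent that tails a log file into the Kafka topic `test` created earlier. The agent name `a1` and the path `/var/log/test.log` are illustrative assumptions; note that Flume 1.6's Kafka sink uses `brokerList`, which later Flume versions renamed to `kafka.bootstrap.servers`.

```properties
# kafka.conf - exec source -> memory channel -> Kafka sink (Flume 1.6 syntax)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail an application log (the path is an example; adjust to your setup)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/test.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Kafka sink pointing at the brokers built above
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = test
a1.sinks.k1.brokerList = 192.168.35.6:9092,192.168.35.7:9092,192.168.35.8:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
```

Started with something like `flume-ng agent -n a1 -c /usr/local/apache-flume-1.6.0-bin/conf -f kafka.conf -Dflume.root.logger=INFO,console`, lines appended to the tailed log should then show up in the console consumers from section III.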