Versions:
zookeeper-3.4.10
kafka_2.11-0.9.0.1
1. First set each server's hostname and hosts table.
vi /etc/sysconfig/network
Change the hostname:
HOSTNAME=server01
vi /etc/hosts
127.0.0.1 localhost localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.1.201 server01
172.25.1.202 server02
172.25.1.203 server03
2. A JDK is required; I installed 1.8.0.
3. Three-server cluster (ZooKeeper prefers an odd number of nodes).
I used zookeeper 3.4.10.
In /opt/:
tar zxvf zookeeper-3.4.10.tar.gz
# Configure the environment variables
vi /etc/profile
Add:
# Zookeeper
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Then apply the changes:
source /etc/profile
# Edit the configuration file
cd /opt/zookeeper-3.4.10/conf
mv zoo_sample.cfg zoo.cfg
vi zoo.cfg
# The complete file is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=server01:2888:3888
server.2=server02:2888:3888
server.3=server03:2888:3888
I only added the last three lines; everything else is the default. (Copy this configuration file to the other two servers.)
Configure myid on each server:
in the dataDir specified above,
vi myid
and write the myid value, i.e. the x in server.x from the configuration file. For example, server.1 gets 1.
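Writing the file can also be scripted instead of using vi. A minimal sketch, assuming the dataDir of /tmp/zookeeper from the zoo.cfg above:

```shell
# On server01 (use 2 on server02 and 3 on server03, matching server.x in zoo.cfg)
DATADIR=/tmp/zookeeper        # the dataDir from zoo.cfg
mkdir -p "$DATADIR"
echo 1 > "$DATADIR/myid"
cat "$DATADIR/myid"           # -> 1
```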
Start the server.
In bin/:
zkServer.sh start
Output:
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Check whether startup succeeded:
zkServer.sh status
You may get an error:
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
First check whether the firewall is disabled.
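A quick way to check, sketched for a CentOS 6-style box (an assumption, suggested by the /etc/sysconfig/network edit above; on CentOS 7+ the firewalld/systemctl equivalents apply):

```shell
# Show the firewall state; if the service isn't found this just says so.
service iptables status 2>/dev/null || echo "iptables service not found"
# To disable it (destructive -- only on these lab machines):
# service iptables stop && chkconfig iptables off
```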
Use
netstat -tunlp
to see whether port 3888 is listening normally. Mine looks like this:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1755/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 1644/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1943/master
tcp 0 0 ::ffff:172.25.1.201:3888 :::* LISTEN 26473/java
tcp 0 0 :::22 :::* LISTEN 1755/sshd
tcp 0 0 ::1:631 :::* LISTEN 1644/cupsd
tcp 0 0 :::45528 :::* LISTEN 26473/java
tcp 0 0 ::1:25 :::* LISTEN 1943/master
tcp 0 0 :::40666 :::* LISTEN 26968/java
tcp 0 0 :::9092 :::* LISTEN 26968/java
tcp 0 0 :::2181 :::* LISTEN 26473/java
udp 0 0 0.0.0.0:631 0.0.0.0:* 1644/cupsd
udp 0 0 0.0.0.0:68 0.0.0.0:* 254
After a successful start:
server01
[root@server01 zookeeper-3.4.10]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
server02
[root@server02 zookeeper-3.4.10]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
server03
[root@server03 zookeeper-3.4.10]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
II. Kafka cluster setup
- Extract the Kafka archive
- Configure server.properties (172.25.1.201)
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
############################# Socket Server Settings #############################
listeners=PLAINTEXT://:9092
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=172.25.1.201
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=172.25.1.201:2181,172.25.1.202:2181,172.25.1.203:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
The main configuration file is server.properties. The producer and consumer have their own producer.properties and consumer.properties, but these usually don't need to be configured separately.
3. Distribute this configuration file to each node, modify broker.id and the listeners/host.name address, create the corresponding directories, then start each node:
[root@slave1 kafka]# ./bin/kafka-server-start.sh -daemon config/server.properties
[root@slave2 kafka]# ./bin/kafka-server-start.sh -daemon config/server.properties
[root@slave3 kafka]# ./bin/kafka-server-start.sh -daemon config/server.properties
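The per-node edits in step 3 can be scripted. A sketch that derives broker 2's and broker 3's files from the broker-1 template; the /tmp working directory and the minimal stand-in template here are illustrative assumptions (run the same sed against the real config/server.properties on each node):

```shell
cd /tmp
# Minimal stand-in for the broker-1 server.properties shown above
printf 'broker.id=1\nhost.name=172.25.1.201\nlog.dirs=/tmp/kafka-logs\n' > server.properties
for i in 2 3; do
  sed -e "s/^broker.id=.*/broker.id=$i/" \
      -e "s/^host.name=.*/host.name=172.25.1.20$i/" \
      server.properties > server.properties.$i
done
mkdir -p /tmp/kafka-logs                 # the log.dirs directory must exist on every node
grep '^broker.id' server.properties.3    # -> broker.id=3
```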
- Verify the cluster
4.1. Create a topic named my-test
[root@slave1 kafka]# bin/kafka-topics.sh --create --zookeeper 172.25.1.203:2181 --replication-factor 3 --partitions 1 --topic my-test
Created topic "my-test"
4.2. Produce messages (Ctrl+C to stop)
[root@slave1 kafka]# bin/kafka-console-producer.sh --broker-list 172.25.1.201:9092 --topic my-test
Today is a good day
hello
4.3. Consume the messages on another machine
[root@slave2 kafka]# bin/kafka-console-consumer.sh --zookeeper 172.25.1.203:2181 --from-beginning --topic my-test
Today is a good day
hello
- Kafka HelloWorld
The Kafka documentation gives Java code examples for the producer and consumer.
Change the addresses (comma separated); the list only needs to be a subset of the cluster, used to discover the full cluster.
5.1. Producer example
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Producer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // A subset of the cluster, used to bootstrap discovery of the full cluster.
        props.put("bootstrap.servers",
                "172.25.1.201:9092,172.25.1.202:9092,172.25.1.203:9092");
        props.put("acks", "all"); // wait for the full commit: slowest, but most durable
        props.put("retries", 3); // number of retries on request failure
        props.put("batch.size", 16384); // batch size in bytes
        // By default requests are sent immediately even if the buffer has free space;
        // linger.ms waits up to this many milliseconds to fill batches further,
        // trading 1 ms of latency for fewer requests to the brokers.
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432); // total memory available for buffering
        // Serializers, e.g. ByteArraySerializer or StringSerializer
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 10000; i++) {
            // Arguments: topic, key, value. send() is asynchronous: it adds the
            // record to a buffer and returns immediately, which is more efficient.
            producer.send(new ProducerRecord<String, String>("my-topic",
                    Integer.toString(i), Integer.toString(i)));
        }
        producer.close();
    }
}
5.2. Consumer example
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // A subset of the cluster, used to bootstrap discovery of the full cluster.
        props.put("bootstrap.servers",
                "172.25.1.201:9092,172.25.1.202:9092,172.25.1.203:9092");
        props.put("group.id", "test"); // the consumer's group id
        props.put("enable.auto.commit", "true"); // commit offsets automatically
        props.put("auto.commit.interval.ms", "1000"); // commit offsets every second
        // The consumer heartbeats to the cluster; if it times out, the consumer is
        // considered dead and Kafka reassigns its partitions to other processes.
        props.put("session.timeout.ms", "30000");
        // Deserializers
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic")); // one or more topics
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
5.3. Run them; the consumer prints the consumed messages.
When I first ran it, it failed with an error that slave01:9092 could not be found.
Copying the hostname-to-IP entries from the servers' hosts file into the Windows 7 hosts file fixed it.
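The entries to copy into the Windows hosts file (typically C:\Windows\System32\drivers\etc\hosts) are the same ones added on the servers:

```
172.25.1.201 server01
172.25.1.202 server02
172.25.1.203 server03
```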