Kafka Cluster Installation and Deployment
Environment
OS        : CentOS 7
Machines  : 10.5.32.1, 10.5.32.2, 10.5.32.3
Kafka     : kafka_2.11-0.11.0.1
ZooKeeper : the ZooKeeper bundled with Kafka
Installation
First, download and unpack the release:
cd /home/kafka
wget https://archive.apache.org/dist/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
tar -zxvf kafka_2.11-0.11.0.1.tgz
cd kafka_2.11-0.11.0.1/
Repeat the same steps on the other two machines.
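Assuming SSH access between the nodes, the repetition can be scripted from the first machine; this is an untested sketch (host list and paths as above), and the commands need the remote hosts to exist:

```shell
# Hypothetical: push the downloaded archive to the other two nodes and unpack it
for host in 10.5.32.2 10.5.32.3; do
  scp /home/kafka/kafka_2.11-0.11.0.1.tgz "$host:/home/kafka/"
  ssh "$host" "cd /home/kafka && tar -zxvf kafka_2.11-0.11.0.1.tgz"
done
```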
Installing ZooKeeper
Edit the ZooKeeper configuration file (identical on all three nodes):
vim config/zookeeper.properties
# remember this directory; it is used below
dataDir=/usr/kafka/data/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
tickTime=2000
server.1=10.5.32.1:2888:3888
server.2=10.5.32.2:2888:3888
server.3=10.5.32.3:2888:3888
Create the directory above and write the corresponding myid file:
mkdir -p /usr/kafka/data/zookeeper
# must match the server.N id configured above; use 2 and 3 on the other two machines
echo 1 > /usr/kafka/data/zookeeper/myid
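The dataDir and myid steps on each node can be sketched as a small script that picks the server id from the node's own IP. This is a hypothetical helper: it writes to a scratch directory here so it can run anywhere, but on a real node DATA_DIR would be the dataDir from zookeeper.properties.

```shell
# Pick this node's ZooKeeper server id from its IP and write the myid file
DATA_DIR=$(mktemp -d)      # real nodes: DATA_DIR=/usr/kafka/data/zookeeper
IP=10.5.32.1               # real nodes: IP=$(hostname -I | awk '{print $1}')
case "$IP" in
  10.5.32.1) ID=1 ;;
  10.5.32.2) ID=2 ;;
  10.5.32.3) ID=3 ;;
  *) echo "unknown host $IP" >&2; exit 1 ;;
esac
mkdir -p "$DATA_DIR"
echo "$ID" > "$DATA_DIR/myid"
cat "$DATA_DIR/myid"       # prints 1 for this example IP
```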
Save and exit, then start ZooKeeper on each node (the -daemon flag already runs it in the background, so no trailing & is needed):
./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties
ZooKeeper setup is complete.
Setting Up Kafka
vim config/server.properties
Change the following properties:
# 2 and 3 on the other two machines, respectively
broker.id=1
# set to each machine's own IP or hostname
listeners=PLAINTEXT://10.5.32.1:9092
# change to a suitable path for your environment
log.dirs=/var/log/kafka-logs
# To keep everything this Kafka cluster creates under a single ZooKeeper node,
# append a chroot path to the connect string, e.g. ...:2181/kafka (every broker
# must use the same suffix)
zookeeper.connect=10.5.32.1:2181,10.5.32.2:2181,10.5.32.3:2181
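Putting the per-broker differences together, the relevant part of server.properties on the second machine would look like the following sketch; only broker.id and listeners differ between nodes:

```properties
# server.properties on 10.5.32.2
broker.id=2
listeners=PLAINTEXT://10.5.32.2:9092
log.dirs=/var/log/kafka-logs
zookeeper.connect=10.5.32.1:2181,10.5.32.2:2181,10.5.32.3:2181
```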
Start the brokers one by one (again, -daemon makes a trailing & unnecessary):
./bin/kafka-server-start.sh -daemon ./config/server.properties
Kafka setup is complete.
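With all three brokers up, the cluster can be smoke-tested from the Kafka directory on any node; the topic name here is just an example, and these commands need the live cluster, so they are shown as-is rather than verified:

```shell
# create a topic replicated across all three brokers and inspect it
./bin/kafka-topics.sh --create --zookeeper 10.5.32.1:2181 \
  --replication-factor 3 --partitions 3 --topic smoke-test
./bin/kafka-topics.sh --describe --zookeeper 10.5.32.1:2181 --topic smoke-test
# in one terminal, type messages to produce
./bin/kafka-console-producer.sh --broker-list 10.5.32.1:9092 --topic smoke-test
# in another terminal, read them back
./bin/kafka-console-consumer.sh --bootstrap-server 10.5.32.1:9092 --topic smoke-test --from-beginning
```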
Problems Encountered
After the cluster was up, testing sometimes showed heavy data loss: the records never actually reached Kafka, and the loss rate was around 90%. The test code was as follows:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerTest {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "10.5.32.1:9092");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        for (int i = 0; i < 100000; i++) {
            // send() is asynchronous: it only queues the record and returns
            kafkaProducer.send(new ProducerRecord<>("topic-name", String.valueOf(i)));
        }
        kafkaProducer.flush();
        kafkaProducer.close();
    }
}
Adding the following two settings to the properties made the data loss disappear (note that batch.size is in bytes, so a value of 10 effectively disables batching and forces each record out almost immediately):
properties.put("linger.ms", 1);
properties.put("batch.size", 10);
Alternatively, making the sends synchronous also eliminated the loss. Note that merely holding the returned Future does not block; you must call get() on it:
public class ProducerTest {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "10.5.32.1:9092");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        for (int i = 0; i < 100000; i++) {
            // get() blocks until the broker acknowledges (or the send fails),
            // making each send synchronous
            kafkaProducer.send(new ProducerRecord<>("topic-name", String.valueOf(i))).get();
        }
        kafkaProducer.flush();
        kafkaProducer.close();
    }
}
Printing the exception from a send callback showed the following errors:
org.apache.kafka.common.errors.TimeoutException: Expiring 1250 record(s) for topic-name-0: 35218 ms has passed since last append
org.apache.kafka.common.errors.TimeoutException: Expiring 212 record(s) for topic-name-0: 35218 ms has passed since batch creation plus linger time
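A callback along the following lines is how such errors can be surfaced. This is a sketch rather than the exact code I ran: the class name is illustrative, and the acks/retries settings are additional hardening assumptions, not part of the original test. It also cannot be run meaningfully without a live broker.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class CallbackProducerTest {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "10.5.32.1:9092");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // assumption: wait for all in-sync replicas and retry transient failures
        properties.put("acks", "all");
        properties.put("retries", 3);
        try (Producer<String, String> producer = new KafkaProducer<>(properties)) {
            for (int i = 0; i < 100000; i++) {
                producer.send(new ProducerRecord<>("topic-name", String.valueOf(i)),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // surfaces failures that an async send otherwise drops silently
                                exception.printStackTrace();
                            }
                        });
            }
            producer.flush();
        }
    }
}
```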
These TimeoutExceptions mean that batches expired in the producer's buffer after sitting there longer than the request timeout without being successfully sent; because send() is asynchronous, those records are dropped silently unless the callback or the Future is checked. I have not had time to dig into the root cause yet and will update this post later; if anyone knows more, please leave a comment or message me, thanks.