Kafka 0.11 Study Notes
一、Producer
Core parameters
1. Common parameters
props.put("bootstrap.servers", "192.168.3.106:9092");
/**
* acks = 0: 表示produce请求立即返回,不需要等待leader的任何确认。
* 这种方案有最高的吞吐率,但是不保证消息是否真的发送成功。
*
* acks = 1: 表示leader副本必须应答此produce请求并写入消息到本地日志,之后produce请求被认为成功. 如果leader挂掉有数据丢失的风险
* acks = -1或者all: 表示分区leader必须等待消息被成功写入到所有的ISR副本(同步副本)中才认为produce请求成功。
* 这种方案提供最高的消息持久性保证,但是理论上吞吐率也是最差的。
*/
props.put("acks","all");
props.put("min.insync.replicas", 2); //配合acks=all使用 此配置指定了成功写入的副本应答的最小数 默认值为1
//重试的次数
props.put("retries", 3);
//配合retries使用,为1保证提交的数据顺序性
props.put("max.in.flight.requests.per.connection",1);
//批处理数据的大小,每次写入多少数据到topic
props.put("batch.size", 16384);
//可以延长多久发送数据
props.put("linger.ms", 1);
props.put("max.request.size", 1048576); //单条消息默认最大长度1M
//缓冲区的大小
props.put("buffer.memory", 33554432);
//指定key和value的序列化器
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
//添加自定义分区函数
//props.put("partitioner.class","com.kaikeba.partitioner.MyPartitioner");
Sample code with a send callback
//imports assumed for this example: JSONObject is Alibaba fastjson; JdbcUtils and Register are project-local helper classes not shown here
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

@Slf4j //the log calls below assume lombok's @Slf4j (or an equivalent slf4j Logger field)
public class KafkaProduct {
    public static void main(String[] args) {
        Connection conn = null;
        PreparedStatement prst = null;
        ResultSet rs = null;
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "docker01:9092,docker02:9092,docker03:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("acks", "all");
        //prop.put("min.insync.replicas", 1); //topic/broker-level config used together with acks=all: minimum number of in-sync replicas that must acknowledge a write
        prop.put("retries", 3);
        //prop.put("max.in.flight.requests.per.connection", 1); //set to 1 to keep batches in order when retries happen
        prop.put("batch.size", 32768); //batch size limit, 32KB
        prop.put("linger.ms", 5); //maximum wait time before a batch is sent
        prop.put("max.request.size", 1048576); //maximum size of a single request, default 1MB
        prop.put("buffer.memory", 33554432); //producer buffer size, default 32MB
        Producer<String, String> producer = new KafkaProducer<>(prop);
        try {
            //create the MySQL connection
            conn = JdbcUtils.getMysqlConn("jdbc:mysql://localhost:3306/inceptor?useSSL=false", "root", "root");
            prst = conn.prepareStatement("SELECT * FROM t01_std_dwd_jbxx_djrkxx_info");
            rs = prst.executeQuery();
            long start_time = System.currentTimeMillis();
            while (rs.next()) {
                //convert the row into a JSON string
                String register = JSONObject.toJSONString(new Register(rs));
                //send with a callback
                producer.send(new ProducerRecord<>("register1", register), new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                        if (e == null) {
                            log.info("message sent successfully");
                        } else {
                            log.error("failed to send message: {}", register);
                        }
                    }
                });
            }
            long end_time = System.currentTimeMillis();
            log.info("elapsed: {} ms", end_time - start_time);
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            //close resources in reverse order of creation, guarding against nulls
            try {
                if (rs != null) rs.close();
                if (prst != null) prst.close();
                if (conn != null) conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            producer.close();
        }
    }
}
Idempotence and transactions (overview)
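Kafka 0.11 introduced the idempotent producer and transactions. A minimal sketch of how they are enabled (broker addresses and topic name are reused from the example above; the transactional.id value is an assumption):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "docker01:9092,docker02:9092,docker03:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        //idempotence: the broker de-duplicates retried batches, so retries no longer create duplicates
        prop.put("enable.idempotence", "true");
        //transactions: several sends become one atomic unit; the id must be unique per producer instance (assumed value)
        prop.put("transactional.id", "demo-tx-1");

        Producer<String, String> producer = new KafkaProducer<>(prop);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("register1", "key", "value-1"));
            producer.send(new ProducerRecord<>("register1", "key", "value-2"));
            producer.commitTransaction();
        } catch (Exception e) {
            //a production implementation should treat ProducerFencedException separately (close instead of abort)
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}

Consumers that should only see committed transactional messages need isolation.level=read_committed.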
二、Broker
- For detailed parameters, see the official documentation
Terminology: ISR, HW, LEO, epoch
ISR
The multi-replica mechanism alone can provide Kafka's high availability, but does it also guarantee that no data is lost?
No. If the leader crashes before its latest data has been replicated to the followers, then even after a follower is elected as the new leader, that data is already gone.
ISR means in-sync replicas: the set of follower partitions that are keeping up with the leader partition. Only followers in the ISR list may be elected as the new leader when the leader crashes, because membership in the ISR means their data is in sync with the leader.
To guarantee that data written to Kafka is never lost, the ISR must contain at least one follower, and a message written to the leader partition must be replicated to every follower partition in the ISR before it is considered committed; only then does Kafka promise that the message will not be lost.
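As a hedged sketch of the settings that usually go together for this guarantee (topic name, partition count and broker address are illustrative assumptions): min.insync.replicas at the topic level, acks=all on the producer, and unclean.leader.election.enable=false so that a replica outside the ISR can never become leader.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092"); //assumed broker address
        AdminClient admin = AdminClient.create(props);

        //3 partitions, replication factor 3 (assumed sizing)
        NewTopic topic = new NewTopic("register1", 3, (short) 3);
        Map<String, String> configs = new HashMap<>();
        //with acks=all on the producer, at least 2 in-sync replicas must acknowledge every write
        configs.put("min.insync.replicas", "2");
        //never allow a replica that has fallen out of the ISR to become leader
        configs.put("unclean.leader.election.enable", "false");
        topic.configs(configs);

        admin.createTopics(Collections.singletonList(topic)).all().get();
        admin.close();
    }
}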
HW&LEO原理
LEO
last end offset,日志末端偏移量,标识当前日志文件中下一条待写入的消息的offset。举一个例子,若LEO=10,那么表示在该副本日志上已经保存了10条消息,位移范围是[0,9]。
HW
Highwatermark,俗称高水位,它标识了一个特定的消息偏移量(offset),消费者只能拉取到这个offset之前的消息。任何一个副本对象的HW值一定不大于其LEO值。
小于或等于HW值的所有消息被认为是“已提交的”或“已备份的”。HW它的作用主要是用来判断副本的备份进度.
下图表示一个日志文件,这个日志文件中只有9条消息,第一条消息的offset(LogStartOffset)为0,最有一条消息的offset为8,offset为9的消息使用虚线表示的,代表下一条待写入的消息。日志文件的 HW 为6,表示消费者只能拉取offset在 0 到 5 之间的消息,offset为6的消息对消费者而言是不可见的。
leader持有的HW即为分区的HW,同时leader所在broker还保存了所有follower副本的leo
(1)关系:leader的leo >= follower的leo >= leader保存的follower的leo >= leader的hw >= follower的hw
(2)原理:上面关系反应出各个值的更新逻辑的先后
Data consistency on leader failover (potential data loss)
Two scenarios worth knowing (see other blog posts for the details):
- The leader crashes and a follower that has not yet synced the latest data is elected as the new leader; the other followers then sync to the new leader's LEO and truncate anything beyond it.
- The leader crashes, and before the new leader finishes syncing, new data is written to it; two replicas can then end up with the same LEO but different contents. The leader-epoch mechanism was introduced to handle this and is worth reading up on.
Why Kafka is fast
- Sequential writes
- Zero-copy
- Segment files named by their base offset, plus sparse indexes
- Heavy use of the OS page cache instead of the JVM heap, which avoids GC pressure
File storage mechanism
三、Consumer
- For detailed parameters, see the official documentation
Core parameters
1. Common parameters
props.put("bootstrap.servers", "192.168.3.106:9092");
//props.put("client.id", "consumer-root1-2");
props.put("group.id", "root6");
//props.put("auto.offset.reset", "earliest"); //earliest如果没有历史偏移量则从头开始消费 latest没有历史偏移量则从最新一条开始消费
props.put("enable.auto.commit", "true"); //后台自动提交
props.put("auto.commit.interval.ms", "1000"); //提交频率
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
Consumer high-level API
1. Auto-commit example
//todo: Kafka consumer example (offsets committed automatically)
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaConsumerStudy {
    public static void main(String[] args) {
        //prepare configuration
        Properties props = new Properties();
        //kafka cluster addresses
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        //consumer group id
        props.put("group.id", "consumer-test");
        //commit offsets automatically
        props.put("enable.auto.commit", "true");
        //auto-commit interval
        props.put("auto.commit.interval.ms", "1000");
        //default is latest
        //earliest: if a committed offset exists for the partition, resume from it; otherwise start from the beginning
        //latest: if a committed offset exists for the partition, resume from it; otherwise consume only newly produced data
        //none: resume from committed offsets only if every partition has one; otherwise throw an exception
        props.put("auto.offset.reset","earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        //subscribe to the topics to consume
        consumer.subscribe(Arrays.asList("test"));
        while (true) {
            //keep polling for data
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                //partition this record came from
                int partition = record.partition();
                //record key
                String key = record.key();
                //record offset
                long offset = record.offset();
                //record value
                String value = record.value();
                System.out.println("partition:"+partition+"\t key:"+key+"\toffset:"+offset+"\tvalue:"+value);
            }
        }
    }
}
2. Manual-commit example
//todo: Kafka consumer example (offsets committed manually)
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaConsumerControllerOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        props.put("group.id", "controllerOffset");
        //turn off auto-commit and commit offsets manually instead
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        //subscribe to the topics to consume
        consumer.subscribe(Arrays.asList("test"));
        //commit offsets manually once this many records have been buffered
        final int minBatchSize = 20;
        //buffer a batch of records before processing them
        List<ConsumerRecord<String, String>> buffer = new ArrayList<ConsumerRecord<String, String>>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
                //insertIntoDb(buffer); process the batch here
                System.out.println("records in buffer: " + buffer.size());
                System.out.println("finished processing this batch...");
                //synchronous commit; there is also an asynchronous commit (see the sketch after this example)
                consumer.commitSync();
                buffer.clear();
            }
        }
    }
}
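The comment in the example above mentions asynchronous commits. A minimal sketch of commitAsync with a callback, used in place of commitSync inside the same poll loop (error handling simplified):

//additional imports: java.util.Map, org.apache.kafka.common.TopicPartition,
//org.apache.kafka.clients.consumer.OffsetAndMetadata, org.apache.kafka.clients.consumer.OffsetCommitCallback
consumer.commitAsync(new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        if (exception != null) {
            //commitAsync does not retry on failure; log it and let a later commit (or a final commitSync on shutdown) cover the gap
            System.err.println("offset commit failed for " + offsets + ": " + exception);
        }
    }
});

A common pattern is commitAsync inside the loop for throughput plus a single commitSync in a finally block when the consumer shuts down.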
Consumer low-level API (official example code)
package com.bigdata.kafkademo.consumer;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.cluster.BrokerEndPoint;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;
public class SimpleExample {
private List<String> m_replicaBrokers = new ArrayList<>();
public SimpleExample() {
m_replicaBrokers = new ArrayList<>();
}
public static void main(String args[]) {
SimpleExample example = new SimpleExample();
// maximum number of messages to read
long maxReads = Long.parseLong("3");
// topic to subscribe to
String topic = "test1";
// partition to read from
int partition = Integer.parseInt("0");
// broker seed addresses
List<String> seeds = new ArrayList<>();
seeds.add("192.168.9.102");
seeds.add("192.168.9.103");
seeds.add("192.168.9.104");
// port
int port = Integer.parseInt("9092");
try {
example.run(maxReads, topic, partition, seeds, port);
} catch (Exception e) {
System.out.println("Oops:" + e);
e.printStackTrace();
}
}
public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
// look up the metadata for the given topic partition
PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
if (metadata == null) {
System.out.println("Can't find metadata for Topic and Partition. Exiting");
return;
}
if (metadata.leader() == null) {
System.out.println("Can't find Leader for Topic and Partition. Exiting");
return;
}
String leadBroker = metadata.leader().host();
String clientName = "Client_" + a_topic + "_" + a_partition;
SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
long readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);
int numErrors = 0;
while (a_maxReads > 0) {
if (consumer == null) {
consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
}
FetchRequest req = new FetchRequestBuilder().clientId(clientName).addFetch(a_topic, a_partition, readOffset, 100000).build();
FetchResponse fetchResponse = consumer.fetch(req);
if (fetchResponse.hasError()) {
numErrors++;
// Something went wrong!
short code = fetchResponse.errorCode(a_topic, a_partition);
System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
if (numErrors > 5)
break;
if (code == ErrorMapping.OffsetOutOfRangeCode()) {
// We asked for an invalid offset. For simple case ask for
// the last element to reset
readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
continue;
}
consumer.close();
consumer = null;
leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
continue;
}
numErrors = 0;
long numRead = 0;
for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
long currentOffset = messageAndOffset.offset();
if (currentOffset < readOffset) {
System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
continue;
}
readOffset = messageAndOffset.nextOffset();
ByteBuffer payload = messageAndOffset.message().payload();
byte[] bytes = new byte[payload.limit()];
payload.get(bytes);
System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
numRead++;
a_maxReads--;
}
if (numRead == 0) {
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
}
}
}
if (consumer != null)
consumer.close();
}
public static long getLastOffset(SimpleConsumer consumer, String topic, int partition, long whichTime, String clientName) {
TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
OffsetResponse response = consumer.getOffsetsBefore(request);
if (response.hasError()) {
System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
return 0;
}
long[] offsets = response.offsets(topic, partition);
return offsets[0];
}
private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
for (int i = 0; i < 3; i++) {
boolean goToSleep = false;
PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
if (metadata == null) {
goToSleep = true;
} else if (metadata.leader() == null) {
goToSleep = true;
} else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
// first time through if the leader hasn't changed give
// ZooKeeper a second to recover
// second time, assume the broker did recover before failover,
// or it was a non-Broker issue
//
goToSleep = true;
} else {
return metadata.leader().host();
}
if (goToSleep) {
Thread.sleep(1000);
}
}
System.out.println("Unable to find new leader after Broker failure. Exiting");
throw new Exception("Unable to find new leader after Broker failure. Exiting");
}
private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
PartitionMetadata returnMetaData = null;
loop:
for (String seed : a_seedBrokers) {
SimpleConsumer consumer = null;
try {
consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
List<String> topics = Collections.singletonList(a_topic);
TopicMetadataRequest req = new TopicMetadataRequest(topics);
kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);
List<TopicMetadata> metaData = resp.topicsMetadata();
for (TopicMetadata item : metaData) {
for (PartitionMetadata part : item.partitionsMetadata()) {
if (part.partitionId() == a_partition) {
returnMetaData = part;
break loop;
}
}
}
} catch (Exception e) {
System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic + ", " + a_partition + "] Reason: " + e);
} finally {
if (consumer != null)
consumer.close();
}
}
if (returnMetaData != null) {
m_replicaBrokers.clear();
for (BrokerEndPoint replica : returnMetaData.replicas()) {
m_replicaBrokers.add(replica.host());
}
}
return returnMetaData;
}
}
Consumer group rebalance
1. Three rebalance (partition assignment) strategies
Suppose the topic we consume has 12 partitions: p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11,
and the consumer group contains three consumers.
- Range strategy, RangeAssignor (the default)
The range strategy assigns contiguous ranges of partition numbers:
p0~3 consumer1
p4~7 consumer2
p8~11 consumer3
This is the default strategy.
- Round-robin strategy, RoundRobinAssignor
consumer1: 0,3,6,9
consumer2: 1,4,7,10
consumer3: 2,5,8,11
The two strategies above share a problem:
if consumer1 dies, p0-5 are assigned to consumer2 and p6-11 to consumer3,
so p6 and p7, which used to live on consumer2, get moved over to consumer3.
- Sticky strategy, StickyAssignor
The newest strategy: during a rebalance it tries to keep the partitions a consumer already owns assigned to that same consumer,
and only spreads the orphaned partitions evenly across the rest, preserving the original assignment as much as possible.
consumer1: 0-3
consumer2: 4-7
consumer3: 8-11
If consumer3 dies:
consumer1: 0-3, plus 8,9
consumer2: 4-7, plus 10,11
- Setting the assignment strategy
//configure the consumer partition assignment strategy:
properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
StickyAssignor.class.getName());
Multi-threaded consumption (one approach): an example using the old high-level consumer API (a sketch with the new KafkaConsumer API follows the code)
package cn.edu360.kafka;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
public class ConsumerDemo {
private static final String topic = "xiaoniu";
private static final Integer threads = 2;
public static void main(String[] args) {
Properties props = new Properties();
props.put("zookeeper.connect", "node-1:2181,node-2:2181,node-3:2181");
props.put("group.id", "vvvvv");
//smallest: consume from the beginning of the topic; largest: only consume data produced after the consumer starts
//smallest corresponds to --from-beginning on the console consumer
props.put("auto.offset.reset", "smallest");
ConsumerConfig config = new ConsumerConfig(props);
ConsumerConnector consumer =Consumer.createJavaConsumerConnector(config);
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, threads);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
for(final KafkaStream<byte[], byte[]> kafkaStream : streams){
new Thread(new Runnable() {
public void run() {
for(MessageAndMetadata<byte[], byte[]> mm : kafkaStream){
String msg = new String(mm.message());
System.out.println(msg);
}
}
}).start();
}
}
}
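The example above uses the old high-level consumer API (kafka.consumer.*). With the new KafkaConsumer API the consumer instance is not thread-safe, so the usual equivalent is one consumer per thread, all in the same group; a minimal sketch (topic, group id and broker addresses are assumptions based on the example above):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiThreadConsumerDemo {
    private static final String TOPIC = "xiaoniu"; //assumed topic
    private static final int THREADS = 2;          //one KafkaConsumer per thread

    public static void main(String[] args) {
        for (int i = 0; i < THREADS; i++) {
            new Thread(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "node-1:9092,node-2:9092,node-3:9092"); //assumed addresses
                props.put("group.id", "multi-thread-demo");
                props.put("enable.auto.commit", "true");
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                //KafkaConsumer is not thread-safe, so each thread owns its own instance;
                //the group coordinator balances the topic's partitions across the instances
                KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                consumer.subscribe(Arrays.asList(TOPIC));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println(Thread.currentThread().getName() + " -> " + record.value());
                    }
                }
            }).start();
        }
    }
}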
四、Load testing and performance tuning
五、Monitoring tool: Kafka Eagle
- Download the Kafka Eagle package
http://download.smartloli.org/
kafka-eagle-bin-1.2.3.tar.gz
- Extract it
- tar -zxvf kafka-eagle-bin-1.2.3.tar.gz -C /kkb/install
- Enter the kafka-eagle-bin-1.2.3 directory after extraction
- which contains kafka-eagle-web-1.2.3-bin.tar.gz
- Extract it as well: tar -zxvf kafka-eagle-web-1.2.3-bin.tar.gz
- Rename it: mv kafka-eagle-web-1.2.3 kafka-eagle-web
- Edit the configuration file
Go to the conf directory and edit system-config.properties
# fill in your kafka cluster information
kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=node01:2181,node02:2181,node03:2181
# port of the kafka eagle web UI
kafka.eagle.webui.port=8048
# kafka sasl authenticate
kafka.eagle.sasl.enable=false
kafka.eagle.sasl.protocol=SASL_PLAINTEXT
kafka.eagle.sasl.mechanism=PLAIN
kafka.eagle.sasl.client=/kkb/install/kafka-eagle-bin-1.2.3/kafka-eagle-web/conf/kafka_client_jaas.conf
# configuration for the ke database imported beforehand; MySQL is used here
kafka.eagle.driver=com.mysql.jdbc.Driver
kafka.eagle.url=jdbc:mysql://node03:3306/ke?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
kafka.eagle.username=root
kafka.eagle.password=123456
- Configure environment variables
vi /etc/profile
export KE_HOME=/kkb/install/kafka-eagle-bin-1.2.3/kafka-eagle-web
export PATH=$PATH:$KE_HOME/bin
- Start kafka-eagle
sh ke.sh start
- Once it starts successfully, open http://node01:8048/ke in a browser to access Kafka Eagle
username: admin
password: 123456