Reference: 深入理解Kafka:核心设计与实践原理 (Understanding Kafka in Depth: Core Design and Practice Principles)
7. How the Producer works:
First, the architecture diagram:
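The producer client runs as two cooperating threads: the main thread and the Sender thread. In the main thread, KafkaProducer creates messages, passes them through any configured interceptors, the serializer and the partitioner, and then caches them in the message accumulator (RecordAccumulator). The Sender thread takes messages from the accumulator and sends them to Kafka.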
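The accumulator caches messages so that the Sender thread can send them in batches, which reduces network resource consumption and improves performance. By default it can cache up to 32MB, set by the buffer.memory parameter.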
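Every message coming from the main thread is appended to a double-ended queue (Deque) inside the accumulator: the accumulator keeps one deque per partition, and its elements are ProducerBatch objects. A message being cached is appended to the tail of the deque, and the Sender thread reads from the head. A ProducerBatch holds one or more ProducerRecords, which cuts the number of network requests and raises overall throughput.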
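The per-partition deque layout can be sketched as below. This is a heavily simplified, hypothetical stand-in (`MiniAccumulator`, `Batch` and the record-count limit are inventions for illustration, not Kafka classes): the real RecordAccumulator sizes batches in bytes, synchronizes on each deque and tracks much more state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified accumulator: one Deque of batches per partition.
// Not thread-safe; the real RecordAccumulator synchronizes on each deque.
class MiniAccumulator {
    // Hypothetical stand-in for ProducerBatch: holds up to maxRecords records.
    static final class Batch {
        final List<String> records = new ArrayList<>();
        final int maxRecords;

        Batch(int maxRecords) { this.maxRecords = maxRecords; }

        boolean tryAppend(String record) {
            if (records.size() >= maxRecords) return false; // batch is full
            records.add(record);
            return true;
        }
    }

    private final Map<Integer, Deque<Batch>> batches = new HashMap<>();
    private final int maxRecordsPerBatch;

    MiniAccumulator(int maxRecordsPerBatch) { this.maxRecordsPerBatch = maxRecordsPerBatch; }

    // Main thread: append to the last batch at the tail, or open a new batch at the tail.
    void append(int partition, String record) {
        Deque<Batch> deque = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        Batch last = deque.peekLast();
        if (last == null || !last.tryAppend(record)) {
            Batch fresh = new Batch(maxRecordsPerBatch);
            fresh.tryAppend(record);
            deque.addLast(fresh);
        }
    }

    // Sender thread: take the oldest batch from the head of the deque, or null if none.
    Batch drain(int partition) {
        Deque<Batch> deque = batches.get(partition);
        return deque == null ? null : deque.pollFirst();
    }
}
```

Appending "a", "b", "c" to partition 0 with a limit of two records per batch yields a first batch [a, b] and a second batch [c], drained in that order.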
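Messages travel over the network as bytes, so before sending, a memory region must be allocated to hold each message; Kafka uses java.nio.ByteBuffer for this. Because creating and releasing ByteBuffers over and over is costly, the accumulator contains a BufferPool that reuses them. BufferPool only caches ByteBuffers of one specific size, given by the batch.size parameter (16KB by default); buffers of any other size are not pooled.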
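The reuse rule can be illustrated with a minimal sketch, assuming a pool that only keeps buffers of exactly the poolable size. `MiniBufferPool` is a hypothetical name; the real BufferPool additionally enforces the buffer.memory limit and blocks callers when memory runs out, which is omitted here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the BufferPool idea: only buffers of exactly poolableSize
// bytes (batch.size, 16KB by default) are cached and reused.
class MiniBufferPool {
    private final int poolableSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    MiniBufferPool(int poolableSize) { this.poolableSize = poolableSize; }

    ByteBuffer allocate(int size) {
        if (size == poolableSize && !free.isEmpty()) {
            return free.pollFirst(); // reuse a cached buffer instead of allocating
        }
        return ByteBuffer.allocate(size);
    }

    void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            buffer.clear();          // reset position and limit before reuse
            free.addLast(buffer);
        }
        // buffers of other sizes are dropped and left to the garbage collector
    }

    int cachedBuffers() { return free.size(); }
}
```

Pooling a single size keeps the free list trivial: any cached buffer fits any request of that size, which is also why oversized messages do not benefit from the pool.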
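After the Sender takes messages from the accumulator, it converts the original <partition, Deque<ProducerBatch>> layout into <Node, List<ProducerBatch>>, where Node is a broker in the Kafka cluster. The application only cares which partition a message goes to, but network connections are made to brokers, so this is a conversion from the logical view to the physical one.
Once it has the <Node, List<ProducerBatch>> form, the Sender wraps each entry into <Node, Request> and sends the requests to the corresponding nodes.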
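The partition-to-node regrouping can be sketched as follows. This is a hypothetical simplification: partitions and nodes are plain integer ids, batches are strings, and the leader map stands in for the cluster metadata the real Sender consults.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Sender's regrouping step: batches keyed by partition
// are regrouped by the node leading each partition, so one request per node can
// carry batches for several partitions.
class LeaderGrouping {
    // readyBatches: partition id -> batch payloads ready to send
    // leaders: partition id -> id of the node currently leading that partition
    static Map<Integer, List<String>> groupByNode(Map<Integer, List<String>> readyBatches,
                                                  Map<Integer, Integer> leaders) {
        Map<Integer, List<String>> byNode = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : readyBatches.entrySet()) {
            Integer node = leaders.get(e.getKey());
            if (node == null) continue; // leader unknown: skipped in this sketch
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).addAll(e.getValue());
        }
        return byNode;
    }
}
```

With partitions 0 and 2 led by node 101 and partition 1 led by node 102, the three per-partition batches collapse into two per-node lists, i.e. two requests instead of three.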
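Producer configuration:
acks: how many replicas of the partition must receive a message before the producer considers the write successful.
acks=1: the default before Kafka 3.0 (since 3.0 the producer defaults to acks=all). The producer gets a success response from the server as soon as the leader replica has written the message, a balance between throughput and reliability.
acks=0: the producer does not wait for any response from the server. Throughput is highest, but reliability is lowest.
acks=-1 or acks=all: the producer receives a success response only after every replica in the ISR has written the message.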
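The three acks settings can be summarized as a decision rule. The following is a hypothetical, deliberately simplified model (`AcksModel` is not a Kafka class); it ignores min.insync.replicas, timeouts and retries.

```java
// Hypothetical model of when the producer treats a send as successful for each
// acks value. Simplified: min.insync.replicas, timeouts and retries are ignored.
class AcksModel {
    static boolean consideredSuccessful(String acks, boolean leaderWritten, int isrWritten, int isrSize) {
        switch (acks) {
            case "0":
                return true;                                   // no broker response is awaited
            case "1":
                return leaderWritten;                          // leader replica has the message
            case "-1":
            case "all":
                return leaderWritten && isrWritten == isrSize; // every ISR replica has it
            default:
                throw new IllegalArgumentException("unknown acks: " + acks);
        }
    }
}
```

For example, with three replicas in the ISR and only two of them written, acks=1 already counts as success while acks=all does not.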
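max.request.size: the maximum size of a request the producer client can send, which also caps the size of a single message. Default: 1MB.
retries: how many times the producer retries a failed send. In the versions the reference book covers the default is 0, meaning no retry at all when an error occurs; since Kafka 2.1 the default is Integer.MAX_VALUE, with delivery.timeout.ms bounding the total time. Between leaving the producer and being written to the broker, a message can hit transient errors (for example a leader election in progress) that usually resolve on their own. Setting retries above 0 lets the producer recover through internal retries instead of immediately throwing the exception to the application.
retry.backoff.ms: default 100. The wait between two retries, which prevents pointless rapid-fire retrying.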
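The interplay of retries and retry.backoff.ms can be sketched as a generic retry loop. This is a hypothetical helper, not the producer's internal code: the real producer only retries errors it classifies as retriable and re-enqueues the batch in the accumulator, whereas this sketch retries any exception.

```java
import java.util.concurrent.Callable;

// Hypothetical retry-with-backoff helper: one initial attempt plus up to `retries`
// retries, sleeping `backoffMs` between attempts; the last exception is rethrown
// only when every attempt fails. Expects retries >= 0.
class RetryWithBackoff {
    static <T> T call(Callable<T> action, int retries, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < retries) Thread.sleep(backoffMs); // wait before the next try
            }
        }
        throw last;
    }
}
```

An action that fails twice with a transient error and then succeeds returns normally with retries=3; an action that always fails surfaces its last exception after retries+1 attempts.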
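compression.type: default "none", meaning messages are not compressed. Other options are gzip, snappy, lz4 and (since Kafka 2.1) zstd; compression reduces network and storage usage at the cost of some CPU.
receive.buffer.bytes: the size of the socket receive buffer, default 32KB. If the producer and the Kafka cluster are in different data centers, consider increasing it.
send.buffer.bytes: the size of the socket send buffer, default 128KB.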
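The parameters above can be collected into a producer configuration like the following sketch. The broker host names are hypothetical and the values are illustrative choices, not recommendations; in a real application the Properties object would be passed to `new KafkaProducer<>(props)`.

```java
import java.util.Properties;

// Sketch of a producer configuration using the parameters described above.
// Host names are hypothetical; values are illustrative, not recommendations.
class ProducerConfigSketch {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical hosts
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                    // wait for every ISR replica
        props.put("max.request.size", "1048576");    // 1MB, the default
        props.put("retries", "3");                   // retry transient errors
        props.put("retry.backoff.ms", "100");        // pause between retries (the default)
        props.put("compression.type", "lz4");        // default is "none"
        props.put("receive.buffer.bytes", "65536");  // 64KB, above the 32KB default
        props.put("send.buffer.bytes", "131072");    // 128KB, the default
        return props;
    }
}
```

Keeping the configuration in one method makes the misspelled variants (compress.type, retries.backoff.ms) easy to spot: Kafka only logs a warning for unknown keys and silently ignores them.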
Consumer example code:
```java
package com.paojiaojiang.consumer;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

/**
 * @Author: jja
 * @Description:
 * @Date: 2019/3/19 23:45
 */
public class ClassicalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka broker addresses; there is no need to list every broker
        props.put("bootstrap.servers", "spark:9092,spark1:9092,spark2:9092");
        // Consumer group this consumer belongs to
        props.put("group.id", "test2");
        // Commit offsets (consumer positions) automatically; auto-commit can cause lost or duplicate consumption
        props.put("enable.auto.commit", "true");
        // Interval between automatic offset commits
        props.put("auto.commit.interval.ms", "1000");
        // Deserializer for record keys
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Deserializer for record values
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Create the consumer; try-with-resources closes it if the loop exits with an exception
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Topics to subscribe to; several can be subscribed at once
            consumer.subscribe(Arrays.asList("paojiaojiang", "second", "third"));
            while (true) {
                // Fetch records, waiting at most 100 ms (poll(long) is deprecated since Kafka 2.0)
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```
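Required parameters:
bootstrap.servers: the list of broker addresses used to connect to the Kafka cluster. You do not need to list every broker, because the consumer discovers the rest of the cluster's members from the brokers it connects to. Still, configure at least two, so that the consumer can keep working if one of them is down.
key.deserializer/value.deserializer: here org.apache.kafka.common.serialization.StringDeserializer. These turn the bytes produced by the producer's serializers back into keys and values.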
Subscribing to topics and partitions:

```java
consumer.subscribe(Arrays.asList("paojiaojiang", "second", "third"));
```

When subscribing, you can pass a collection of topics or a regular expression.
Method 1:
```java
@Override
public void subscribe(Collection<String> topics, ConsumerRebalanceListener listener) {
    acquire();
    try {
        if (topics == null) {
            throw new IllegalArgumentException("Topic collection to subscribe to cannot be null");
        } else if (topics.isEmpty()) {
            // treat subscribing to empty topic list as the same as unsubscribing
            this.unsubscribe();
        } else {
            for (String topic : topics) {
                if (topic == null || topic.trim().isEmpty())
                    throw new IllegalArgumentException("Topic collection to subscribe to cannot contain null or empty topic");
            }
            log.debug("Subscribed to topic(s): {}", Utils.join(topics, ", "));
            this.subscriptions.subscribe(new HashSet<>(topics), listener);
            metadata.setTopics(subscriptions.groupSubscription());
        }
    } finally {
        release();
    }
}
```
Method 2 (delegates to Method 1 with a no-op rebalance listener):

```java
@Override
public void subscribe(Collection<String> topics) {
    subscribe(topics, new NoOpConsumerRebalanceListener());
}
```
**acquire()**: its Javadoc reads "Acquire the light lock protecting this consumer from multi-threaded access. Instead of blocking when the lock is not available, however, we just throw an exception (since multi-threaded usage is not supported)."
In other words, acquire() never blocks. It only records the ID of the owning thread (plus a reference count) to detect concurrent access, which guarantees that one thread at a time operates the consumer. acquire() and release() always appear in pairs, acting as the lock and unlock operations.
acquire() source:
```java
private void acquire() {
    ensureNotClosed();
    long threadId = Thread.currentThread().getId(); // ID of the calling thread
    if (threadId != currentThread.get() && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId)) // another thread already has the lock
        throw new ConcurrentModificationException("KafkaConsumer is not safe for multi-threaded access");
    refcount.incrementAndGet();
}
```
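The acquire()/release() pattern can be reproduced outside KafkaConsumer as a small stand-alone class. `LightLock` is a hypothetical name, and its release() is an assumption modeled on the reference-count logic: ownership is cleared once the count drops back to zero.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical re-creation of the consumer's "light lock": it never blocks,
// it fails fast when a second thread tries to enter.
class LightLock {
    private static final long NO_CURRENT_THREAD = -1L;
    private final AtomicLong currentThread = new AtomicLong(NO_CURRENT_THREAD);
    private final AtomicInteger refcount = new AtomicInteger(0);

    void acquire() {
        long threadId = Thread.currentThread().getId();
        // Owner re-enters freely; otherwise try to claim ownership, and fail if another thread has it.
        if (threadId != currentThread.get() && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId))
            throw new ConcurrentModificationException("not safe for multi-threaded access");
        refcount.incrementAndGet(); // nested calls on the same thread are counted
    }

    void release() {
        // Assumed behavior: give up ownership when the outermost acquire() is released.
        if (refcount.decrementAndGet() == 0)
            currentThread.set(NO_CURRENT_THREAD);
    }
}
```

The reference count is what makes nesting safe: public methods such as subscribe() call acquire() and may call other public methods (unsubscribe()) that acquire again on the same thread.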
Method 3 (regular-expression subscription):

```java
@Override
public void subscribe(Pattern pattern, ConsumerRebalanceListener listener) {
    acquire();
    try {
        if (pattern == null)
            throw new IllegalArgumentException("Topic pattern to subscribe to cannot be null");
        log.debug("Subscribed to pattern: {}", pattern);
        this.subscriptions.subscribe(pattern, listener);
        this.metadata.needMetadataForAllTopics(true);
        this.coordinator.updatePatternSubscription(metadata.fetch());
        this.metadata.requestUpdate();
    } finally {
        release();
    }
}
```
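If subscribe() is called several times with different topics, only the most recent call takes effect.

```java
// Subscribe to every topic whose name starts with "paojiaojiang-" (java.util.regex.Pattern)
consumer.subscribe(Pattern.compile("paojiaojiang-.*"));
```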
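Which topics such a pattern selects can be checked with plain java.util.regex. In this sketch the topic names are hypothetical, and whole-name matching via `matcher(...).matches()` is assumed, so "paojiaojiang-.*" does not match a topic named plain "paojiaojiang".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: filter a list of topic names the way a pattern subscription would,
// assuming the whole topic name must match the pattern.
class PatternSubscriptionSketch {
    static List<String> matchingTopics(Pattern pattern, List<String> allTopics) {
        List<String> matched = new ArrayList<>();
        for (String topic : allTopics) {
            if (pattern.matcher(topic).matches()) matched.add(topic);
        }
        return matched;
    }
}
```

Given the hypothetical topics "paojiaojiang-a", "paojiaojiang", "second" and "paojiaojiang-b", only the two names with the "paojiaojiang-" prefix are selected.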
Subscribing directly to specific partitions of given topics, with assign():
```java
@Override
public void assign(Collection<TopicPartition> partitions) {
    acquire();
    try {
        if (partitions == null) {
            throw new IllegalArgumentException("Topic partition collection to assign to cannot be null");
        } else if (partitions.isEmpty()) {
            this.unsubscribe();
        } else {
            Set<String> topics = new HashSet<>();
            for (TopicPartition tp : partitions) {
                String topic = (tp != null) ? tp.topic() : null;
                if (topic == null || topic.trim().isEmpty())
                    throw new IllegalArgumentException("Topic partitions to assign to cannot have null or empty topic");
                topics.add(topic);
            }
            // make sure the offsets of topic partitions the consumer is unsubscribing from
            // are committed since there will be no following rebalance
            this.coordinator.maybeAutoCommitOffsetsNow();
            log.debug("Subscribed to partition(s): {}", Utils.join(partitions, ", "));
            this.subscriptions.assignFromUser(new HashSet<>(partitions));
            metadata.setTopics(topics);
        }
    } finally {
        release();
    }
}
```
TopicPartition source:

```java
/**
 * A topic name and partition number
 */
public final class TopicPartition implements Serializable {
    private int hash = 0;
    private final int partition;
    private final String topic;
    ...
```
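Because assign() takes TopicPartition objects and the consumer tracks state per partition, it matters that two instances naming the same topic and partition are equal. The following hypothetical value class (`MyTopicPartition`, not the Kafka class) mirrors the fields listed above, including the lazily cached hash, to show that behavior.

```java
import java.util.Objects;

// Hypothetical value class modeled on TopicPartition's fields (topic, partition,
// cached hash). equals/hashCode depend only on topic and partition, so equal
// instances are interchangeable as map keys (e.g. for per-partition offsets).
final class MyTopicPartition {
    private int hash = 0;            // lazily computed and cached hash code
    private final int partition;
    private final String topic;

    MyTopicPartition(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public int hashCode() {
        if (hash != 0) return hash;
        int result = 31 + partition;
        result = 31 * result + Objects.hashCode(topic);
        this.hash = result;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof MyTopicPartition)) return false;
        MyTopicPartition other = (MyTopicPartition) obj;
        return partition == other.partition && Objects.equals(topic, other.topic);
    }

    @Override
    public String toString() { return topic + "-" + partition; }
}
```

Caching the hash is safe only because both fields are final: the value an instance hashes to can never change after construction.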
The consumer's **partitionsFor()** method queries the partition metadata of a given topic.
Source:

```java
@Override
public List<PartitionInfo> partitionsFor(String topic) {
    acquire();
    try {
        Cluster cluster = this.metadata.fetch();
        List<PartitionInfo> parts = cluster.partitionsForTopic(topic);
        if (!parts.isEmpty())
            return parts;
        Map<String, List<PartitionInfo>> topicMetadata = fetcher.getTopicMetadata(
                new MetadataRequest.Builder(Collections.singletonList(topic), true), requestTimeoutMs);
        return topicMetadata.get(topic);
    } finally {
        release();
    }
}
```
The PartitionInfo class, annotated:

```java
public class PartitionInfo {
    // The topic name
    private final String topic;
    // The partition id
    private final int partition;
    // The node id of the node currently acting as a leader for this partition or null if there is no leader
    private final Node leader;
    // The complete set of replicas for this partition regardless of whether they are alive or up-to-date
    // (i.e. the AR set, Assigned Replicas)
    private final Node[] replicas;
    // The subset of the replicas that are in sync, that is caught-up to the leader and ready to take over as leader if the leader should fail
    // (i.e. the ISR set, In-Sync Replicas)
    private final Node[] inSyncReplicas;
    ...
```
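The replicas and inSyncReplicas fields correspond to the AR and ISR sets; the replicas in AR but not in ISR form the out-of-sync set (OSR), so AR = ISR + OSR. A minimal sketch of that relationship, assuming plain integer broker ids in place of Node objects (`ReplicaSets` is a hypothetical name):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: compute OSR = AR - ISR from broker ids (ints stand in for Node objects).
class ReplicaSets {
    static Set<Integer> outOfSync(int[] assignedReplicas, int[] inSyncReplicas) {
        Set<Integer> osr = new LinkedHashSet<>();
        for (int id : assignedReplicas) osr.add(id);  // start from AR
        for (int id : inSyncReplicas) osr.remove(id); // drop every ISR member
        return osr;
    }
}
```

For a partition with AR = {1, 2, 3} and ISR = {1, 3}, replica 2 is the only out-of-sync replica; when ISR equals AR the OSR set is empty.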
Example:

```java
// Look up the partitions of a topic dynamically, then assign all of them to this consumer
List<TopicPartition> partitions = new ArrayList<>();
List<PartitionInfo> partitionInfos = consumer.partitionsFor("paojiaojiang-topic");
if (partitionInfos != null) {
    // Build a TopicPartition for every partition of the topic
    for (PartitionInfo partitionInfo : partitionInfos) {
        partitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
    }
}
consumer.assign(partitions);
```
Summary: the ways a Kafka consumer can subscribe:
- Collection subscription: subscribe(Collection topics…
- Regex subscription: subscribe(Pattern pattern…
- Partition assignment: assign(Collection partitions)…
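The three methods correspond to distinct subscription modes, and the consumer does not let them be mixed: switching from one to another without calling unsubscribe() first raises an IllegalStateException ("Subscription to topics, partitions and pattern are mutually exclusive"). The following is a hypothetical model of that rule (`SubscriptionModeSketch` is not a Kafka class), loosely following the NONE / AUTO_TOPICS / AUTO_PATTERN / USER_ASSIGNED states the consumer tracks internally.

```java
// Hypothetical model of the consumer's subscription modes. Re-subscribing in the
// same mode replaces the previous subscription; switching modes requires unsubscribe().
class SubscriptionModeSketch {
    enum Mode { NONE, AUTO_TOPICS, AUTO_PATTERN, USER_ASSIGNED }

    private Mode mode = Mode.NONE;

    private void setMode(Mode requested) {
        if (mode != Mode.NONE && mode != requested)
            throw new IllegalStateException("Subscription to topics, partitions and pattern are mutually exclusive");
        mode = requested;
    }

    void subscribeTopics()  { setMode(Mode.AUTO_TOPICS); }   // subscribe(Collection ...)
    void subscribePattern() { setMode(Mode.AUTO_PATTERN); }  // subscribe(Pattern ...)
    void assignPartitions() { setMode(Mode.USER_ASSIGNED); } // assign(Collection ...)
    void unsubscribe()      { mode = Mode.NONE; }
    Mode mode()             { return mode; }
}
```

Calling subscribeTopics() twice is allowed, but assignPartitions() afterwards is rejected until unsubscribe() resets the mode.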