This article is based on Kafka source code version 1.0.0.
I. Introduction to KafkaProducer
1. KafkaProducer overview and usage
KafkaProducer is the Kafka client and message producer, used to send messages to a Kafka cluster. KafkaProducer is thread safe, and sharing a single producer instance across threads is generally faster than giving each thread its own instance.
Usage example:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");             // wait for the full set of in-sync replicas to acknowledge
props.put("retries", 0);              // do not retry failed sends
props.put("batch.size", 16384);       // per-partition batch size in bytes
props.put("linger.ms", 1);            // wait up to 1 ms for more records before sending a batch
props.put("buffer.memory", 33554432); // total memory for buffering unsent records (32 MB)
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
producer.close();
2. KafkaProducer fields
// id of this producer client
private final String clientId;
// internal metrics registry, which manages the Sensor objects
final Metrics metrics;
// a Sensor aggregates several metrics for one operation, e.g. average request time and error counts
private final Sensor errors;
// partitioner used to route each record to a partition
private final Partitioner partitioner;
// maximum request size, which bounds a single record including overhead plus the serialized key and value
private final int maxRequestSize;
// total memory available for buffering records waiting to be sent (buffer.memory)
private final long totalMemorySize;
// metadata for the whole Kafka cluster, shared by the client threads
private final Metadata metadata;
// collects the records waiting to be sent
private final RecordAccumulator accumulator;
// the task that sends messages; runs in ioThread
private final Sender sender;
// the thread that runs the Sender task
private final Thread ioThread;
// compression algorithm applied to record batches
private final CompressionType compressionType;
// time utility
private final Time time;
// key and value serializers; custom serializer classes can be set via ProducerConfig
private final ExtendedSerializer<K> keySerializer;
private final ExtendedSerializer<V> valueSerializer;
// configuration used to initialize this KafkaProducer
private final ProducerConfig producerConfig;
// maximum time to block waiting for cluster metadata updates
private final long maxBlockTimeMs;
// maximum time to wait between sending a message and receiving its ACK
private final int requestTimeoutMs;
// interceptors that pre-process records before sending and on callbacks
private final ProducerInterceptors<K, V> interceptors;
// API versions supported by each node, used internally by Kafka
private final ApiVersions apiVersions;
// transaction management
private final TransactionManager transactionManager;
3. KafkaProducer constructor
The KafkaProducer constructor we use most often is:
public KafkaProducer(Properties properties) {
this(new ProducerConfig(properties), null, null);
}
new ProducerConfig() initializes the configuration definitions:
ProducerConfig(Map<?, ?> props) {
super(CONFIG, props);
}
static {
CONFIG = new ConfigDef().define(BOOTSTRAP_SERVERS_CONFIG, Type.LIST, Importance.HIGH, CommonClientConfigs.BOOTSTRAP_SERVERS_DOC)
.define(BUFFER_MEMORY_CONFIG, Type.LONG, 32 * 1024 * 1024L, atLeast(0L), Importance.HIGH, BUFFER_MEMORY_DOC)
...
}
The values we set in Properties then override these defaults.
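As a rough illustration of this define-then-override behavior (a hypothetical helper, not Kafka's actual code), ConfigDef supplies a default such as buffer.memory = 32 MB, and any value present in the user's Properties wins:

```java
import java.util.Properties;

// Hedged sketch of ConfigDef-style defaulting: the default registered via
// define(...) is used only when the user did not set the key themselves.
public class ConfigSketch {
    static long bufferMemory(Properties userProps) {
        long defaultValue = 32 * 1024 * 1024L; // default registered via ConfigDef.define(...)
        String v = userProps.getProperty("buffer.memory");
        return v == null ? defaultValue : Long.parseLong(v);
    }
}
```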
During construction, KafkaProducer initializes the Partitioner, the Serializers, the interceptors, the cluster metadata, the RecordAccumulator that collects records, the NetworkClient, the Sender, and more:
private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
try {
Map<String, Object> userProvidedConfigs = config.originals();
this.producerConfig = config;
this.time = Time.SYSTEM;
String clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
if (clientId.length() <= 0)
clientId = "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();
this.clientId = clientId;
Map<String, String> metricTags = Collections.singletonMap("client-id", clientId);
MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
.timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG), TimeUnit.MILLISECONDS)
.recordLevel(Sensor.RecordingLevel.forName(config.getString(ProducerConfig.METRICS_RECORDING_LEVEL_CONFIG)))
.tags(metricTags);
List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
MetricsReporter.class);
reporters.add(new JmxReporter(JMX_PREFIX));
this.metrics = new Metrics(metricConfig, reporters, time);
ProducerMetrics metricsRegistry = new ProducerMetrics(this.metrics);
// instantiate the Partitioner via reflection (a custom Partitioner can be configured)
this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
// instantiate keySerializer and valueSerializer via reflection (custom Serializers can be configured)
if (keySerializer == null) {
// instantiate keySerializer; ensureExtended adapts a plain Serializer to an ExtendedSerializer
this.keySerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
Serializer.class));
// configure the keySerializer
this.keySerializer.configure(config.originals(), true);
} else {
config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
this.keySerializer = ensureExtended(keySerializer);
}
if (valueSerializer == null) {
this.valueSerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
Serializer.class));
this.valueSerializer.configure(config.originals(), false);
} else {
config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
this.valueSerializer = ensureExtended(valueSerializer);
}
// load interceptors and make sure they get clientId
userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
List<ProducerInterceptor<K, V>> interceptorList = (List) (new ProducerConfig(userProvidedConfigs, false)).getConfiguredInstances(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
ProducerInterceptor.class);
this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
// cluster metadata; perform the initial bootstrap update
ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer, valueSerializer, interceptorList, reporters);
this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
true, true, clusterResourceListeners);
List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
this.metadata.update(Cluster.bootstrap(addresses), Collections.<String>emptySet(), time.milliseconds());
...
// create the RecordAccumulator used to collect records
this.accumulator = new RecordAccumulator(logContext,
config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
this.totalMemorySize,
this.compressionType,
config.getLong(ProducerConfig.LINGER_MS_CONFIG),
retryBackoffMs,
metrics,
time,
apiVersions,
transactionManager);
// create the NetworkClient used for network I/O
NetworkClient client = new NetworkClient(
new Selector(config.getLong(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG),
this.metrics, time, "producer", channelBuilder, logContext),
this.metadata,
clientId,
maxInflightRequests,
config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
config.getLong(ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG),
this.requestTimeoutMs,
time,
true,
apiVersions,
throttleTimeSensor,
logContext);
// the message-sending task and the thread that runs it
this.sender = new Sender(logContext,
client,
this.metadata,
this.accumulator,
maxInflightRequests == 1,
config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
acks,
retries,
metricsRegistry.senderMetrics,
Time.SYSTEM,
this.requestTimeoutMs,
config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG),
this.transactionManager,
apiVersions);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
} catch (Throwable t) {
throw new KafkaException("Failed to construct kafka producer", t);
}
}
II. The KafkaProducer send flow
The overall flow of sending a message breaks down into the following steps:
- Initialization: load the default configuration plus the user-supplied settings, and start the network (Sender) thread;
- Run the interceptor logic to pre-process the record (skipped if no interceptor is configured);
- Fetch the cluster metadata;
- Serialize the record's key and value via Serializer.serialize();
- Call partition() to choose a partition for the record according to the configured partitioning strategy;
- Buffer the record in the RecordAccumulator;
- Wake up the Sender thread, which groups the pending data by broker (Broker Id <=> List);
- Establish network connections to the relevant brokers and send each broker its pending list of messages.
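The middle steps above (intercept, serialize, partition, accumulate) can be sketched as a minimal self-contained simulation. All names here are illustrative, not the real Kafka classes, and the partitioner uses a plain hashCode where real Kafka uses murmur2 on the serialized key:

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hedged sketch of the send pipeline: interceptor -> serialize ->
// partition -> accumulate into per-partition batches.
public class SendFlowSketch {
    static final int NUM_PARTITIONS = 3; // assumed partition count for the topic

    // per-partition batches, mirroring RecordAccumulator's map of deques
    final Map<Integer, List<byte[]>> batches = new HashMap<>();

    // interceptor hook (identity here)
    String onSend(String value) { return value; }

    // serialization, as StringSerializer would do it
    byte[] serialize(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    // default-style partitioning: hash of the key modulo the partition count
    int partition(String key) { return (key.hashCode() & 0x7fffffff) % NUM_PARTITIONS; }

    // runs the pipeline for one record and returns the chosen partition
    int send(String key, String value) {
        byte[] payload = serialize(onSend(value));
        int p = partition(key);
        batches.computeIfAbsent(p, k -> new ArrayList<>()).add(payload);
        return p;
    }
}
```

Because partitioning hashes only the key, records with the same key always land in the same partition and thus in the same per-partition batch.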
III. Common KafkaProducer concepts
A few classes come up repeatedly while a message is being sent; covering them first makes the source easier to follow.
1. Cluster
Cluster holds node, topic, and partition information, such as how many topics and brokers the cluster currently has:
- the mapping from broker.id to Node
- the mapping from topic to partitions
- the mapping from node to partitions
/**
* A representation of a subset of the nodes, topics, and partitions in the Kafka cluster.
*/
public final class Cluster {
private final boolean isBootstrapConfigured;
// nodes in the cluster
private final List<Node> nodes;
// topics this client is not authorized to access
private final Set<String> unauthorizedTopics;
// built-in (internal) topics
private final Set<String> internalTopics;
// the controller node
private final Node controller;
// detailed information for each partition
private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
// topic -> partitions
private final Map<String, List<PartitionInfo>> partitionsByTopic;
// topic -> currently available partitions
private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;
// node -> partitions
private final Map<Integer, List<PartitionInfo>> partitionsByNode;
// node id -> node
private final Map<Integer, Node> nodesById;
private final ClusterResource clusterResource;
}
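The lookup maps above are all derived from one flat list of partition descriptions in a metadata response. The following is an illustrative sketch with simplified stand-ins, not the real Kafka classes:

```java
import java.util.*;

// Hedged sketch of how Cluster builds its topic->partitions and
// node->partitions indexes from a flat partition list.
public class ClusterSketch {
    // simplified stand-in for PartitionInfo: topic, partition id, leader node id
    static class PartitionInfo {
        final String topic; final int partition; final int leaderNodeId;
        PartitionInfo(String topic, int partition, int leaderNodeId) {
            this.topic = topic; this.partition = partition; this.leaderNodeId = leaderNodeId;
        }
    }

    // topic -> partitions, mirroring partitionsByTopic
    final Map<String, List<PartitionInfo>> partitionsByTopic = new HashMap<>();
    // leader node id -> partitions, mirroring partitionsByNode
    final Map<Integer, List<PartitionInfo>> partitionsByNode = new HashMap<>();

    ClusterSketch(List<PartitionInfo> partitions) {
        for (PartitionInfo p : partitions) {
            partitionsByTopic.computeIfAbsent(p.topic, t -> new ArrayList<>()).add(p);
            partitionsByNode.computeIfAbsent(p.leaderNodeId, n -> new ArrayList<>()).add(p);
        }
    }
}
```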
2. Metadata
Metadata holds a subset of topic-related data and is shared by all client threads and the background Sender thread. Requesting metadata for a topic that is not yet present triggers a metadata update.
Its main data structures are a Cluster object plus some bookkeeping fields; Cluster records the topic and cluster information. The fields are:
public final class Metadata {
private static final Logger log = LoggerFactory.getLogger(Metadata.class);
public static final long TOPIC_EXPIRY_MS = 5 * 60 * 1000;
private static final long TOPIC_EXPIRY_NEEDS_UPDATE = -1L;
// minimum interval between metadata refresh attempts after a failure, to avoid refreshing too often
private final long refreshBackoffMs;
// metadata expiry time
private final long metadataExpireMs;
// metadata version, incremented on every update; used to tell whether metadata has changed
private int version;
// time of the most recent update attempt (including failures)
private long lastRefreshMs;
// time of the most recent successful update
private long lastSuccessfulRefreshMs;
// SASL authentication error; if present, updates stop
private AuthenticationException authenticationException;
// topic and cluster information
private Cluster cluster;
// whether a metadata update is needed
private boolean needUpdate;
/* Topics with expiry time */
private final Map<String, Long> topics;
// listeners notified when metadata is updated
private final List<Listener> listeners;
// listeners notified of cluster resource changes (e.g. the cluster id)
private final ClusterResourceListeners clusterResourceListeners;
// whether to force a metadata fetch for all topics
private boolean needMetadataForAllTopics;
// if true, the broker auto-creates a topic that does not exist when metadata is updated
private final boolean allowAutoTopicCreation;
// defaults to true for the producer, which periodically expires unused topics; false for the consumer
private final boolean topicExpiryEnabled;
}
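The version and needUpdate fields implement a simple handshake: a caller marks the metadata stale and remembers the version it saw, then waits until a refresh has advanced the version past that point. A hedged sketch with simplified names, not the real Kafka class:

```java
// Hedged sketch of Metadata's version-based update handshake.
public class MetadataSketch {
    private int version = 0;
    private boolean needUpdate = false;
    private long lastRefreshMs = 0;

    // caller side: mark metadata stale and remember the current version
    public synchronized int requestUpdate() {
        needUpdate = true;
        return version;
    }

    // sender side: a successful refresh clears the flag and bumps the version
    public synchronized void update(long nowMs) {
        needUpdate = false;
        lastRefreshMs = nowMs;
        version++;
    }

    public synchronized boolean updateRequested() { return needUpdate; }
    public synchronized int version() { return version; }
}
```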
3. Metrics
Metrics is the registry used to track measured statistics. A Sensor aggregates, over a time window, several related Metric values for one operation, such as the average, maximum, and minimum.
public class Metrics implements Closeable {
private final MetricConfig config;
private final ConcurrentMap<MetricName, KafkaMetric> metrics;
private final ConcurrentMap<String, Sensor> sensors;
private final ConcurrentMap<Sensor, List<Sensor>> childrenSensors;
private final List<MetricsReporter> reporters;
private final Time time;
...
}
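Conceptually, a Sensor records a value once and lets several statistics observe it. A rough, illustrative sketch (not the real Sensor API):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a Sensor: one record() call feeds multiple statistics.
public class SensorSketch {
    private final List<Double> values = new ArrayList<>();

    void record(double v) { values.add(v); }

    double avg() { return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0); }
    double max() { return values.stream().mapToDouble(Double::doubleValue).max().orElse(0.0); }
}
```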
4. RecordAccumulator
RecordAccumulator can be thought of as a bounded buffer that holds record contents in memory until the Sender drains them.
public final class RecordAccumulator {
private final Logger log;
private volatile boolean closed;
private final AtomicInteger flushesInProgress;
private final AtomicInteger appendsInProgress;
private final int batchSize;
private final CompressionType compression;
private final long lingerMs;
private final long retryBackoffMs;
private final BufferPool free;
private final Time time;
private final ApiVersions apiVersions;
private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;
private final IncompleteBatches incomplete;
// The following variables are only accessed by the sender thread, so we don't need to protect them.
private final Set<TopicPartition> muted;
private int drainIndex;
private final TransactionManager transactionManager;
}
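The core of the append path is that each partition owns a Deque of batches: a record goes into the last batch if it still has room, otherwise a new batch is started. A simplified, illustrative sketch (not the real RecordAccumulator):

```java
import java.util.*;

// Hedged sketch of RecordAccumulator.append: per-partition deques of
// size-bounded batches, analogous to batch.size.
public class AccumulatorSketch {
    static class Batch {
        final List<byte[]> records = new ArrayList<>();
        int bytes = 0;
    }

    private final int batchSize; // analogous to the batch.size setting
    final Map<Integer, Deque<Batch>> batches = new HashMap<>();

    AccumulatorSketch(int batchSize) { this.batchSize = batchSize; }

    void append(int partition, byte[] record) {
        Deque<Batch> dq = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        Batch last = dq.peekLast();
        // start a new batch when there is none, or the current one would overflow
        if (last == null || last.bytes + record.length > batchSize) {
            last = new Batch();
            dq.addLast(last);
        }
        last.records.add(record);
        last.bytes += record.length;
    }
}
```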
5. Sender
Sender encapsulates the logic that actually sends the accumulated messages.
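One key step the Sender performs, per the flow in section II, is regrouping drained batches by leader broker so a single request can carry all batches destined for one node. An illustrative sketch with simplified names, not the real Sender code:

```java
import java.util.*;

// Hedged sketch of the Sender's per-node regrouping of drained batches.
public class SenderSketch {
    // partitionLeaders: partition id -> leader broker id (assumed known from metadata)
    // batchesByPartition: partition id -> drained batches (modeled as strings here)
    static Map<Integer, List<String>> drainByNode(Map<Integer, Integer> partitionLeaders,
                                                  Map<Integer, List<String>> batchesByPartition) {
        Map<Integer, List<String>> byNode = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : batchesByPartition.entrySet()) {
            int node = partitionLeaders.get(e.getKey());
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).addAll(e.getValue());
        }
        return byNode;
    }
}
```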
————————————————
Copyright notice: This is an original article by CSDN blogger 「是Guava不是瓜娃」, licensed under the CC 4.0 BY-SA license. Please include the original link and this notice when reposting.
Original link: https://blog.csdn.net/noaman_wgs/article/details/105646288