2020.12.07 Class Notes (Creating a Kafka Producer and Consumer with IDEA)

超可爱慕之

Article link: https://blog.csdn.net/m0_48758256/article/details/110822651

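Translated from the official documentation:
Running Kafka Connect: Kafka Connect currently supports two modes of execution: standalone (a single process) and distributed.
In standalone mode all work is performed in a single process. This mode is the easiest to configure and get started with, and it is a reasonable fit when only one worker makes sense (for example, collecting log files into Kafka), but it does not benefit from Kafka Connect's fault tolerance. You can start a standalone process with the following command: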

> bin/connect-standalone.sh \
> config/connect-standalone.properties \
> connector1.properties \
> [connector2.properties ...]

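The first parameter is the worker's configuration file. It includes settings such as the Kafka connection parameters, the serialization format, and how frequently offsets are committed. The provided example should work well with a local cluster that runs with the default configuration in config/server.properties. It needs to be adjusted for a different configuration or for a production deployment.
All workers (both standalone and distributed) require a few settings: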
bootstrap.servers
key.converter
value.converter

bootstrap.servers - List of Kafka servers used to bootstrap connections to Kafka
key.converter - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.
value.converter - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.

The important setting specific to standalone mode is:

offset.storage.file.filename - File in which to store offset data
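
As a rough illustration of the settings listed above, the following sketch assembles them with plain java.util.Properties and prints them in key=value form. The broker address localhost:9092, the choice of the JSON converter, and the offsets file path are assumptions made for the example, not values taken from these notes.

```java
import java.util.Properties;
import java.util.TreeSet;

public class StandaloneWorkerConfigSketch {

    /** Collects the settings a standalone worker file needs at minimum. */
    static Properties standaloneWorkerProps() {
        Properties props = new Properties();
        // Assumed local broker address; replace it with your own cluster.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // JSON converter class that ships with Kafka Connect (chosen for illustration).
        props.setProperty("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        props.setProperty("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        // Standalone mode only: local file for offset data (the path is an assumption).
        props.setProperty("offset.storage.file.filename", "/tmp/connect.offsets");
        return props;
    }

    public static void main(String[] args) {
        Properties props = standaloneWorkerProps();
        // Print sorted key=value pairs, the form used in a .properties file.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

The printed lines could be pasted into a worker file such as config/connect-standalone.properties.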

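The parameters configured here apply to the producers and consumers that Kafka Connect itself uses to access the configuration, offset, and status topics. The same parameters can be used to configure Kafka source and sink tasks, but they must be prefixed with producer. (for source tasks) and consumer. (for sink tasks). In addition, the only parameter inherited from the worker configuration is bootstrap.servers. In most cases this is sufficient, because the same cluster is usually used for all purposes. A notable exception is a secured cluster, which requires extra parameters before connections are allowed. These parameters must be set up to three times in the worker configuration: once for management access, once for Kafka sinks, and once for Kafka sources.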
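
To make the "three times" rule concrete, here is a small sketch that takes a set of client security settings and writes each one unprefixed, with the producer. prefix, and with the consumer. prefix. The class and method names are hypothetical, and the SSL values are placeholders rather than settings from these notes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkerSecurityPrefixSketch {

    /** Copies every client setting three times: unprefixed, "producer." and "consumer.". */
    static Map<String, String> expand(Map<String, String> clientSettings) {
        Map<String, String> worker = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : clientSettings.entrySet()) {
            worker.put(e.getKey(), e.getValue());               // worker's own access to its internal topics
            worker.put("producer." + e.getKey(), e.getValue()); // producers used by source tasks
            worker.put("consumer." + e.getKey(), e.getValue()); // consumers used by sink tasks
        }
        return worker;
    }

    public static void main(String[] args) {
        Map<String, String> ssl = new LinkedHashMap<>();
        ssl.put("security.protocol", "SSL");
        ssl.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder path
        expand(ssl).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```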

The remaining parameters are connector configuration files. You may pass as many as you like, but all of them will run within the same process (on different threads).

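Distributed mode balances work automatically, lets you scale up or down dynamically, and provides fault tolerance for the active tasks as well as for their configuration and offset commit data. Running it is very similar to standalone mode: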

> bin/connect-distributed.sh config/connect-distributed.properties

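The differences from standalone mode are the class that is started and the configuration parameters that determine how the Kafka Connect process stores configurations, how it assigns work, and where it stores offsets and task statuses. In distributed mode, Kafka Connect stores offsets, configurations, and task statuses in Kafka topics. It is recommended to create the offset, configuration, and status topics manually so that they have the number of partitions and the replication factor you want. If the topics do not exist when Kafka Connect starts, they are created automatically with the default number of partitions and replication factor, which may not be the best fit.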

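In particular, besides the common settings mentioned above, the following configuration parameters are critical to set before starting the cluster: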
group.id
config.storage.topic
offset.storage.topic
status.storage.topic

group.id (default connect-cluster) - unique name for the cluster, used in forming the Connect cluster group; note that this must not conflict with consumer group IDs
config.storage.topic (default connect-configs) - topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated, compacted topic. You may need to manually create the topic to ensure the correct configuration as auto created topics may have multiple partitions or be automatically configured for deletion rather than compaction
offset.storage.topic (default connect-offsets) - topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction
status.storage.topic (default connect-status) - topic to use for storing statuses; this topic can have multiple partitions, and should be replicated and configured for compaction
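
The settings described above can be sketched as a small Java helper that builds a distributed worker configuration and reports any of the listed keys that are absent. The helper names and the localhost:9092 address are assumptions for illustration; the topic and group names are the defaults quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DistributedWorkerConfigSketch {

    // Keys the notes single out for a distributed worker (common settings plus the four above).
    static final String[] CRITICAL_KEYS = {
        "bootstrap.servers", "key.converter", "value.converter",
        "group.id", "config.storage.topic", "offset.storage.topic", "status.storage.topic"
    };

    static Properties distributedWorkerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        props.setProperty("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        props.setProperty("group.id", "connect-cluster");
        props.setProperty("config.storage.topic", "connect-configs");
        props.setProperty("offset.storage.topic", "connect-offsets");
        props.setProperty("status.storage.topic", "connect-status");
        return props;
    }

    /** Returns the critical keys that are absent or blank in the given configuration. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : CRITICAL_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing: " + missingKeys(distributedWorkerProps()));
    }
}
```

A check like this only confirms that the keys are present; it does not verify that the topics exist or that they are compacted.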

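Note that in distributed mode connector configurations are not passed on the command line. Instead, connectors are created, modified, and destroyed through the REST API (described later in the official documentation).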
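
As a hedged sketch of what creating a connector over REST could look like, the code below builds a POST request to the worker's /connectors endpoint with the JDK's java.net.http client (Java 11 or later). The connector settings mirror the file source sample that ships with Kafka; the worker URL http://localhost:8083 is an assumed default, and the request is only sent when a worker URL is passed as the first argument.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnectorSketch {

    /** Builds a POST request that submits a connector definition to a Connect worker. */
    static HttpRequest createConnectorRequest(String workerUrl, String connectorJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(workerUrl + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Connector name and settings are illustrative values.
        String json = "{\"name\":\"local-file-source\",\"config\":{"
                + "\"connector.class\":\"FileStreamSource\","
                + "\"tasks.max\":\"1\","
                + "\"file\":\"test.txt\","
                + "\"topic\":\"connect-test\"}}";
        String workerUrl = args.length > 0 ? args[0] : "http://localhost:8083";
        HttpRequest request = createConnectorRequest(workerUrl, json);
        System.out.println(request.method() + " " + request.uri());
        if (args.length > 0) {
            // Only contact the worker when a URL was given explicitly.
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```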

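Creating a producer with IDEA:

Official documentation: 3.3 Producer Configs
Below is the configuration of the Java producer (only the settings with high importance are listed):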

Name	Description	Type	Default	Valid Values	Importance
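bootstrap.servers	A list of host/port pairs used to establish the initial connection to the Kafka cluster. The client makes use of all servers in the cluster regardless of which servers are listed here; the list only affects the initial hosts used to discover the full set of servers. Format: host1:port1,host2:port2,… Because these hosts are only used for the initial connection that discovers the full cluster membership (which may change dynamically), the list does not need to contain every server. It is still advisable to list more than one host in case a server is down.	list			high
key.serializer	Serializer class for keys; it must implement the org.apache.kafka.common.serialization.Serializer interface.	class			high
value.serializer	Serializer class for values; it must implement the org.apache.kafka.common.serialization.Serializer interface.	class			high
acks	The number of acknowledgments the producer requires before it considers a request complete. This setting controls the durability of the records that are sent. The following values are allowed: acks=0 The producer does not wait for any acknowledgment from the server. The record is added to the socket buffer immediately and considered sent. There is no guarantee that the server has received the record, and the retries setting has no effect (because the client generally does not learn of any failure). The offset returned for each record is always -1. acks=1 The leader writes the record to its local log and responds without waiting for acknowledgment from all followers. If the leader fails right after acknowledging the record but before the followers have replicated it, the record is lost. acks=all The leader waits for the full set of in-sync replicas to acknowledge the record. The record is not lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. acks=-1 is equivalent to acks=all.	string	1	[all, -1, 0, 1]	high
buffer.memory	The total number of bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are produced faster than they can be delivered to the server, the producer blocks; if it blocks for longer than max.block.ms, it throws an exception. This setting corresponds roughly to the total memory the producer uses, but it is not a hard bound, because not all of the producer's memory is used for buffering. Some additional memory is used for compression (if compression is enabled) and for maintaining in-flight requests.	long	33554432	[0,…]	high
compression.type	The compression type for all data generated by the producer. The default is none (no compression). Valid values are none, gzip, snappy, and lz4. Compression is applied to whole batches of data, so the effectiveness of batching also affects the compression ratio (more batching means better compression).	string	none		high
retries	A value greater than 0 causes the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different from the client resending the record after receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 can change the ordering of records. For example, if two batches are sent to the same partition and the first fails and is retried while the second succeeds, the records of the second batch may appear before those of the first.	int	0	[0,…,2147483647]	high
ssl.key.password	The password of the private key in the key store file. This is optional for clients.	password	null		high
ssl.keystore.location	The location of the key store file. This is optional for clients and can be used for two-way authentication of the client.	string	null		high
ssl.keystore.password	The store password for the key store file. This is optional for clients and only needed if ssl.keystore.location is configured.	password	null		high
ssl.truststore.location	The location of the trust store file.	string	null		high
ssl.truststore.password	The password for the trust store file. If no password is set, the trust store can still be accessed, but integrity checking is disabled.	password	null		high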
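
The interaction between acks, retries, and max.in.flight.requests.per.connection in the table above can be sketched with plain string keys, before looking at the full producer class. This is a minimal illustration that favours durability and ordering over throughput; the class name, the retry count of 3, and the address passed in main are assumptions, not values from these notes.

```java
import java.util.Properties;

public class ReliableProducerPropsSketch {

    /** Producer settings that favour durability and ordering over throughput. */
    static Properties reliableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas before a send counts as complete.
        props.setProperty("acks", "all");
        // Resend on potentially transient errors (the count is an arbitrary example).
        props.setProperty("retries", "3");
        // One in-flight request per connection keeps retried batches in order.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // Assumed local broker address.
        reliableProducerProps("localhost:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A Properties object of this shape can be handed to the KafkaProducer constructor once the kafka-clients library is on the classpath.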

package nj.zb.kb09;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;


import java.util.Properties;

/**
 * @Author: ChaoKeAiMuZhi
 * @Date: 2020/12/7 15:14
 * @Description:
 **/
public class MyProducer {

    public static void main(String[] args) {

        Properties prop = new Properties();
        //bootstrap.servers
        prop.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"