Getting the Offsets of Kafka Messages in Flink


Flink's Kafka consumer has a solid checkpoint mechanism that lets a stopped job resume from its last consumed position, but checkpoints sometimes fail, or managing them is not flexible enough, so we may want to maintain the Kafka offset information ourselves.
The problem is that Flink's FlinkKafkaConsumer wrapper does not expose the offset of each Kafka message directly.

Preparation

  1. Environment setup

Start ZooKeeper and Kafka

[screenshot]
Check the startup status
[screenshot]

  2. Writing a fake-data generator script with Python Faker

The following script uses the third-party Faker module to generate fake data; it also relies on pykafka and tqdm, so install the dependencies with pip first:

pip3 install faker pykafka tqdm

The producer script:
# -*- coding=utf8 -*-

import time
import random
import json
from pykafka import KafkaClient
from faker import Faker
from tqdm import tqdm
from collections import defaultdict

# Broker addresses and topic name
client = KafkaClient(hosts="192.168.77.100:9092,192.168.77.100:9092,192.168.77.100:9092")
topic = client.topics["huawei"]


# Faker factory class
class FakerFactory():
    def __init__(self):
        self.fk = Faker()

    def create_id(self, id):
        return str(id).encode()

    def create_name(self):
        return self.fk.name().encode()

    def create_profile(self):
        return self.fk.profile()

    def create_age(self):
        return str(random.randint(18, 90)).encode()

    def create_score(self):
        return str(random.randint(1, 100)).encode()

    def create_event_time(self):
        return str(int(round(time.time() * 1000))).encode()

    def create_str(self):
        return self.fk.pystr()


class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, bytes):
            return str(obj, encoding='utf-8')
        return json.JSONEncoder.default(self, obj)


if __name__ == '__main__':

    fakerObj = FakerFactory()

    with topic.get_sync_producer() as producer:
        for i in tqdm(range(1200)):
            data_container = defaultdict(dict)
            id = fakerObj.create_id(i)
            name = fakerObj.create_name()
            age = fakerObj.create_age()
            score = fakerObj.create_score()
            timestamp = fakerObj.create_event_time()

            profile_data = fakerObj.create_profile()
            # drop fields json.dumps cannot serialize (Decimal coordinates, date)
            del profile_data["current_location"]
            del profile_data["birthdate"]

            data_container["description"] = fakerObj.create_str()
            data_container["stu_info"]["id"] = id
            data_container["stu_info"]["name"] = name
            data_container["stu_info"]["age"] = age
            data_container["stu_info"]["score"] = score
            data_container["stu_info"]["timestamp"] = timestamp
            data_container["profile_data"] = profile_data

            json_message = json.dumps(data_container, sort_keys=True, separators=(',', ':'), cls=MyEncoder)
            print("===>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n", json_message)
            producer.produce(json_message.encode())
            time.sleep(1)

Sample JSON message:

{
	"description": "fiDuNUBJvGFKvscLMxQP",
	"profile_data": {
		"address": "385 Patrick Ranch\nEast Parker, MS 22995",
		"blood_group": "A-",
		"company": "Mckinney-Curtis",
		"job": "Medical secretary",
		"mail": "christinamurphy@hotmail.com",
		"name": "Philip Michael",
		"residence": "2602 Benson Points Apt. 825\nTammychester, CA 52288",
		"sex": "M",
		"ssn": "792-83-9640",
		"username": "benjaminoliver",
		"website": ["https://savage-perez.com/", "https://www.moore-hampton.com/"]
	},
	"stu_info": {
		"age": "66",
		"id": "164",
		"name": "Cristina Avery",
		"score": "12",
		"timestamp": "1673616141785"
	}
}
  3. Data test

[screenshot]

[screenshot]
The messages can be consumed normally.

Consuming the records fetched from Kafka. Straight to the code; the logic is as follows.
pom + code:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.cmbc</groupId>
    <artifactId>flink-course</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>flink-course</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.13.0</flink.version>
        <scala.version>2.11.12</scala.version>
        <spark.version>3.2.0</spark.version>
        <hadoop.version>3.3.1</hadoop.version>
        <fastjson.version>2.0.15</fastjson.version>
        <lombok.version>1.18.20</lombok.version>
        <jedis.version>3.0.0</jedis.version>
        <hive.version>3.1.2</hive.version>
        <mysql-connector.version>8.0.21</mysql-connector.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>com.ververica</groupId>
            <artifactId>flink-connector-mysql-cdc</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql-connector.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.bahir</groupId>
            <artifactId>flink-connector-redis_2.11</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.ververica</groupId>
            <artifactId>flink-format-changelog-json</artifactId>
            <version>1.4.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson.version}</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>${jedis.version}</version>
        </dependency>
        <dependency>
            <groupId>com.github.javafaker</groupId>
            <artifactId>javafaker</artifactId>
            <version>0.17.2</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-to-slf4j</artifactId>
            <version>2.14.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.2.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-spark</artifactId>
            <version>5.0.0-HBase-2.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.glassfish</groupId>
                    <artifactId>javax.el</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

<!-- sparkStreaming -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>

        <dependency>
            <groupId>org.scalikejdbc</groupId>
            <artifactId>scalikejdbc_2.11</artifactId>
            <version>3.1.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.scalikejdbc/scalikejdbc-config -->
        <dependency>
            <groupId>org.scalikejdbc</groupId>
            <artifactId>scalikejdbc-config_2.11</artifactId>
            <version>3.1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.28</version>
        </dependency>
        <!-- Just include this jar -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <!-- Required for Hadoop 3.x and above, otherwise the build errors out asking for a Java 9 environment -->
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.jetbrains</groupId>
            <artifactId>annotations</artifactId>
            <version>17.0.0</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>net.sf.json-lib</groupId>
            <artifactId>json-lib</artifactId>
            <version>2.4</version>
            <classifier>jdk15</classifier>
        </dependency>
        <dependency>
            <groupId>io.spray</groupId>
            <artifactId>spray-json_2.11</artifactId>
            <version>1.3.5</version>
        </dependency>

    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

import com.alibaba.fastjson2.{JSON, JSONArray, JSONObject}
import org.apache.flink.configuration.Configuration
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.{CheckpointingMode, TimeCharacteristic}
import org.apache.flink.streaming.api.environment.CheckpointConfig
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.log4j.{Level, Logger}

import java.util.Properties
import java.sql.{Connection, DriverManager, PreparedStatement}


// Case class for the parsed records
case class PersonInfo(address: String,
                      job: String,
                      site: String,
                      name: String,
                      age: Int,
                      description: String,
                      eventime: Long)


object ParseJson {
  Logger.getLogger("org").setLevel(Level.WARN)

  def main(args: Array[String]): Unit = {

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    /**
     * // Checkpointing is disabled by default and has to be enabled explicitly
     * // Start a checkpoint every 5000 ms (checkpoint interval)
     * env.enableCheckpointing(5000L)
     * // Advanced options:
     * // Set the mode to exactly-once (the default)
     * env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
     * // Enforce a minimum pause between checkpoints (here 1000 ms)
     * env.getCheckpointConfig.setMinPauseBetweenCheckpoints(1000L)
     * // Checkpoints must complete within one minute or they are discarded (checkpoint timeout)
     * env.getCheckpointConfig.setCheckpointTimeout(60000L)
     *
     * // Allow only one checkpoint at a time
     * env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
     * // Keep checkpoint data after the job is cancelled, so it can be restored from a specific checkpoint later
     *
     * //         * ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: keep checkpoint data after the job is cancelled, so you can restore from it later
     * //         * ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: delete checkpoint data after the job is cancelled; it is only kept when the job fails
     * //
     * env.getCheckpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
     *
     * // State backend for checkpoints (asynchronous snapshots)
     * env.setStateBackend(new FsStateBackend("hdfs://node01:8020/kafka2mysql/checkpoints", true))
     *
     * // Time characteristic
     * env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) */

    // Kafka configuration
    val topic = "huawei"
    val prop = new Properties()
    prop.setProperty("bootstrap.servers", "node01:9092,node02:9092,node03:9092")
    prop.setProperty("group.id", "flink")
    prop.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    prop.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    prop.setProperty("auto.offset.reset", "latest")




    // Build the FlinkKafkaConsumer
    val kafkaConsumer = new FlinkKafkaConsumer[String](
      topic,
      new MyKafkaDeserializationSchema(), // custom deserialization: the overridden deserialize method exposes each record's offset and partition, see the MyKafkaDeserializationSchema.scala block below
      prop)

    // Read the topic data
    val kafkaSource: DataStream[String] = env.addSource(kafkaConsumer)

    val personInfo: DataStream[PersonInfo] = kafkaSource.map(data => {
      val jsonData: JSONObject = JSON.parseObject(data)
      println(jsonData)
      val description: String = jsonData.getString("description")
      val stuInfo: JSONObject = jsonData.getJSONObject("stu_info")
      val profile_data = jsonData.getJSONObject("profile_data")
      val address: String = profile_data.getString("address")
      val name: String = profile_data.getString("name")
      val job: String = profile_data.getString("job")
      val age: Int = stuInfo.getString("age").toInt
      val timestamp: Long = stuInfo.getString("timestamp").toLong
      val website: JSONArray = profile_data.getJSONArray("website")
      val dataStr = if (website == null) {
        ""
      } else {
        (0 until website.size()).map(website.getString).mkString(",")
      }
      // Assemble into the case class
      PersonInfo(address = address, job = job, site = dataStr, name = name, age = age, description = description, eventime = timestamp)
    })


    personInfo.addSink(new MyKafka2DB2)
    env.execute()
  }
}

// Custom sink class: MyKafka2DB2, writes PersonInfo records to MySQL
class MyKafka2DB2 extends RichSinkFunction[PersonInfo] {


  var conn: Connection = _
  // prepared statements for insert and update
  var insertTmp: PreparedStatement = _
  var updateTmp: PreparedStatement = _

  override def open(parameters: Configuration): Unit = {
    Class.forName("com.mysql.cj.jdbc.Driver")
    // set up the JDBC connection
    conn = DriverManager.getConnection(
      "jdbc:mysql://node03:3306/testDB?useUnicode=true&characterEncoding=UTF-8&useSSL=false",
      "root",
      "123456")

    val insert_sql = "insert into personInfo values(?,?,?,?,?,?,?);"
    val update_sql = "update personInfo set address = ? where name = ?;"
    // prepare the statements; the actual values are bound in invoke()
    insertTmp = conn.prepareStatement(insert_sql)
    updateTmp = conn.prepareStatement(update_sql)
    conn.setAutoCommit(false)
  }

  override def invoke(value: PersonInfo, context: SinkFunction.Context): Unit = {

    // Try the update first; if no rows were updated, insert instead
    updateTmp.setString(1, value.address)
    updateTmp.setString(2, value.name)
    updateTmp.executeUpdate()

    if (updateTmp.getUpdateCount == 0) {
      try {
        insertTmp.setString(1, value.address)
        insertTmp.setString(2, value.job)
        insertTmp.setString(3, value.site)
        insertTmp.setString(4, value.name)
        insertTmp.setInt(5, value.age)
        insertTmp.setString(6, value.description)
        insertTmp.setLong(7, value.eventime)
        insertTmp.executeUpdate()
      } catch {
        case ex: Exception => ex.printStackTrace()
      }
    }
    // autoCommit is disabled in open(), so commit explicitly for both the update and the insert path
    conn.commit()
  }

  override def close(): Unit = {
    if (updateTmp != null) {
      updateTmp.close()
    }
    if (insertTmp != null) {
      insertTmp.close()
    }
    if (conn != null) {
      conn.close()
    }
  }
}
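
The sink above tries an update first and falls back to an insert when no row was matched. If the personInfo table has a unique key on name (an assumption; the table DDL is not shown in this post), the same upsert can be expressed as a single MySQL statement. A minimal sketch with hypothetical column names matching the PersonInfo fields:

import java.sql.Connection

object UpsertSketch {
  // Minimal alternative to the update-then-insert pattern.
  // Assumes personInfo has a UNIQUE KEY on name and these column names exist (hypothetical).
  def upsert(conn: Connection, p: PersonInfo): Unit = {
    val sql =
      """insert into personInfo (address, job, site, name, age, description, eventime)
        |values (?, ?, ?, ?, ?, ?, ?)
        |on duplicate key update address = values(address)""".stripMargin
    val stmt = conn.prepareStatement(sql)
    try {
      stmt.setString(1, p.address)
      stmt.setString(2, p.job)
      stmt.setString(3, p.site)
      stmt.setString(4, p.name)
      stmt.setInt(5, p.age)
      stmt.setString(6, p.description)
      stmt.setLong(7, p.eventime)
      stmt.executeUpdate()
    } finally {
      stmt.close()
    }
  }
}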

Special note: the custom MyKafkaDeserializationSchema only needs to implement the KafkaDeserializationSchema interface.
Because my Kafka messages are already JSON strings, I simply put each message's offset and partition directly into the JSON.
If the Kafka messages are not JSON, you can organize your own data structure and embed the offset and partition into the value (a sketch of such an alternative follows after the code below).

import com.alibaba.fastjson2.{JSON, JSONObject}
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
import org.apache.kafka.clients.consumer.ConsumerRecord


class MyKafkaDeserializationSchema extends KafkaDeserializationSchema[String] {

  override def isEndOfStream(nextElement: String): Boolean = {
    false
  }

  override def deserialize(record: ConsumerRecord[Array[Byte], Array[Byte]]): String = {
    if (record == null || record.value() == null) {
      ""
    } else {
      val jsonObject: JSONObject = JSON.parseObject(new String(record.value()))
      jsonObject.put("partition", record.partition().toString)
      jsonObject.put("offset", record.offset().toString)
      jsonObject.toString // the input is already JSON, so return JSON as well
    }
  }

  /* Type information, following the SimpleStringSchema source */
  override def getProducedType: TypeInformation[String] = {
    BasicTypeInfo.STRING_TYPE_INFO
  }
}
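
If the Kafka value is not a JSON string, one option is to emit the partition, offset, and raw value together, for example as a tuple. This is a minimal sketch (the class name is hypothetical, not code from the original post):

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
import org.apache.kafka.clients.consumer.ConsumerRecord

// Emits (partition, offset, value) so downstream operators can track offsets for any payload format
class TupleKafkaDeserializationSchema extends KafkaDeserializationSchema[(Int, Long, String)] {

  override def isEndOfStream(nextElement: (Int, Long, String)): Boolean = false

  override def deserialize(record: ConsumerRecord[Array[Byte], Array[Byte]]): (Int, Long, String) = {
    val value = if (record.value() == null) "" else new String(record.value())
    (record.partition(), record.offset(), value)
  }

  override def getProducedType: TypeInformation[(Int, Long, String)] =
    createTypeInformation[(Int, Long, String)]
}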

Demonstration

[screenshot]
Running the job, you can see that the partition and offset fields have been added to the original JSON string. Done!

Tip: if you later want more flexibility than Flink's built-in checkpoint mechanism provides, you can write the partition and offset information obtained here into Redis. On every start-up, read the last consumed partition and offset back from Redis so that consumption resumes exactly where it left off; a sketch of this idea follows below.
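
A minimal sketch of that idea, assuming a Jedis client (jedis is already in the POM above) and a hypothetical Redis hash layout: key kafka:offset:<topic>, field = partition, value = last processed offset. setStartFromSpecificOffsets is the FlinkKafkaConsumer method for starting from explicit per-partition offsets:

import java.lang.{Long => JLong}
import java.util.{HashMap => JHashMap}

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition
import redis.clients.jedis.Jedis

import scala.collection.JavaConverters._

object RedisOffsetHelper {

  // Store the last processed offset of a partition (call this from your sink or map function)
  def saveOffset(jedis: Jedis, topic: String, partition: Int, offset: Long): Unit =
    jedis.hset(s"kafka:offset:$topic", partition.toString, offset.toString)

  // Read the stored offsets and tell the consumer to start from the next record of each partition
  def seekToSavedOffsets(jedis: Jedis, topic: String, consumer: FlinkKafkaConsumer[String]): Unit = {
    val stored = jedis.hgetAll(s"kafka:offset:$topic")
    if (!stored.isEmpty) {
      val specificOffsets = new JHashMap[KafkaTopicPartition, JLong]()
      stored.asScala.foreach { case (partition, offset) =>
        specificOffsets.put(new KafkaTopicPartition(topic, partition.toInt), JLong.valueOf(offset.toLong + 1))
      }
      consumer.setStartFromSpecificOffsets(specificOffsets)
    }
  }
}

In the job above, seekToSavedOffsets(jedis, topic, kafkaConsumer) would be called before env.addSource(kafkaConsumer), and saveOffset would be called from the sink (or a map function) after each record has been processed.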

