Step 1: Environment preparation
- Install and configure Kafka: make sure the Kafka cluster is set up and running normally.
- Install and configure ClickHouse: make sure the ClickHouse server is installed, configured, and able to accept writes.
- Install Flink: choose local or cluster mode as needed, then install and configure Flink.
Step 2: Create a Flink project
- Create a new Flink project: use an IDE (such as IntelliJ IDEA) or the command line to create a new Flink project.
- Add dependencies: add the Flink, Kafka, and ClickHouse dependencies to the project's pom.xml (or build.sbt) file; the pom.xml used here is shown below.
Step 3: Write the Flink job
- Configure connections: in the Flink job, configure the connection information for Kafka and ClickHouse, including the Kafka broker addresses and topic, and the ClickHouse host and port.
- Read data from Kafka: use Flink's Kafka connector to read the data stream from Kafka, using either the DataStream API or the Table API.
- Process the data: apply whatever processing is needed, such as filtering, transformation, or aggregation.
- Write to ClickHouse: use Flink's JDBC connector (or another adapter) to write the processed data to ClickHouse. You can write in batch or streaming fashion; tune the write strategy to the data characteristics and business requirements.
Task requirements:
Use Flink to consume the dwd-layer data from Kafka, watch for records whose order_status field is "已退款" (refunded), store them in the order_master table of the shtd_result database in ClickHouse, and then query the first 5 rows from the ClickHouse command line on Linux.
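The target table itself is not shown in the original write-up. Below is a minimal sketch of what shtd_result.order_master could look like, with column types inferred from the JDBC setters in the code that follows; the MergeTree engine and the ORDER BY key are assumptions, not part of the source:

-- Hypothetical DDL: types inferred from the PreparedStatement setters below;
-- the engine and sorting key are assumptions.
CREATE TABLE IF NOT EXISTS shtd_result.order_master (
    order_id           String,
    order_sn           String,
    customer_id        Int64,
    shipping_user      String,
    province           String,
    city               String,
    address            String,
    order_source       Int32,
    payment_method     Int32,
    order_money        Float64,
    district_money     Float64,
    shipping_money     Float64,
    payment_money      Float64,
    shipping_comp_name String,
    shipping_sn        String,
    create_time        String,
    shipping_time      String,
    pay_time           String,
    receive_time       String,
    order_status       String,
    order_point        Int32,
    invoice_title      String,
    modified_time      String
) ENGINE = MergeTree()
ORDER BY order_id;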
pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_${scala.binary.version}</artifactId>
        <version>${kafka.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <!-- Redis connector (not used by this job) -->
    <dependency>
        <groupId>org.apache.bahir</groupId>
        <artifactId>flink-connector-redis_2.11</artifactId>
        <version>1.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.11</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- mysql-connector dependency -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.38</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
        <!--<scope>provided</scope>-->
        <exclusions>
            <exclusion>
                <artifactId>slf4j-api</artifactId>
                <groupId>org.slf4j</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.6</version>
    </dependency>
    <!-- HBase (not used by this job) -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-hbase-base_2.12</artifactId>
        <version>1.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-hbase-2.2_2.12</artifactId>
        <version>1.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>2.2.3</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-reflect</artifactId>
        <version>2.12.11</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>2.12.11</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.11</version>
    </dependency>
    <!-- flink-connector-jdbc requires Flink 1.11.0 or later -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <!-- ClickHouse JDBC driver; the job loads the legacy ru.yandex.clickhouse.ClickHouseDriver,
         so in practice only one of the two artifacts below should be needed -->
    <dependency>
        <groupId>com.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.3.2-patch11</version>
    </dependency>
    <dependency>
        <groupId>ru.yandex.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.3.1</version>
    </dependency>
</dependencies>
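The snippet above references ${flink.version}, ${kafka.version}, and ${scala.binary.version} without defining them; they would normally be set in the pom's <properties> block. A minimal sketch, where 1.14.0 and 2.12 follow from the explicitly versioned artifacts above and the Kafka version is an assumption to adjust to your cluster:

<properties>
    <!-- 1.14.0 and 2.12 match the explicitly versioned artifacts above -->
    <flink.version>1.14.0</flink.version>
    <scala.binary.version>2.12</scala.binary.version>
    <!-- assumption: set this to match your Kafka broker/client version -->
    <kafka.version>2.4.1</kafka.version>
</properties>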
1. Configure the ClickHouse JDBC connection
// ClickHouse JDBC connection options
val jdbcBuild = new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
  .withUrl("jdbc:clickhouse://192.168.45.13/shtd_result")
  .withPassword("123456")
  .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
  .withUsername("default")
  .build()
2. The SQL statement to execute against ClickHouse
val sqlStr = "INSERT INTO order_master (" +
  " order_id," +
  " order_sn," +
  " customer_id," +
  " shipping_user," +
  " province," +
  " city," +
  " address," +
  " order_source," +
  " payment_method," +
  " order_money," +
  " district_money," +
  " shipping_money," +
  " payment_money," +
  " shipping_comp_name," +
  " shipping_sn," +
  " create_time," +
  " shipping_time," +
  " pay_time," +
  " receive_time," +
  " order_status," +
  " order_point," +
  " invoice_title," +
  " modified_time" +
  ") values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
3. Add a sink that writes the data to ClickHouse
order_Stream.addSink(
  JdbcSink.sink(
    sqlStr,
    // Bind each Data field to its prepared-statement parameter
    new JdbcStatementBuilder[Data] {
      override def accept(t: PreparedStatement, u: Data): Unit = {
        t.setString(1, u.order_id)
        t.setString(2, u.order_sn.toString)
        t.setLong(3, u.customer_id)
        t.setString(4, u.shipping_user)
        t.setString(5, u.province)
        t.setString(6, u.city)
        t.setString(7, u.address)
        t.setInt(8, u.order_source)
        t.setInt(9, u.payment_method)
        t.setDouble(10, u.order_money)
        t.setDouble(11, u.district_money)
        t.setDouble(12, u.shipping_money)
        t.setDouble(13, u.payment_money)
        t.setString(14, u.shipping_comp_name)
        t.setString(15, u.shipping_sn)
        t.setString(16, u.create_time.toString)
        t.setString(17, u.shipping_time)
        t.setString(18, u.pay_time)
        t.setString(19, u.receive_time)
        t.setString(20, u.order_status)
        t.setInt(21, u.order_point)
        t.setString(22, u.invoice_title)
        t.setString(23, u.modified_time)
      }
    },
    JdbcExecutionOptions.builder()
      .withBatchSize(1000)
      .withBatchIntervalMs(200)
      // .withMaxRetries(5)
      .build(),
    jdbcBuild))
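The Data class comes from a Utilities package that is not shown. A plausible reconstruction, with field names and types inferred from the setters above (order_sn and create_time are passed through .toString in the original builder, so they may well be non-String types in the real class):

// Hypothetical reconstruction of Utilities.Data; types are inferred from the
// setString/setLong/setInt/setDouble calls in the statement builder above.
case class Data(
  order_id: String,
  order_sn: String,        // passed through .toString in the builder; may be numeric in the real class
  customer_id: Long,
  shipping_user: String,
  province: String,
  city: String,
  address: String,
  order_source: Int,
  payment_method: Int,
  order_money: Double,
  district_money: Double,
  shipping_money: Double,
  payment_money: Double,
  shipping_comp_name: String,
  shipping_sn: String,
  create_time: String,     // also passed through .toString in the builder
  shipping_time: String,
  pay_time: String,
  receive_time: String,
  order_status: String,
  order_point: Int,
  invoice_title: String,
  modified_time: String
)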
Full code:
package pt

import Utilities.Data
import com.google.gson.Gson
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.jdbc.{JdbcConnectionOptions, JdbcExecutionOptions, JdbcSink, JdbcStatementBuilder}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

import java.sql.PreparedStatement
import java.util.Properties

object Flink7 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Kafka source: consume the dwd-layer topic fact_order_master
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "192.168.45.13:9092,192.168.45.14:9092,192.168.45.15:9092")
    properties.setProperty("group.id", "pt-test7")
    val kafkaConsumer = new FlinkKafkaConsumer[String]("fact_order_master", new SimpleStringSchema(), properties)

    // ClickHouse JDBC connection options
    val jdbcBuild = new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
      .withUrl("jdbc:clickhouse://192.168.45.13/shtd_result")
      .withPassword("123456")
      .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
      .withUsername("default")
      .build()

    val sqlStr = "INSERT INTO order_master (" +
      " order_id," +
      " order_sn," +
      " customer_id," +
      " shipping_user," +
      " province," +
      " city," +
      " address," +
      " order_source," +
      " payment_method," +
      " order_money," +
      " district_money," +
      " shipping_money," +
      " payment_money," +
      " shipping_comp_name," +
      " shipping_sn," +
      " create_time," +
      " shipping_time," +
      " pay_time," +
      " receive_time," +
      " order_status," +
      " order_point," +
      " invoice_title," +
      " modified_time" +
      ") values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"

    // Parse each JSON record and keep only refunded orders
    val order_Stream = env.addSource(kafkaConsumer)
      .map(data => {
        // Replace literal NULL tokens so Gson can parse the record
        val str = data.replaceAll("NULL", "0")
        val gson = new Gson()
        gson.fromJson(str, classOf[Data])
      })
      .filter(data => data.order_status.equals("已退款")) // "已退款" = refunded

    order_Stream.addSink(
      JdbcSink.sink(
        sqlStr,
        // Bind each Data field to its prepared-statement parameter
        new JdbcStatementBuilder[Data] {
          override def accept(t: PreparedStatement, u: Data): Unit = {
            t.setString(1, u.order_id)
            t.setString(2, u.order_sn.toString)
            t.setLong(3, u.customer_id)
            t.setString(4, u.shipping_user)
            t.setString(5, u.province)
            t.setString(6, u.city)
            t.setString(7, u.address)
            t.setInt(8, u.order_source)
            t.setInt(9, u.payment_method)
            t.setDouble(10, u.order_money)
            t.setDouble(11, u.district_money)
            t.setDouble(12, u.shipping_money)
            t.setDouble(13, u.payment_money)
            t.setString(14, u.shipping_comp_name)
            t.setString(15, u.shipping_sn)
            t.setString(16, u.create_time.toString)
            t.setString(17, u.shipping_time)
            t.setString(18, u.pay_time)
            t.setString(19, u.receive_time)
            t.setString(20, u.order_status)
            t.setInt(21, u.order_point)
            t.setString(22, u.invoice_title)
            t.setString(23, u.modified_time)
          }
        },
        JdbcExecutionOptions.builder()
          .withBatchSize(1000)
          .withBatchIntervalMs(200)
          // .withMaxRetries(5)
          .build(),
        jdbcBuild))

    order_Stream.print()
    env.execute()
  }
}
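Finally, to cover the last part of the task, verify the result from the ClickHouse command line on the Linux host. A typical invocation (host and credentials taken from the JDBC configuration above; adjust to your environment):

clickhouse-client --host 192.168.45.13 --user default --password 123456

-- inside the clickhouse-client prompt:
SELECT * FROM shtd_result.order_master LIMIT 5;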