Flink_SQL

DataFrame: data with attributes, i.e. it keeps a Schema. It is an optimization over RDDs and improves performance.
DataSet: a further optimization on top of DataFrame and an extension of the DataFrame API; it is strongly typed, e.g. DataSet[Car].
DataFrame is a special case of DataSet: DataFrame = DataSet[Row].

DSL: the DataFrame API provides a domain-specific language for working with structured data. The DSL can be used from Scala, Java, Python, and R. In the DSL style there is no need to create a temporary view; the DataFrame itself serves as the data source, so no view has to be registered.
val df3: DataFrame = spark.read.json("user.json")
df3.printSchema
df3.select("age").show
df3.select($"age" + 1).show // a column reference is required; equivalent to df3.select('age + 1).show
df3.filter('age > 27).show
Converting an RDD to a DataFrame (see the sketch below).
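A minimal sketch of the RDD-to-DataFrame conversion, assuming a local SparkSession; the User case class and sample data are made up for illustration, and toDF relies on importing spark.implicits._:

import org.apache.spark.sql.{DataFrame, SparkSession}

case class User(name: String, age: Long) // hypothetical schema for illustration

val spark = SparkSession.builder().appName("rdd-to-df").master("local[*]").getOrCreate()
import spark.implicits._ // brings toDF and the $"col" syntax into scope

// an RDD of case-class instances; toDF infers the schema from the case class
val userRDD = spark.sparkContext.parallelize(Seq(User("zhangsan", 20), User("lisi", 30)))
val userDF: DataFrame = userRDD.toDF()
userDF.printSchema()
userDF.show()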

1. The Table API is a query API embedded in the Java and Scala languages; it allows queries built from relational operators to be composed in a very intuitive way.
2. Flink's SQL support is based on Apache Calcite, which implements the SQL standard.
3. Maven: the planner generates the execution plan.

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala-bridge_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
4. Program structure
1. source
2. transform
3. sink
5. Converting a stream to a table
package com.atguigu.tableapiandsql

import com.atguigu.sourceandsink.SensorReading
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

object TestExample {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // val inputStream: DataStream[String] = env.socketTextStream("localhost", 7777)
    val inputStream: DataStream[String] = env.readTextFile("in/sensor.txt")
    // parse each line into a SensorReading(id, timestamp, temperature)
    val dataStream: DataStream[SensorReading] = inputStream.map(data => {
      val strings: Array[String] = data.split(",")
      SensorReading(strings(0), strings(1).toLong, strings(2).toDouble)
    })
    // create the table execution environment
    val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)
    // create a table from the stream
    val dataTable: Table = tableEnv.fromDataStream(dataStream)
    // transform with the Table API (string expressions)
    val resultTable: Table = dataTable
      .select("id, temperature")
      .filter("id === 'sensor_1'")
    // the same query expressed directly in SQL
    // 1. register the table in the environment
    tableEnv.createTemporaryView("dataTable", dataTable)
    // 2. run the query
    val sql = "select id, temperature from dataTable where id = 'sensor_1'"
    val resultSqlTable: Table = tableEnv.sqlQuery(sql)
    resultSqlTable.toAppendStream[(String, Double)].print("resultSqlTable")
    resultTable.toAppendStream[(String, Double)].print("table")
    env.execute("test table api")
  }
}
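When converting a stream to a table, the field mapping can also be given explicitly instead of taking the case-class field names as-is; a minimal sketch using the tableEnv and dataStream from the example above (the alias 'ts is made up for illustration):

// explicit field expressions: keep id and temperature, rename timestamp to ts
val namedTable: Table = tableEnv.fromDataStream(dataStream, 'id, 'timestamp as 'ts, 'temperature)
namedTable.printSchema()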
6. General approach:
// 1. create the streaming environment
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
// 2. create the table execution environment on top of the Flink stream; env and settings can be swapped
val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

// 1.1.1 streaming with the old planner
val settings: EnvironmentSettings = EnvironmentSettings.newInstance()
  .useOldPlanner()
  .inStreamingMode()
  .build()
val oldStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, settings)

// 1.1.2 batch with the old planner
val batchEnv: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val oldBatchTableEnv: BatchTableEnvironment = BatchTableEnvironment.create(batchEnv)

// 1.2.1 streaming with the blink planner (unified batch and streaming)
val blinkStreamSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inStreamingMode()
  .build()
val blinkStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, blinkStreamSettings)

// 1.2.2 batch with the blink planner
val blinkBatchSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inBatchMode()
  .build()
val blinkBatchTableEnv: TableEnvironment = TableEnvironment.create(blinkBatchSettings)

Tables (Table)
• A TableEnvironment can register catalogs (Catalog) and register tables based on a catalog
• A table (Table) is specified by an identifier made up of three parts: the catalog name, the database name, and the object name (see the sketch after this list)
• A table can be regular (Table) or virtual (a view, View)
• A regular table (Table) generally describes external data, such as a file, a database table, or data from a message queue; it can also be converted directly from a DataStream
• A view (View) can be created from an existing table, usually as the result set of a Table API or SQL query
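A minimal sketch of identifiers and views, assuming a blinkStreamTableEnv and a dataStream of SensorReading as in the examples in this post; the view names are chosen only for illustration, and the catalog and database names are Flink's defaults:

// identifiers resolve as catalog.database.object; a name like "sensorView"
// therefore means default_catalog.default_database.sensorView here
blinkStreamTableEnv.useCatalog("default_catalog")
blinkStreamTableEnv.useDatabase("default_database")

// a regular table converted directly from a DataStream, registered under an identifier
blinkStreamTableEnv.createTemporaryView("sensorView", blinkStreamTableEnv.fromDataStream(dataStream))

// a view defined as the result set of a SQL query over the registered table
val highTempView: Table = blinkStreamTableEnv.sqlQuery(
  "select id, temperature from sensorView where temperature > 30")
blinkStreamTableEnv.createTemporaryView("highTempView", highTempView)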
Reading data from a file
package com.atguigu.tableapiandsql

import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{DataTypes, EnvironmentSettings, Table}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{FileSystem, OldCsv, Schema}

object TestSql {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. create the table execution environment on top of the Flink stream; env and settings can be swapped
    // val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)
    // 1.2.1 streaming with the blink planner (unified batch and streaming)
    // val batchEnv: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    // val oldBatchTableEnv: BatchTableEnvironment = BatchTableEnvironment.create(batchEnv)
    val blinkStreamSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val blinkStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, blinkStreamSettings)
    // 2. connect to the external system, read the data and register the table
    blinkStreamTableEnv
      .connect(new FileSystem().path("in/sensor.txt"))
      .withFormat(new OldCsv())
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("inputTable")
    // 3. read the registered table back and print it
    val inputTable: Table = blinkStreamTableEnv.from("inputTable")
    inputTable.toAppendStream[(String, Long, Double)].print()
    env.execute("**********")
  }
}
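A note on formats: OldCsv above is the legacy, non-standard CSV descriptor shipped with the table modules, while the remaining examples use the standard Csv descriptor. As far as I recall, the factory for the standard Csv format lives in a separate module, so reading CSV from Kafka with it also needs this dependency (version matched to the project above):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-csv</artifactId>
    <version>1.10.1</version>
</dependency>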

Tested: when reading from Kafka, if the job is stopped and then restarted, consumption resumes right after the data that was last consumed.
package com.atguigu.tableapiandsql

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{DataTypes, EnvironmentSettings, Table}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, FileSystem, Kafka, Schema}

object connectkafka {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val blinkStreamSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val blinkStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, blinkStreamSettings)
    // 2. connect to the external system (Kafka), read the data and register the table
    blinkStreamTableEnv
      .connect(new Kafka()
        .version("0.11")
        .topic("sensor")
        // .property("zookeeper.connect", "hadoop203:2181")
        .property("bootstrap.servers", "hadoop203:9092"))
      .withFormat(new Csv)
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("kafkainputTable")
    val inputTable: Table = blinkStreamTableEnv.from("kafkainputTable")
    // 3. query/transform with the Table API
    val resultApiTable: Table = inputTable
      .select('id, 'temp)
      .filter('id === "sensor_1")
    // the same query expressed in SQL
    val resultSqlTable: Table = blinkStreamTableEnv.sqlQuery(
      """
        |select id, temp
        |from kafkainputTable
        |where id = 'sensor_1'
      """.stripMargin)
    resultSqlTable.toAppendStream[(String, Double)].print("resultSqlTable")
    resultApiTable.toAppendStream[(String, Double)].print("resultApiTable")
    env.execute("table api test")
  }
}
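The resume-after-restart behavior described above relies on the consumer group's committed offsets in Kafka. A minimal sketch of registering the same source with an explicit group id, assuming the blinkStreamTableEnv from the example above; the group name "sensor-table-consumer" is made up:

// same source registration, but with an explicit consumer group id so that a
// restarted job resumes from that group's committed offsets
blinkStreamTableEnv
  .connect(new Kafka()
    .version("0.11")
    .topic("sensor")
    .property("bootstrap.servers", "hadoop203:9092")
    .property("group.id", "sensor-table-consumer"))
  .withFormat(new Csv)
  .withSchema(new Schema()
    .field("id", DataTypes.STRING())
    .field("timestamp", DataTypes.BIGINT())
    .field("temp", DataTypes.DOUBLE()))
  .createTemporaryTable("kafkaInputWithGroup")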
Reading from a file and writing to a file; aggregation results cannot be written to a file sink.
package com.atguigu.tableapiandsql

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{DataTypes, EnvironmentSettings, Table}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, FileSystem, Kafka, Schema}

object FileOutputTest {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val blinkStreamSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val blinkStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, blinkStreamSettings)
    // 2. connect to the external system, read the data and register the input table
    blinkStreamTableEnv
      .connect(new FileSystem().path("in/sensor.txt"))
      .withFormat(new Csv())
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("inputTable")
    // 3.1 simple transformation: select and filter
    val sensorTable: Table = blinkStreamTableEnv.from("inputTable")
    val resultTable: Table = sensorTable
      .select('id, 'temp)
      .filter('id === "sensor_1")
    // 3.2 aggregation: group by id and count
    val aggTable: Table = sensorTable.groupBy('id)
      .select('id, 'id.count as 'count)
    // define the file sink
    blinkStreamTableEnv
      .connect(new FileSystem().path("output"))
      .withFormat(new Csv())
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("outPutTable")
    resultTable.insertInto("outPutTable")
    // the file sink is append-only, so inserting the aggregated (updating) table fails at runtime
    aggTable.insertInto("outPutTable")
    env.execute()
  }
}
Update modes
1. Append mode: every change is emitted as an insert message; the result table only grows.
2. Retract mode: a delete is emitted as a retract (false) message; an update is encoded as two messages, a retract of the old row plus an add of the new row, so each update sends two messages (see the retract-stream sketch below).
3. Upsert mode (= update + insert): a delete is emitted as a delete message; update and insert share one message type keyed by a key, so if the key already exists its value is updated, otherwise the row is inserted.
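The retract encoding can be observed directly by converting an aggregated table, such as aggTable from the file example above, to a retract stream instead of an append stream; a minimal sketch assuming that table is in scope:

// each element is (flag, row): flag == true is an add message, flag == false a retract
aggTable.toRetractStream[(String, Long)].print("aggRetract")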
Reading data from Kafka, transforming it, and writing it back to Kafka.
package com.atguigu.tableapiandsql

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{DataTypes, EnvironmentSettings, Table}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, Kafka, Schema}

object KafkaPipLineTest {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val blinkStreamSettings: EnvironmentSettings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val blinkStreamTableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, blinkStreamSettings)
    // 2. read the sensor data from Kafka
    blinkStreamTableEnv
      .connect(new Kafka()
        .version("0.11")
        .topic("sensor")
        // .property("zookeeper.connect", "hadoop203:2181")
        .property("bootstrap.servers", "hadoop203:9092"))
      .withFormat(new Csv)
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("kafkainputTable")
    // 3.1 simple transformation: select and filter
    val sensorTable: Table = blinkStreamTableEnv.from("kafkainputTable")
    val resultTable: Table = sensorTable
      .select('id, 'temp)
      .filter('id === "sensor_1")
    // 3.2 aggregation: group by id and count
    val aggTable: Table = sensorTable.groupBy('id)
      .select('id, 'id.count as 'count)
    // 4. define the Kafka sink
    blinkStreamTableEnv
      .connect(new Kafka()
        .version("0.11")
        .topic("sensorout")
        // .property("zookeeper.connect", "hadoop203:2181")
        .property("bootstrap.servers", "hadoop203:9092"))
      .withFormat(new Csv)
      .withSchema(new Schema()
        .field("id", DataTypes.STRING())
        .field("temp", DataTypes.DOUBLE()))
      .createTemporaryTable("kafkaoutputTable")
    // only the append-only resultTable is written out; the Kafka sink cannot take the updating aggTable
    resultTable.insertInto("kafkaoutputTable")
    env.execute("kafkapiplinetest")
  }
}

Writing to MySQL
• A Table can be created to describe data in MySQL and used as both input and output
val sinkDDL: String =
  """
    |create table jdbcOutputTable (
    |  id varchar(20) not null,
    |  cnt bigint not null
    |) with (
    |  'connector.type' = 'jdbc',
    |  'connector.url' = 'jdbc:mysql://localhost:3306/test',
    |  'connector.table' = 'sensor_count',
    |  'connector.driver' = 'com.mysql.jdbc.Driver',
    |  'connector.username' = 'root',
    |  'connector.password' = '123456'
    |)
  """.stripMargin
tableEnv.sqlUpdate(sinkDDL) // run the DDL to register the sink table
aggResultSqlTable.insertInto("jdbcOutputTable")
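For the DDL above to find a matching connector, the project presumably also needs the JDBC table connector and a MySQL driver on the classpath; a hedged Maven sketch (artifact name as I recall it for the 1.10.x line, driver version chosen only for illustration):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-jdbc_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.44</version>
</dependency>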
Viewing the execution plan
// explain returns the plan as a String, so print it to inspect it
val explanation: String = blinkStreamTableEnv.explain(resultSqlTable)
println(explanation)
