Flink Table / SQL
- Like the DataStream API, the Table API and SQL share a single programming model: you first build a TableEnvironment to create the relational programming context, and only then can you write applications with the Table API or SQL. The two interfaces can also be used together in the same application
I. Obtaining a TableEnvironment
1. Streaming applications
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
}
}
2. Batch applications
import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
object SqlTest {
def main(args: Array[String]): Unit = {
val batchEnv = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(batchEnv)
}
}
- In both cases you first create the TableEnvironment; the argument passed to getTableEnvironment is the execution environment of the corresponding application type (StreamExecutionEnvironment for streaming, ExecutionEnvironment for batch)
II. Registering Catalogs
(1) Internal catalog registration
1. Registering an internal Table
- scan looks up a table by name in the internal catalog; selecting fields from it produces a new Table
- registerTable registers the new Table in the catalog under a given name
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val projTable = tEnv.scan("SensorsTable").select(...) //fill in the fields to project
tEnv.registerTable("projectedTable",projTable)
}
}
2. Registering a TableSource
- With the Table API, an external data source can be registered directly as a Table. Flink already provides most of the common sources, e.g. local, HDFS, and S3 file systems, with support for file formats such as Text, CSV, and Parquet
e.g. CsvTableSource
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.sinks.CsvTableSink
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val fieldNames:Array[String] = Array("id","name")
val fieldTypes:Array[TypeInformation[_]] = Array(Types.INT,Types.STRING)
val csvSource = new CsvTableSource("D:\\data\\csvfile",fieldNames,fieldTypes)
tEnv.registerTableSource("CsvSourceTable",csvSource)
val result = tEnv.sqlQuery("select * from CsvSourceTable where id = 192")
result.toRetractStream[Row].print()
streamEnv.execute()
}
}
3. Registering a TableSink
- Once processing is done, the result has to be written to external storage. The Table API's sink abstraction is called a TableSink: the output table is registered on the TableEnvironment, the result of a SQL query is inserted into the table backed by that TableSink, and the data thereby reaches the external system
e.g. CsvTableSink
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.sinks.CsvTableSink
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
//val stream = streamEnv.fromElements(new Tuple2(192,"nie"), new Tuple2(200,"hu"))
val csvSink = new CsvTableSink("D:\\data\\csvfile",",")
val fieldNames:Array[String] = Array("id","name")
val fieldTypes:Array[TypeInformation[_]] = Array(Types.INT,Types.STRING)
tEnv.registerTableSink("csvSinkTable", fieldNames, fieldTypes, csvSink)
tEnv.sqlUpdate("insert into csvSinkTable values (192,'nie')")
streamEnv.execute()
}
}
(2) External catalogs
Besides using Flink's internal catalog as the metadata store for all tables, an external catalog can be used as well. External catalogs are user-defined: you implement the ExternalCatalog interface. Flink ships InMemoryExternalCatalog as a ready-made in-memory implementation, which the code below uses
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.catalog.InMemoryExternalCatalog
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val inMemCatalog = new InMemoryExternalCatalog("externalCatalog")
tEnv.registerExternalCatalog("externalCatalog",inMemCatalog)
}
}
(3) Converting a DataStream/DataSet to a Table
The Table/SQL API is a higher-level abstraction built on top of the DataStream and DataSet APIs, so a DataStream or DataSet can be converted into a Table
1. Registering a DataStream as a Table (registerDataStream)
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple._
import org.apache.flink.table.api.scala.table2TableConversions
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row
object sqlTest {
def main(args: Array[String]): Unit = {
val sEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(sEnv)
val stream = sEnv.fromElements(new Tuple2(192,"nie"),
new Tuple2(200,"hu"))
//Register the DataStream as a table named testTable
tEnv.registerDataStream("testTable", stream)
//Register it again under another name, this time naming the fields id and name
//(registering the same name twice fails, since testTable already exists)
tEnv.registerDataStream("testTable2", stream, 'id, 'name)
}
}
2. Converting a DataStream into a Table (fromDataStream)
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple._
import org.apache.flink.table.api.scala.table2TableConversions
import org.apache.flink.table.api.{Table, TableEnvironment, Types}
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
new Tuple2(200,"hu"))
//Convert the DataStream into a Table
val table1:Table = tEnv.fromDataStream(stream)
//Convert it again, this time naming the fields id and name
val table2:Table = tEnv.fromDataStream(stream,'id, 'name)
tEnv.registerTable("testTable",table1)
}
}
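The reverse direction also exists: a Table can be turned back into a DataStream, which the later examples rely on when printing results. A minimal sketch, assuming the same Flink 1.x Scala Table API as above (the object name and sample data are illustrative):

```scala
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

object TableToStreamTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
    val stream = streamEnv.fromElements((192, "nie"), (200, "hu"))
    val table = tEnv.fromDataStream(stream, 'id, 'name)
    // toAppendStream only works for insert-only (append) results;
    // toRetractStream also handles updating results, emitting (Boolean, Row)
    // where the flag marks an addition (true) or a retraction (false)
    table.toAppendStream[Row].print()
    table.toRetractStream[Row].print()
    streamEnv.execute("table-to-stream")
  }
}
```

toRetractStream is the safer default for query results that can update, e.g. aggregations, which is why the SQL examples below use it.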
III. Execution
(1) SQL API
Flink SQL is used through two TableEnvironment methods, sqlQuery and sqlUpdate
- sqlQuery: queries registered tables and processes the data into a new Table
- sqlUpdate: executes a SQL statement that writes a query result into a registered table
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.types.Row
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
new Tuple2(200,"hu"))
tEnv.registerDataStream("testTable", stream, 'id, 'name)
val result = tEnv.sqlQuery("select * from testTable where id = 192")
//Convert back to a stream and print it
result.toRetractStream[Row].print()
streamEnv.execute("test")
//tEnv.sqlUpdate("insert into csv_output_table select product,amount from sensors where type = 'temperature'")
}
}
(2) Table API
- As noted at the start, the Table API and SQL share the same programming model: create a TableEnvironment, register a catalog, and then operate either through SQL statements or through Table API calls like the ones below. Personally I find the Table API a bit redundant: it is neither as readable nor as convenient as SQL, so DataStream plus SQL covers everyday work
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.types.Row
object SqlTest {
def main(args: Array[String]): Unit = {
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
new Tuple2(200,"hu"))
tEnv.registerDataStream("testTable", stream, 'id, 'name)
val result = tEnv.scan("testTable").select('*).where('id === 192)
//Convert back to a stream and print it
result.toRetractStream[Row].print()
streamEnv.execute("test")
//tEnv.sqlUpdate("insert into csv_output_table select product,amount from sensors where type = 'temperature'")
}
}
- scan: reads a table that is already registered in the catalog
- select: projects the named fields
- where/filter: filters rows by a predicate; note that in Table API expression syntax, equality is written with three equals signs (===)
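To illustrate the last point, where and filter accept the same predicate expressions; a sketch reusing the testTable shape from the example above (sample data is illustrative):

```scala
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

object FilterTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
    tEnv.registerDataStream("testTable",
      streamEnv.fromElements((192, "nie"), (200, "hu")), 'id, 'name)
    // where and filter are interchangeable; equality must be === because
    // Scala's == would compare the expression objects, not build a predicate
    val byWhere  = tEnv.scan("testTable").where('id === 192)
    val byFilter = tEnv.scan("testTable").filter('id === 192)
    byWhere.toRetractStream[Row].print()
    byFilter.toRetractStream[Row].print()
    streamEnv.execute("filter-test")
  }
}
```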