Flink Learning: the Flink Table/SQL API

This article introduces the Table API and SQL in Apache Flink, including how to obtain a TableEnvironment, how to register internal and external Catalogs, and how to convert a DataStream or DataSet into a Table. It also walks through querying and processing data with both the SQL API and the Table API.

  • Like the DataStream API, the Table API and SQL share the same programming model: you first build the corresponding TableEnvironment to create a relational programming context before Table API or SQL code can be written in the program. The Table API and SQL interfaces can also be used together in the same application.

I. Obtaining a TableEnvironment

1. Streaming applications

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
  }
}

2. Batch applications

import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment

object SqlTest {
  def main(args: Array[String]): Unit = {
    // Batch applications use the DataSet ExecutionEnvironment
    val batchEnv = ExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(batchEnv)
  }
}

  • In both cases you first create the TableEnvironment; the argument passed to getTableEnvironment is the execution environment that matches the application type (streaming or batch).

II. Registering Catalogs

(1) Internal Catalog registration

1. Internal Table registration

  • Use the scan method to look up a table by name in the internal Catalog and select some of its fields to form a new Table
  • Use registerTable to register the new Table in the Catalog

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

object SqlTest {
  def main(args: Array[String]): Unit = {

    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)

    // "SensorsTable" must already be registered in the catalog;
    // the selected field names here are illustrative
    val projTable = tEnv.scan("SensorsTable").select('id, 'name)
    tEnv.registerTable("projectedTable", projTable)
  }
}
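
Once registered, the new table can be referenced by name in later queries; a minimal sketch continuing the example above:

// Query the table registered above under the name projectedTable
val result = tEnv.sqlQuery("select * from projectedTable")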

2. TableSource registration

  • When using the Table API, external data sources can be registered directly as Tables. Flink already ships with most of the commonly used sources, for example local, HDFS, and S3 file systems, and supports file formats such as Text, CSV, and Parquet.

Example: CsvTableSource

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)

    val fieldNames:Array[String] = Array("id","name")
    val fieldTypes:Array[TypeInformation[_]] = Array(Types.INT,Types.STRING)
    val csvSource = new CsvTableSource("D:\\data\\csvfile",fieldNames,fieldTypes)

    tEnv.registerTableSource("CsvSourceTable",csvSource)
    val result = tEnv.sqlQuery("select * from CsvSourceTable where id = 192")
    result.toRetractStream[Row].print()
    streamEnv.execute()
  }
}
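
Note that toRetractStream produces a stream of (Boolean, Row) pairs, where the flag marks each row as an insertion (true) or a retraction (false); for an insert-only query like this one, toAppendStream[Row] works as well.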

3. TableSink registration

  • Once the data has been processed, the results need to be written to external storage. The Table API has a corresponding sink module for this, called a Table Sink. A TableSink is registered as an output table in the TableEnvironment; the result of a SQL query is then inserted into that table, which writes the data out to the external system.

Example: CsvTableSink

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.sinks.CsvTableSink

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)

    // Register a source table to read from
    val stream = streamEnv.fromElements(new Tuple2(192, "nie"), new Tuple2(200, "hu"))
    tEnv.registerDataStream("sourceTable", stream, 'id, 'name)

    // Describe the sink schema and register the CSV sink as an output table
    val csvSink = new CsvTableSink("D:\\data\\csvfile", ",")
    val fieldNames: Array[String] = Array("id", "name")
    val fieldTypes: Array[TypeInformation[_]] = Array(Types.INT, Types.STRING)
    tEnv.registerTableSink("csvSinkTable", fieldNames, fieldTypes, csvSink)

    // Insert the query result into the registered sink table
    tEnv.sqlUpdate("insert into csvSinkTable select id, name from sourceTable")
    streamEnv.execute()
  }
}
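
As an alternative to sqlUpdate, a query result can also be emitted to a registered sink with the Table API's insertInto method. A minimal sketch, assuming the sourceTable and csvSinkTable registrations from the example above:

// Run a SQL query, then write the result to the registered sink table
val result = tEnv.sqlQuery("select id, name from sourceTable where id = 192")
result.insertInto("csvSinkTable")
streamEnv.execute()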


(2) External Catalogs

Besides using Flink's internal Catalog as the metadata store for all Table data, external catalogs can be used as well. An external catalog is a user-provided implementation of the ExternalCatalog interface; Flink ships with InMemoryExternalCatalog as a ready-made implementation, which the code below uses.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.catalog.InMemoryExternalCatalog

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)

    // The InMemoryExternalCatalog constructor takes the catalog name
    val inMemCatalog = new InMemoryExternalCatalog("externalCatalog")
    tEnv.registerExternalCatalog("externalCatalog", inMemCatalog)
  }
}
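
Once tables have been created inside the external catalog, they are referenced in SQL by their full path in the form catalogName.databaseName.tableName (for example, externalCatalog.db1.table1).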

(3) Converting a DataStream/DataSet to a Table

The Table and SQL APIs are a higher-level abstraction built on top of the DataStream and DataSet APIs, and a DataStream or DataSet can be converted into a Table.

1. Registering a DataStream as a Table (registerDataStream)

import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.TableEnvironment

object SqlTest {
  def main(args: Array[String]): Unit = {
    val sEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(sEnv)

    val stream = sEnv.fromElements(new Tuple2(192,"nie"),
      new Tuple2(200,"hu"))

    // Register the DataStream as a Table named testTable
    tEnv.registerDataStream("testTable", stream)
    // Register it under a second name, this time naming the fields id and name
    // (registering the same table name twice would fail)
    tEnv.registerDataStream("testTableWithFields", stream, 'id, 'name)
  }
}

2. Converting a DataStream to a Table (fromDataStream)

import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.java.tuple._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{Table, TableEnvironment}

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
    
    val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
      new Tuple2(200,"hu"))

    // Convert the DataStream to a Table
    val table1: Table = tEnv.fromDataStream(stream)
    // Convert it again, this time naming the fields id and name
    val table2: Table = tEnv.fromDataStream(stream, 'id, 'name)
    tEnv.registerTable("testTable", table2)
  }
}
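
The reverse conversion is available as well: with org.apache.flink.table.api.scala._ in scope, a Table can be turned back into a DataStream. A minimal sketch continuing the example above:

import org.apache.flink.types.Row
// Append stream: one element per result row (insert-only queries)
val appendStream = table2.toAppendStream[Row]
// Retract stream: (true, row) marks an insertion, (false, row) a retraction
val retractStream = table2.toRetractStream[Row]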

III. Execution

(1) SQL API

Flink SQL is used through two TableEnvironment methods, sqlQuery and sqlUpdate:

  • sqlQuery: queries registered Tables, processes the data, and produces a new Table
  • sqlUpdate: uses a SQL statement to write query results into a registered table

import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.types.Row

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)
    
    val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
      new Tuple2(200,"hu"))

    tEnv.registerDataStream("testTable", stream, 'id, 'name)
    val result = tEnv.sqlQuery("select * from testTable where id = 192")
    // Convert the result back to a stream and print it
    result.toRetractStream[Row].print()

    streamEnv.execute("test")
    //tEnv.sqlUpdate("insert into csv_output_table select product,amount from sensors where type = 'temperature'")
  }
}

(2) Table API

  • As noted at the start, the Table and SQL APIs share the same programming model: create a TableEnvironment, register the Catalog objects, and then work either with SQL statements or with Table API calls like the ones below. Personally I find the Table API somewhat redundant, since it is less readable and less convenient than SQL; DataStream plus SQL covers day-to-day work.

import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.api.java.tuple._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.types.Row

object SqlTest {
  def main(args: Array[String]): Unit = {
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(streamEnv)

    val stream = streamEnv.fromElements(new Tuple2(192,"nie"),
      new Tuple2(200,"hu"))

    tEnv.registerDataStream("testTable", stream, 'id, 'name)
    val result = tEnv.scan("testTable").select('*).where('id === 192)
    // Convert the result back to a stream and print it
    result.toRetractStream[Row].print()

    streamEnv.execute("test")
    //tEnv.sqlUpdate("insert into csv_output_table select product,amount from sensors where type = 'temperature'")
  }
}

  • scan: looks up a table that is already registered in the Catalog
  • select: selects the given fields by name
  • where/filter: filters rows by a predicate; note that in Table API expression syntax, equality comparison is written with three equals signs (===), as the sketch below shows
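
A minimal sketch of the where/filter equivalence, assuming the testTable registration from the example above:

// where and filter are interchangeable; note the triple-equals comparison
val byWhere  = tEnv.scan("testTable").where('id === 192)
val byFilter = tEnv.scan("testTable").filter('id === 192)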