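With the basic concepts of the Flink Table API covered, let's look at how they work in actual code.
Flink 1.9 keeps five TableEnvironments. They are five user-facing interfaces, each with a different implementation underneath: one TableEnvironment interface, two BatchTableEnvironment interfaces, and two StreamTableEnvironment interfaces. The full paths of the five interface files are: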
org/apache/flink/table/api/TableEnvironment.java
org/apache/flink/table/api/java/BatchTableEnvironment.java
org/apache/flink/table/api/scala/BatchTableEnvironment.scala
org/apache/flink/table/api/java/StreamTableEnvironment.java
org/apache/flink/table/api/scala/StreamTableEnvironment.scala
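TableEnvironment is the top-level interface and the base of all the other TableEnvironments. BatchTableEnvironment and StreamTableEnvironment each come in a Java version and a Scala version, hence two interfaces apiece.
1. BatchTableEnvironment is used for batch processing. It operates on the Java DataSet and the Scala DataSet respectively, and provides the methods for converting between DataSet and Table.
2. This example computes each student's total score, using a raw CSV text file as the source input.
3. The CSV text contains the following data: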
Name | Gender | Subject | Score |
张三 | Male | Chinese | 90.5 |
张三 | Male | Math | 100 |
张三 | Male | Foreign language | 80 |
李四 | Female | Chinese | 68 |
王二 | Female | Foreign language | 99 |
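One detail worth spelling out: `readCsvFile` splits fields on commas by default, so the file on disk would be comma-separated rather than pipe-separated as in the listing above. The following is a minimal plain-Java sketch (no Flink dependency) that writes such a file. The class name `SampleCsvWriter`, the English column values, and the temp-file location are assumptions for illustration, not part of the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes the sample rows as a comma-separated file,
// matching the default field delimiter of readCsvFile.
class SampleCsvWriter {

    // Header line first; the Flink job skips it via ignoreFirstLine().
    static List<String> sampleLines() {
        return Arrays.asList(
                "name,sex,course,score",
                "张三,Male,Chinese,90.5",
                "张三,Male,Math,100",
                "张三,Male,Foreign language,80",
                "李四,Female,Chinese,68",
                "王二,Female,Foreign language,99");
    }

    // Writes the lines to a fresh temp file and returns its path.
    static Path writeSample() throws IOException {
        Path file = Files.createTempFile("student", ".csv");
        Files.write(file, sampleLines(), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Wrote sample CSV to " + writeSample());
    }
}
```

The returned path could then be passed to `readCsvFile` in place of the hard-coded one.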
4. The entity classes the job depends on are as follows:
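// Both classes rely on Lombok's @Data; each public class goes in its own file.
import lombok.Data;

// One row of the input CSV; timestamp is not read from the file.
@Data
public class StudentInfo {
    private String name;
    private String sex;
    private String course;
    private Float score;
    private Long timestamp;
}

// One row of the query result; field names match the SQL output columns.
@Data
public class StudentScoreResult {
    public String name;
    public float sum_total_score;

    public StudentScoreResult() {}
}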
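For readers who do not use Lombok, here is a hand-written sketch of roughly what `@Data` contributes to `StudentInfo` (with `equals`, `hashCode` and `toString` left out). It also makes visible the shape Flink expects of a POJO: a public class, a public no-argument constructor, and fields that are either public or reachable through getters and setters. The names `PojoSketch` and `StudentInfoPojo` are hypothetical, and the class is nested only to keep the sketch in a single file.

```java
// Hypothetical container class, used only so the sketch fits in one file.
class PojoSketch {

    // Hand-written counterpart of the Lombok-annotated StudentInfo.
    public static class StudentInfoPojo {
        private String name;
        private String sex;
        private String course;
        private Float score;
        private Long timestamp;

        // Public no-argument constructor, as Flink's POJO rules require.
        public StudentInfoPojo() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public String getSex() { return sex; }
        public void setSex(String sex) { this.sex = sex; }

        public String getCourse() { return course; }
        public void setCourse(String course) { this.course = course; }

        public Float getScore() { return score; }
        public void setScore(Float score) { this.score = score; }

        public Long getTimestamp() { return timestamp; }
        public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }
    }

    public static void main(String[] args) {
        StudentInfoPojo s = new StudentInfoPojo();
        s.setName("张三");
        s.setScore(90.5f);
        System.out.println(s.getName() + " " + s.getScore());
    }
}
```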
5. The Flink implementation is as follows:
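// Imports assumed for Flink 1.9 (batch Table API on top of the Java DataSet API)
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;
import org.apache.flink.table.sinks.TableSink;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
final BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
// Source: read the CSV file and map each row to the POJO class
DataSet<StudentInfo> studentCsvInput = env
        .readCsvFile("/Users/springk/Documents/student.csv")
        .ignoreFirstLine()
        .pojoType(StudentInfo.class, "name", "sex", "course", "score");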
// Convert the DataSet to a Table
Table studentInfo = tableEnv.fromDataSet(studentCsvInput);
// Register studentInfo as a table
tableEnv.registerTable("studentInfo", studentInfo);
// Query each student's total score across all subjects, highest first
Table studentTable = tableEnv.sqlQuery("select name,sum(score) as sum_total_score from studentInfo group by name order by 2 desc");
// Convert the Table back to a DataSet
DataSet<StudentScoreResult> result = tableEnv.toDataSet(studentTable, StudentScoreResult.class);
// Map the DataSet to tuples and print them (DataSet.print() triggers execution on its own)
result.map(new MapFunction<StudentScoreResult, Tuple2<String, Float>>() {
    @Override
    public Tuple2<String, Float> map(StudentScoreResult result) {
        String name = result.name;
        float sumTotalScore = result.sum_total_score;
        return Tuple2.of(name, sumTotalScore);
    }
}).print();
// Sink: write the result out
String[] fieldNames = {"name", "sum_total_score"};
TypeInformation<?>[] fieldTypes = {Types.STRING, Types.FLOAT};
// The two-argument constructor defaulted to 8 files here, with rows written to them at random.
// The files can only be written once; running the job again fails with:
// Caused by: java.nio.file.FileAlreadyExistsException: File already exists: /Users/wangjieying/Documents/tt-sink.csv/4
// TableSink tableSink = new CsvTableSink("/Users/wangjieying/Documents/student-sink.csv","-");
// Writing a single file in OVERWRITE mode avoids that error
TableSink<?> tableSink = new CsvTableSink("/Users/springk/Documents/student-sink.csv", " ", 1, FileSystem.WriteMode.OVERWRITE);
tableEnv.registerTableSink("studentScoreRank", fieldNames, fieldTypes, tableSink);
studentTable.insertInto("studentScoreRank");
env.execute("studentScoreAnalyse");
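To make clear what the SQL query above computes, here is a plain-Java sketch of the same logic without Flink: group the rows by name, sum the scores, and sort by the sum in descending order. It is an illustration of the query's semantics under the sample data, not of how Flink executes it. The names `TotalScoreSketch`, `Score` and `totalsDescending` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch of "group by name, sum(score), order by sum desc".
class TotalScoreSketch {

    // Minimal row holder: only the two columns the query uses.
    static final class Score {
        final String name;
        final float score;

        Score(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    // The (name, score) pairs from the sample CSV.
    static List<Score> sampleRows() {
        return Arrays.asList(
                new Score("张三", 90.5f), new Score("张三", 100f), new Score("张三", 80f),
                new Score("李四", 68f), new Score("王二", 99f));
    }

    // GROUP BY name with SUM(score), then ORDER BY the sum descending.
    static List<Map.Entry<String, Float>> totalsDescending(List<Score> rows) {
        Map<String, Float> sums = new LinkedHashMap<>();
        for (Score r : rows) {
            sums.merge(r.name, r.score, Float::sum);
        }
        List<Map.Entry<String, Float>> result = new ArrayList<>(sums.entrySet());
        result.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Float> e : totalsDescending(sampleRows())) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run on the sample rows, the sketch prints 张三 270.5, then 王二 99.0, then 李四 68.0, one per line.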