The following code exercises the main methods of the Flink Table API.
Supporting classes used by the examples:
import lombok.Data;

@Data
public class StudentInfo {
    private String name;
    private String sex;
    private String course;
    private Float score;
    private Long timestamp;
}
The UTC2Local class works around Flink's time zone offset:
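import org.apache.flink.table.functions.ScalarFunction;
import java.sql.Timestamp;

public class UTC2Local extends ScalarFunction {
    public Timestamp eval(Long s) {
        return new Timestamp(s); // convert to the corresponding local time
    }

    // Note: Flink only invokes methods named eval, so utc2local(...) never calls this helper
    public long eval2(Long s) {
        long timestamp = s + 28800000; // Flink defaults to UTC; our time zone is UTC+8, so add eight hours to the timestamp
        return timestamp;
    }
}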
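The hard-coded 28800000 in eval2 is the UTC+8 offset in milliseconds. As a minimal sketch, assuming the target zone is Asia/Shanghai, the same offset could be derived from a zone ID with java.time instead of being typed in by hand. The class and method names here (ZoneOffsetSketch, offsetMillis, toLocal) are hypothetical and not part of the original code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ZoneOffsetSketch {
    // Offset in milliseconds between UTC and the given zone at the given instant.
    static long offsetMillis(long epochMillis, ZoneId zone) {
        return zone.getRules()
                .getOffset(Instant.ofEpochMilli(epochMillis))
                .getTotalSeconds() * 1000L;
    }

    // Wall-clock date and time in the given zone for an epoch timestamp.
    static LocalDateTime toLocal(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId shanghai = ZoneId.of("Asia/Shanghai"); // assumed target zone
        long epochMillis = 1_577_836_800_000L;        // 2020-01-01T00:00:00Z
        System.out.println(offsetMillis(epochMillis, shanghai)); // 28800000
        System.out.println(toLocal(epochMillis, shanghai));      // 2020-01-01T08:00
    }
}
```

Deriving the offset from a zone ID keeps the intent readable and stays correct for zones whose offset changes over the year.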
I. BatchTableEnvironment implementation class:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class FlinkTableApiBatchExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Source: read the CSV file and map each row to the StudentInfo class
        DataSet<StudentInfo> studentCsvInput = env
                .readCsvFile("/Users/springk/Documents/student.csv")
                .ignoreFirstLine()
                .pojoType(StudentInfo.class, "name", "sex", "course", "score", "timestamp");

        // Convert the DataSet to a Table
        Table studentInfo = tEnv.fromDataSet(studentCsvInput);
        // Register studentInfo as a table
        tEnv.registerTable("studentInfo", studentInfo);

        // 1. GroupBy Aggregation: group by name and count the courses
        Table counts = tEnv.scan("studentInfo")
                .groupBy("name")
                .select("name, course.count as cnt");
        DataSet<Row> result = tEnv.toDataSet(counts, Row.class);
        result.print();

        // Register the UDF so that expressions can call it as utc2local(...)
        tEnv.registerFunction("utc2local", new UTC2Local());

        // 2. GroupBy Window: one-hour tumbling windows over the converted timestamp
        Table resultGroupByWindow = studentInfo
                .filter("name.isNotNull && course.isNotNull")
                .select("name.lowerCase() as name, course, utc2local(timestamp) as timestamp")
                .window(Tumble.over("1.hour").on("timestamp").as("hourlyWindow"))
                .groupBy("hourlyWindow, name, course")
                .select("name, hourlyWindow.end, hourlyWindow.start, hourlyWindow.rowtime as hour, course, course.count as courseCount");
        DataSet<Row> result2 = tEnv.toDataSet(resultGroupByWindow, Row.class);
        result2.print();

        // 3. distinct: per name, sum only the distinct score values
        Table groupByDistinctResult = studentInfo
                .groupBy("name")
                .select("name, score.sum.distinct as d");
        DataSet<Row> result3 = tEnv.toDataSet(groupByDistinctResult, Row.class);
        result3.print();
    }
}
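Query 2 groups rows into one-hour tumbling windows. As a rough illustration of what that assignment means, the sketch below computes the window bounds for a timestamp in plain Java. It is not Flink's internal code: it assumes windows aligned to the epoch with no offset, and the class name, method names, and sample timestamps are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    static final long HOUR_MS = 3_600_000L;

    // Inclusive start of the window containing ts, assuming epoch-aligned windows.
    static long windowStart(long ts, long sizeMs) {
        return ts - Math.floorMod(ts, sizeMs);
    }

    // Exclusive end of the window containing ts.
    static long windowEnd(long ts, long sizeMs) {
        return windowStart(ts, sizeMs) + sizeMs;
    }

    // Number of timestamps that fall into each window, keyed by window start.
    static Map<Long, Long> countPerWindow(long[] timestamps, long sizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(windowStart(ts, sizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long base = 1_577_836_800_000L; // 2020-01-01T00:00:00Z, hypothetical sample data
        long[] timestamps = {base + 60_000L, base + 1_800_000L, base + HOUR_MS + 5_000L};
        System.out.println(countPerWindow(timestamps, HOUR_MS));
        // {1577836800000=2, 1577840400000=1}
    }
}
```

Every row whose timestamp lies in the same [start, end) interval lands in the same group, which is why the query can select both hourlyWindow.start and hourlyWindow.end for each result row.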
The code produces the following output:
Result 1:
Result 2:
Result 3:
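Resolving Flink time and time zone issues
Flink uses UTC by default, which is eight hours behind our GMT+8 time zone, so the times have to be brought into line. The options are: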
1. Modify the FLINK_ENV_JAVA_OPTS parameter in the jobmanager and taskmanager scripts:
- a. In $FLINK_HOME/1.10.0/libexec/libexec/jobmanager.sh, change the JAVA OPTS environment variable from:
export FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_JM}"
to:
export FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_JM} -Duser.timezone=GMT+08"
- b. In $FLINK_HOME/1.10.0/libexec/libexec/taskmanager.sh, change the JAVA OPTS environment variable from:
export FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_TM}"
to:
export FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_TM} -Duser.timezone=GMT+08"
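2. Use a UDF. The code above takes this approach: it defines the UTC2Local class, which extends ScalarFunction, and registers it in the BatchTableEnvironment with tEnv.registerFunction("utc2local", new UTC2Local()). After that, the function can be called directly in queries.
3. Alternatively, convert the time in Java when reading the source data, by adding eight hours directly to the timestamp.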
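To make the difference between option 1 and option 3 concrete, here is a minimal plain-Java sketch. It renders the same epoch timestamp in UTC and in GMT+08, then renders a timestamp shifted by eight hours in UTC. The class and method names are hypothetical, and setting the formatter's time zone only imitates, for a single formatter, what -Duser.timezone does for the whole JVM.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneRenderSketch {
    static final long EIGHT_HOURS_MS = 28_800_000L;

    // Format an epoch timestamp as wall-clock time in the given time zone.
    static String render(long epochMillis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long epochMillis = 1_577_836_800_000L; // 2020-01-01T00:00:00Z

        // Default behaviour described above: the value is read as UTC.
        System.out.println(render(epochMillis, "UTC"));    // 2020-01-01 00:00:00

        // Option 1: same instant, rendered in the GMT+08 zone.
        System.out.println(render(epochMillis, "GMT+08")); // 2020-01-01 08:00:00

        // Option 3: shift the value itself by eight hours and keep reading it as UTC.
        System.out.println(render(epochMillis + EIGHT_HOURS_MS, "UTC")); // 2020-01-01 08:00:00
    }
}
```

Both approaches print the same wall-clock time, but option 3 changes the stored value, so the shifted timestamp no longer identifies the original instant. Applying a time zone setting and the manual shift together would move the time by sixteen hours.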