Data preparation:
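import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// +I rows insert each user; each -U/+U pair moves a user to the next status (0 -> 1 -> 2)
DataStream<Row> dataStream =
    env.fromElements(
            Row.ofKind(RowKind.INSERT, "Alice", 12, 0),
            Row.ofKind(RowKind.INSERT, "Bob", 5, 0),
            Row.ofKind(RowKind.INSERT, "litz", 15, 0),
            Row.ofKind(RowKind.UPDATE_BEFORE, "litz", 15, 0),
            Row.ofKind(RowKind.UPDATE_AFTER, "litz", 666, 1),
            Row.ofKind(RowKind.UPDATE_BEFORE, "Alice", 12, 0),
            Row.ofKind(RowKind.UPDATE_AFTER, "Alice", 110, 1),
            Row.ofKind(RowKind.UPDATE_BEFORE, "Alice", 110, 1),
            Row.ofKind(RowKind.UPDATE_AFTER, "Alice", 100, 2))
        .returns(
            Types.ROW_NAMED(
                new String[]{"name", "age", "status"},
                Types.STRING, Types.INT, Types.INT));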
// interpret the DataStream as a Table
Schema schema = Schema.newBuilder()
    .column("name", DataTypes.STRING())
    .column("age", DataTypes.INT())
    .column("status", DataTypes.INT())
    .build();
// fromChangelogStream accepts all RowKinds (+I, -U, +U, -D) by default
Table table = tableEnv.fromChangelogStream(dataStream, schema);
// convert back to a DataStream<Row>; every row carries its RowKind
DataStream<Row> changelogStream = tableEnv.toChangelogStream(table);
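To see what the changelog above amounts to once it is materialized, it can help to replay it by hand. The sketch below is plain Java with no Flink dependency; `ChangelogReplay`, `ChangeRow`, `Kind` and `materialize` are hypothetical names that only model RowKind semantics (+I/+U write the row, -U/-D remove it). Rows are assumed to be identified by `name`, which is a simplification, since the schema above declares no primary key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogReplay {

    // Mirrors the four org.apache.flink.types.RowKind values and their short strings
    enum Kind {
        INSERT("+I"), UPDATE_BEFORE("-U"), UPDATE_AFTER("+U"), DELETE("-D");

        final String shortString;

        Kind(String shortString) {
            this.shortString = shortString;
        }
    }

    // One changelog entry: (kind, name, age, status)
    record ChangeRow(Kind kind, String name, int age, int status) {}

    // Same rows, in the same order, as the env.fromElements(...) call
    static List<ChangeRow> sample() {
        return List.of(
            new ChangeRow(Kind.INSERT, "Alice", 12, 0),
            new ChangeRow(Kind.INSERT, "Bob", 5, 0),
            new ChangeRow(Kind.INSERT, "litz", 15, 0),
            new ChangeRow(Kind.UPDATE_BEFORE, "litz", 15, 0),
            new ChangeRow(Kind.UPDATE_AFTER, "litz", 666, 1),
            new ChangeRow(Kind.UPDATE_BEFORE, "Alice", 12, 0),
            new ChangeRow(Kind.UPDATE_AFTER, "Alice", 110, 1),
            new ChangeRow(Kind.UPDATE_BEFORE, "Alice", 110, 1),
            new ChangeRow(Kind.UPDATE_AFTER, "Alice", 100, 2));
    }

    // Applies each change in order; assumption: rows are identified by name
    static Map<String, ChangeRow> materialize(List<ChangeRow> log) {
        Map<String, ChangeRow> table = new LinkedHashMap<>();
        for (ChangeRow r : log) {
            switch (r.kind()) {
                case INSERT, UPDATE_AFTER -> table.put(r.name(), r);
                case UPDATE_BEFORE, DELETE -> table.remove(r.name());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        for (ChangeRow r : sample()) {
            System.out.println(r.kind().shortString + " " + r.name() + " " + r.age() + " " + r.status());
        }
        System.out.println("materialized: " + materialize(sample()).values());
    }
}
```

Without any filtering, all three users survive and Alice ends at status 2; that final +U row is the one the next step targets.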
Drop the +U (UPDATE_AFTER) rows that reach the final status 2:
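// Drop the +U row that moves a record into the final status 2; the -U row
// before it is kept, so the record is retracted and never re-inserted
DataStream<Row> filtered = changelogStream.filter(row -> {
    RowKind kind = row.getKind();
    Integer status = (Integer) row.getField("status");
    // Integer.valueOf(2).equals(...) also avoids an NPE when status is null
    return !(kind == RowKind.UPDATE_AFTER && Integer.valueOf(2).equals(status));
});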
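The filter condition can be checked in isolation. In this plain-Java sketch, the hypothetical `Change` record stands in for Flink's `Row` (the real code reads `row.getKind()` and `row.getField("status")`), and `Kind` lists only the RowKinds that occur in Alice's history. Only the +U row carrying status 2 is rejected; the -U row right before it passes.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FinalStatusFilter {

    // Stand-in for RowKind; only the kinds used in the sample are listed
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER }

    // Stand-in for a Flink Row: a kind plus a nullable status field
    record Change(Kind kind, String name, Integer status) {}

    // Same condition as the Flink filter, written null-safe
    static final Predicate<Change> KEEP =
        c -> !(c.kind() == Kind.UPDATE_AFTER && Integer.valueOf(2).equals(c.status()));

    static List<Change> apply(List<Change> log) {
        return log.stream().filter(KEEP).collect(Collectors.toList());
    }

    // Alice's status history from the sample data (age omitted)
    static List<Change> aliceHistory() {
        return List.of(
            new Change(Kind.INSERT, "Alice", 0),
            new Change(Kind.UPDATE_BEFORE, "Alice", 0),
            new Change(Kind.UPDATE_AFTER, "Alice", 1),
            new Change(Kind.UPDATE_BEFORE, "Alice", 1),
            new Change(Kind.UPDATE_AFTER, "Alice", 2));
    }

    public static void main(String[] args) {
        apply(aliceHistory()).forEach(System.out::println);
    }
}
```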
Convert the stream back to a table:
Table tableFiltered = tableEnv.fromChangelogStream(filtered);
tableEnv.createTemporaryView("InputTable", tableFiltered);
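In this example, once a record's status transitions are finished (status 2), the record is deleted: the last -U row retracts its previous version and no +U row replaces it. Removing finished records this way reduces the amount of data downstream.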
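The deletion effect can be reproduced without Flink by replaying the filtered changelog. In this plain-Java sketch (hypothetical names `FinalStatusDeletion`, `keep`, `materialize`, `run`; rows are assumed to be identified by `name`), Alice's last -U removes her row and no +U follows, so the materialized table shrinks from three rows to two.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FinalStatusDeletion {

    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    record Change(Kind kind, String name, int age, int status) {}

    // Same rows, in the same order, as the sample data
    static List<Change> sample() {
        return List.of(
            new Change(Kind.INSERT, "Alice", 12, 0),
            new Change(Kind.INSERT, "Bob", 5, 0),
            new Change(Kind.INSERT, "litz", 15, 0),
            new Change(Kind.UPDATE_BEFORE, "litz", 15, 0),
            new Change(Kind.UPDATE_AFTER, "litz", 666, 1),
            new Change(Kind.UPDATE_BEFORE, "Alice", 12, 0),
            new Change(Kind.UPDATE_AFTER, "Alice", 110, 1),
            new Change(Kind.UPDATE_BEFORE, "Alice", 110, 1),
            new Change(Kind.UPDATE_AFTER, "Alice", 100, 2));
    }

    // The filter condition: drop +U rows that reach status 2
    static boolean keep(Change c) {
        return !(c.kind() == Kind.UPDATE_AFTER && c.status() == 2);
    }

    // Replays the changelog: +I/+U write the row, -U/-D remove it (by name)
    static Map<String, Change> materialize(List<Change> log) {
        Map<String, Change> table = new LinkedHashMap<>();
        for (Change c : log) {
            switch (c.kind()) {
                case INSERT, UPDATE_AFTER -> table.put(c.name(), c);
                case UPDATE_BEFORE, DELETE -> table.remove(c.name());
            }
        }
        return table;
    }

    // Materializes the sample, optionally applying the filter first
    static Map<String, Change> run(boolean applyFilter) {
        List<Change> log = new ArrayList<>();
        for (Change c : sample()) {
            if (!applyFilter || keep(c)) {
                log.add(c);
            }
        }
        return materialize(log);
    }

    public static void main(String[] args) {
        System.out.println("without filter: " + run(false).keySet());
        System.out.println("with filter:    " + run(true).keySet());
    }
}
```

Note that the effect relies on the -U row arriving before the dropped +U row, as it does in the sample data; if the -U were missing, dropping the +U would leave the previous version in place rather than deleting it.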