1. Prepare an employee entity class
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Emp {
    private Integer empno;
    private String ename;
    private Integer deptno;
    private Long ts;
}
2. Prepare a department entity class
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Dept {
    private Integer deptno;
    private String dname;
    private Long ts;
}
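3. Requirement: read the source data from the given ports, create an employee stream and a department stream, and for each stream assign a watermark strategy and extract the event-time field. (Creating the stream execution environment and setting the parallelism are omitted; in this example the parallelism is set to 1 to make testing easier.)
//TODO 3. Read employee data from the given network port, assign a Watermark strategy and extract the event-time field
// Input line format: empno,ename,deptno,ts  e.g. 100,zs,10,8
SingleOutputStreamOperator<Emp> empDS = env
        .socketTextStream("hadoop102", 8888)
        .map(
                empStr -> {
                    String[] fieldArr = empStr.split(",");
                    return new Emp(Integer.valueOf(fieldArr[0]), fieldArr[1], Integer.valueOf(fieldArr[2]), Long.valueOf(fieldArr[3]));
                }
        )
        .assignTimestampsAndWatermarks(
                // Alternative for out-of-order data:
                // WatermarkStrategy.<Emp>forBoundedOutOfOrderness(Duration.ofSeconds(3))
                WatermarkStrategy.<Emp>forMonotonousTimestamps()
                        .withTimestampAssigner(
                                new SerializableTimestampAssigner<Emp>() {
                                    @Override
                                    public long extractTimestamp(Emp emp, long recordTimestamp) {
                                        // Use the ts field as the event time
                                        return emp.getTs();
                                    }
                                }
                        )
        );
empDS.print("emp:");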
//TODO 4. Read department data from the given network port, assign a Watermark strategy and extract the event-time field
// Input line format: deptno,dname,ts
SingleOutputStreamOperator<Dept> deptDS = env
        .socketTextStream("hadoop102", 8889)
        .map(
                deptStr -> {
                    String[] fieldArr = deptStr.split(",");
                    return new Dept(Integer.valueOf(fieldArr[0]), fieldArr[1], Long.valueOf(fieldArr[2]));
                }
        )
        .assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .<Dept>forMonotonousTimestamps()
                        .withTimestampAssigner(
                                new SerializableTimestampAssigner<Dept>() {
                                    @Override
                                    public long extractTimestamp(Dept dept, long recordTimestamp) {
                                        // Use the ts field as the event time
                                        return dept.getTs();
                                    }
                                }
                        )
        );
deptDS.print("dept:");
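To make the difference between the two watermark strategies in step 3 more concrete, here is a small plain-Java model (no Flink dependency). It assumes, as far as I understand Flink's built-in generators, that the watermark is the largest timestamp seen so far minus the allowed out-of-orderness minus 1 ms, and that forMonotonousTimestamps behaves like a delay of 0. The class WatermarkSketch and its methods are hypothetical names used only for illustration.

```java
public class WatermarkSketch {

    // Allowed out-of-orderness in milliseconds (0 models forMonotonousTimestamps).
    private final long outOfOrdernessMillis;
    // Largest event timestamp seen so far.
    private long maxTimestamp;

    WatermarkSketch(long outOfOrdernessMillis) {
        this.outOfOrdernessMillis = outOfOrdernessMillis;
        // Start low enough that the first watermark does not underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    // Called for every record: only ever moves the maximum forward.
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Assumed formula: max timestamp - allowed delay - 1.
    long currentWatermark() {
        return maxTimestamp - outOfOrdernessMillis - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch monotonous = new WatermarkSketch(0);
        WatermarkSketch bounded = new WatermarkSketch(3000); // like Duration.ofSeconds(3)
        for (long ts : new long[] {8, 5, 12}) {
            monotonous.onEvent(ts);
            bounded.onEvent(ts);
            System.out.println("ts=" + ts
                    + " monotonous=" + monotonous.currentWatermark()
                    + " bounded=" + bounded.currentWatermark());
        }
    }
}
```

With the sample timestamps from this post (single-digit milliseconds), a 3-second delay would keep the watermark far behind the data, which is one reason the monotonous strategy is convenient for small manual tests.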
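4. Join the employee stream with the department stream through the API: IntervalJoin
~~~ The join uses IntervalJoin. It requires both streams to be keyed with keyBy first; here both are keyed by department number.
~~~ As for which stream joins which: with the symmetric bounds used here, the order of the two streams makes no difference to the result and no duplicate pairs are produced, because the underlying IntervalJoin implementation works in both directions (a record arriving on either stream is matched against the other stream). The bounds are relative to the stream on which intervalJoin is called, so with asymmetric bounds, swapping the streams also requires swapping and negating the bounds.
~~~ When joining, consider the time range within which a record may match records of the other stream, i.e. the lower bound and the upper bound. Call the between method with two arguments, the lower bound and the upper bound.
~~~ After joining and setting the range, apply the business logic to the joined stream, for example by calling process. (I don't do much processing here; each matched pair is simply passed downstream as a Tuple2.)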
//TODO 5. Join employees and departments with intervalJoin
SingleOutputStreamOperator<Tuple2<Emp, Dept>> joinedDS = empDS
        .keyBy(Emp::getDeptno)
        .intervalJoin(deptDS.keyBy(Dept::getDeptno))
        // Match a Dept whose ts lies within [emp.ts - 5 ms, emp.ts + 5 ms]
        .between(Time.milliseconds(-5), Time.milliseconds(5))
        .process(
                new ProcessJoinFunction<Emp, Dept, Tuple2<Emp, Dept>>() {
                    @Override
                    public void processElement(Emp emp, Dept dept, Context ctx, Collector<Tuple2<Emp, Dept>> out) throws Exception {
                        // Emit each matched pair unchanged
                        out.collect(Tuple2.of(emp, dept));
                    }
                }
        );
joinedDS.print(">>>>");
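The time-range condition from step 4 can be sketched in plain Java without Flink. Two records with the same key match when the right-side timestamp lies in [left ts + lower bound, left ts + upper bound], both ends inclusive by default. The class IntervalJoinSketch, its methods, and the sample records are hypothetical and only illustrate the condition; the sketch ignores watermarks, state cleanup, and late data, which the real operator handles.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalJoinSketch {

    // True when rightTs lies inside [leftTs + lowerBound, leftTs + upperBound].
    static boolean inInterval(long leftTs, long rightTs, long lowerBound, long upperBound) {
        return leftTs + lowerBound <= rightTs && rightTs <= leftTs + upperBound;
    }

    // Each record is {key, ts}. Returns {key, leftTs, rightTs} for every
    // same-key pair whose timestamps satisfy the interval condition.
    static List<long[]> join(long[][] left, long[][] right, long lowerBound, long upperBound) {
        List<long[]> out = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : right) {
                if (l[0] == r[0] && inInterval(l[1], r[1], lowerBound, upperBound)) {
                    out.add(new long[] {l[0], l[1], r[1]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: {deptno, ts}
        long[][] emps = {{10, 8}, {10, 20}, {20, 8}};
        long[][] depts = {{10, 5}, {10, 14}, {20, 3}};
        // Same bounds as between(Time.milliseconds(-5), Time.milliseconds(5))
        for (long[] m : join(emps, depts, -5, 5)) {
            System.out.println("deptno=" + m[0] + " empTs=" + m[1] + " deptTs=" + m[2]);
        }
    }
}
```

With these made-up records, the employee at ts 8 in department 10 matches the department record at ts 5 but not the one at ts 14, and the employee at ts 20 matches nothing, since neither department timestamp falls in [15, 25].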