In practice we usually also want to know how many distinct users visited the site, so another important traffic metric is the number of unique visitors (Unique Visitor, UV).
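To make the metric concrete, here is a minimal plain-Java sketch that contrasts UV with the page-view count (PV). The class name `PvUvExample` and the user IDs are made up for illustration and are not part of the project.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class PvUvExample {
    // PV: every visit counts, repeated visits by the same user included.
    static long pv(List<Long> visitingUserIds) {
        return visitingUserIds.size();
    }

    // UV: each user counts once, no matter how often they visit.
    static long uv(List<Long> visitingUserIds) {
        return new HashSet<>(visitingUserIds).size();
    }

    public static void main(String[] args) {
        // Hypothetical visit log: one user ID per page view.
        List<Long> visits = Arrays.asList(1001L, 1002L, 1001L, 1003L, 1002L);
        System.out.println("PV = " + pv(visits)); // PV = 5
        System.out.println("UV = " + uv(visits)); // UV = 3
    }
}
```

The Flink job in the steps below applies the same set-based idea to a stream of records.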
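1. Assume the UserBehavior data has already been collected and placed in the input directory of the project, as shown in the screenshot below:
The file format is as follows:
Any CSV file you write yourself will do. Opened in Excel, each value fills one cell; opened in Notepad++, the values are separated by commas.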
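If you do not have the data at hand, a small test file can be generated along these lines. This is only a sketch: the column order (userId, itemId, categoryId, behavior, timestamp) is inferred from how the job in step 3 parses each line, while the class name `SampleUserBehaviorCsv`, every value in the rows, and the `buy` behavior type are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class SampleUserBehaviorCsv {
    // Hypothetical rows in the order userId,itemId,categoryId,behavior,timestamp.
    static List<String> sampleLines() {
        return Arrays.asList(
                "1001,2001,301,pv,1511658000",
                "1002,2002,302,pv,1511658001",
                "1001,2003,301,buy,1511658002",
                "1001,2002,302,pv,1511658003");
    }

    // Writes the sample rows to the given path, creating the parent directory if needed.
    static void write(Path target) throws IOException {
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        Files.write(target, sampleLines());
    }

    public static void main(String[] args) throws IOException {
        // Same relative path that the job reads from.
        write(Paths.get("input", "UserBehavior.csv"));
    }
}
```

Run it from the project root so that the relative path input/UserBehavior.csv matches the one used by the job.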
2. Create the bean:
package com.mischen.it.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @ClassName UserBehavior
 * @Description One record of the user-behavior data
 * @Author mischen
 * @Date 2021/6/30 0030 7:50
 * @Version 1.0
 **/
@Data
@NoArgsConstructor
@AllArgsConstructor
public class UserBehavior {
    private Long userId;
    private Long itemId;
    private Integer categoryId;
    private String behavior; // behavior type, e.g. "pv"
    private Long timestamp;
}
3. Write the main method. The code is as follows:
package com.mischen.it;

import com.mischen.it.entity.UserBehavior;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.HashSet;

/**
 * @ClassName Flink02_Project_UV
 * @Description Counts unique visitors (UV) from UserBehavior.csv
 * @Author mischen
 * @Date 2021/6/30 0030 8:23
 * @Version 1.0
 **/
public class Flink02_Project_UV {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env
            .readTextFile("input/UserBehavior.csv")
            // Parse each CSV line and emit ("uv", userId) for "pv" records only
            .flatMap((String line, Collector<Tuple2<String, Long>> out) -> {
                String[] split = line.split(",");
                UserBehavior behavior = new UserBehavior(
                    Long.valueOf(split[0]),
                    Long.valueOf(split[1]),
                    Integer.valueOf(split[2]),
                    split[3],
                    Long.valueOf(split[4]));
                if ("pv".equals(behavior.getBehavior())) {
                    out.collect(Tuple2.of("uv", behavior.getUserId()));
                }
            })
            // The lambda erases the generic type, so declare the output type explicitly
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            // Every record carries the same key "uv", so all of them reach one function instance
            .keyBy(t -> t.f0)
            .process(new KeyedProcessFunction<String, Tuple2<String, Long>, Integer>() {
                // Plain instance field, not Flink managed state: fine for this single-key demo,
                // but it is neither scoped per key nor included in checkpoints
                HashSet<Long> userIds = new HashSet<>();

                @Override
                public void processElement(Tuple2<String, Long> value, Context ctx, Collector<Integer> out) throws Exception {
                    userIds.add(value.f1);
                    // Emit the number of distinct users seen so far
                    out.collect(userIds.size());
                }
            })
            .print("uv");
        env.execute();
    }
}
4. The output of the run is as follows:
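Because the job emits the set size after every pv record, the output is a running count and the last value printed is the UV. The plain-Java sketch below mirrors that logic without Flink so the shape of the output is easy to see. The class name `RunningUvSketch` and the input rows are made up for illustration, and the `buy` rows simply stand for any non-pv behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RunningUvSketch {
    // Keep "pv" rows, add each userId to a set, and record the set size after every such row.
    static List<Integer> runningUv(List<String> csvLines) {
        Set<Long> userIds = new HashSet<>();
        List<Integer> emitted = new ArrayList<>();
        for (String line : csvLines) {
            String[] split = line.split(",");
            if ("pv".equals(split[3])) {
                userIds.add(Long.valueOf(split[0]));
                emitted.add(userIds.size());
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical rows: userId,itemId,categoryId,behavior,timestamp
        List<String> lines = Arrays.asList(
                "1001,2001,301,pv,1511658000",
                "1002,2002,302,pv,1511658001",
                "1001,2003,301,buy,1511658002",
                "1001,2002,302,pv,1511658003");
        System.out.println(runningUv(lines)); // [1, 2, 2]
    }
}
```

The count stays at 2 on the last row because user 1001 has already been seen. In the Flink job the order of the emitted values depends on the order in which records arrive, but the final value is the same.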