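Scenario: when a large number of users hit the same entry point within the same time window, deduplicating with an ordinary Set can run out of memory (OOM).
In a Flink project, a Bloom filter can be used to deduplicate UVs instead.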
1. Apply the operator that does the computation: .process(new processFunWithBoolm())
2. Implement a Bloom filter
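class MyBloom(size: Long) extends Serializable {
  // Default capacity: 1 << 27 = 2^27 = 134217728 bits.
  // cap must be a power of two, because hash() masks with (cap - 1).
  private val cap = if (size > 0) size else 1L << 27

  // Polynomial string hash; returns a bit offset in [0, cap).
  def hash(value: String, seed: Int): Long = {
    var result = 0L
    for (i <- 0 until value.length) {
      result = result * seed + value.charAt(i)
    }
    result & (cap - 1)
  }
}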
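To see what hash returns, here is a minimal sketch that repeats the MyBloom class from step 2 so it runs on its own. The seed 61, the sample strings, and the MyBloomDemo object are assumptions made for illustration; they do not appear in the original notes.

```scala
// MyBloom repeated from step 2 so this snippet is self-contained.
class MyBloom(size: Long) extends Serializable {
  private val cap = if (size > 0) size else 1L << 27

  def hash(value: String, seed: Int): Long = {
    var result = 0L
    for (i <- 0 until value.length) {
      result = result * seed + value.charAt(i)
    }
    result & (cap - 1)
  }
}

// Hypothetical demo object, not part of the original notes.
object MyBloomDemo {
  def main(args: Array[String]): Unit = {
    val bloom = new MyBloom(1 << 29)
    // "ab" with seed 61: (0 * 61 + 97) * 61 + 98 = 6015
    println(bloom.hash("ab", 61))
    // The same input always maps to the same bit offset.
    println(bloom.hash("user_1001", 61) == bloom.hash("user_1001", 61))
  }
}
```

Because the offset is produced by masking with cap - 1, it only covers the whole bitmap evenly when the capacity is a power of two, which holds for both 1 << 27 and 1 << 29.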
3. Implement the processFunWithBoolm function that step 1 calls
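import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector
import redis.clients.jedis.Jedis

class processFunWithBoolm() extends ProcessWindowFunction[(String, Long), UVcount, String, TimeWindow] {
  lazy val jedis = new Jedis("hadoop103", 6379)
  // 1 << 29 bits = 64 MB bitmap
  private lazy val bloom = new MyBloom(1 << 29)
  override def process(key: String, context: Context, elements: Iterable[(String, Long)], out: Collector[UVcount])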
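The body of process is not shown in these notes. As a rough sketch of the usual check-and-set idea behind Bloom-filter deduplication, the snippet below replaces the Redis bitmap with an in-memory java.util.BitSet so it can run without Flink or Redis. The object name BloomDedupSketch, the helper countUnique, the seed 61, and the 1 << 20 capacity are all assumptions for illustration. With Jedis, the bit test and bit write would presumably be jedis.getbit(key, offset) and jedis.setbit(key, offset, true).

```scala
import java.util.BitSet

// Hypothetical sketch, not the original process body.
object BloomDedupSketch {
  // Same polynomial hash as MyBloom.hash, inlined to keep the snippet self-contained.
  def hash(value: String, seed: Int, cap: Long): Long = {
    var result = 0L
    for (c <- value) result = result * seed + c
    result & (cap - 1)
  }

  // Counts ids whose bit was not set yet; capBits must be a power of two.
  def countUnique(userIds: Iterable[String], capBits: Int = 1 << 20, seed: Int = 61): Long = {
    val bits = new BitSet(capBits) // in-memory stand-in for the Redis bitmap
    var count = 0L
    for (id <- userIds) {
      val offset = hash(id, seed, capBits).toInt
      if (!bits.get(offset)) { // bit not set: first time this id is seen
        bits.set(offset)
        count += 1
      }
    }
    count
  }

  def main(args: Array[String]): Unit = {
    // u1 and u2 each appear twice, so three distinct users are counted.
    println(countUnique(Seq("u1", "u2", "u1", "u3", "u2")))
  }
}
```

With a single hash function, two different ids can map to the same bit, so the count can come out slightly low; a larger bitmap makes such collisions less likely.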