Flink Taxi Lab ◆ Counting Taxi Rides per Driver

流云细水

Last modified 2022-01-28 11:00:28

Category: Flink. Tags: flink, scala, java

First published 2022-01-27 00:10:43

Article link: https://blog.csdn.net/ZhangQingmu/article/details/122710642

This example counts the number of rides for each driver.

Key classes

TaxiRide: the class that models a taxi ride event
DataGenerator: generator of simulated data

The Tuple data type
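Tuples are composite types that contain a fixed number of fields of various types. The Java API provides classes from Tuple1 up to Tuple25. Every field of a tuple can be an arbitrary Flink type, including further tuples, which results in nested tuples. The fields of a tuple can be accessed directly by field name, as in tuple.f4, or through the generic getter method tuple.getField(int position). Field indices start at 0. Note that this differs from Scala tuples, but it is more consistent with Java's general indexing.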
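
The two access styles described above can be sketched in plain Java. This is only an illustration: Tuple2Sketch is a hypothetical stand-in written so the snippet compiles without Flink on the classpath, and it mirrors just the public fields f0/f1, the of(...) factory, and the 0-based getField(int) of the real org.apache.flink.api.java.tuple.Tuple2.

```java
// Hypothetical stand-in for org.apache.flink.api.java.tuple.Tuple2, so this
// sketch compiles without Flink. It mirrors only the parts discussed above.
class Tuple2Sketch<T0, T1> {
    public T0 f0; // field at position 0
    public T1 f1; // field at position 1

    Tuple2Sketch(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    static <T0, T1> Tuple2Sketch<T0, T1> of(T0 f0, T1 f1) {
        return new Tuple2Sketch<>(f0, f1);
    }

    // Generic getter: positions start at 0, as in Flink's Java tuples.
    @SuppressWarnings("unchecked")
    <T> T getField(int pos) {
        switch (pos) {
            case 0: return (T) f0;
            case 1: return (T) f1;
            default: throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }
}

class TupleAccessDemo {
    public static void main(String[] args) {
        // (driverId, count) pair, shaped like the tuples in the ride count job
        Tuple2Sketch<Long, Long> t = Tuple2Sketch.of(2013000001L, 1L);
        Long driverId = t.f0;        // direct access by field name
        Long count = t.getField(1);  // generic getter; position 1 is f1
        System.out.println(driverId + " -> " + count);
    }
}
```

With the real Flink class, the same two lines (t.f0 and t.getField(1)) apply unchanged; only the stand-in class would be replaced by the import.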

RideCountExample code


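import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
// TaxiRide and TaxiRideGenerator come from the flink-training "common" module;
// the package paths below are assumed from that repository and may differ by version.
import org.apache.flink.training.exercises.common.datatypes.TaxiRide;
import org.apache.flink.training.exercises.common.sources.TaxiRideGenerator;

/**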
 * Example that counts the rides for each driver.
 *
 * <p>Note that this is implicitly keeping state for each driver.
 * This sort of simple, non-windowed aggregation on an unbounded set of keys will use an unbounded amount of state.
 * When this is an issue, look at the SQL/Table API, or ProcessFunction, or state TTL, all of which provide
 * mechanisms for expiring state for stale keys.
 */
public class RideCountExample {

	/**
	 * Main method.
	 *
	 * @throws Exception which occurs during job execution.
	 */
	public static void main(String[] args) throws Exception {

		// set up streaming execution environment
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// add the source: a stream of generated TaxiRide events
		DataStream<TaxiRide> rides = env.addSource(new TaxiRideGenerator());

		// map each ride to a (driverId, 1) tuple
		DataStream<Tuple2<Long, Long>> tuples = rides.map(new MapFunction<TaxiRide, Tuple2<Long, Long>>() {
					@Override
					public Tuple2<Long, Long> map(TaxiRide ride) {
						return Tuple2.of(ride.driverId, 1L);
					}
		});

		// partition the stream by driverId
		KeyedStream<Tuple2<Long, Long>, Long> keyedByDriverId = tuples.keyBy(t -> t.f0);

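		// keep a running count of rides per driver; the argument to sum(1) is the
		// tuple field index (f1, the count), not a partition number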
		DataStream<Tuple2<Long, Long>> rideCounts = keyedByDriverId.sum(1);

		// we could, in fact, print out any or all of these streams
		rideCounts.print();

		// run the ride count pipeline
		env.execute("Ride Count");
	}
}
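
To make the behaviour of the keyBy plus sum(1) step more concrete, here is a minimal plain-Java simulation. It is not Flink code and not the author's implementation: RunningCountSketch and runningCounts are hypothetical names, and a HashMap stands in for Flink's keyed state. It shows two things the Javadoc note hints at: a rolling aggregation emits an updated total for every incoming element rather than one final count, and the state holds one entry per key that is never removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a keyed rolling sum; hypothetical names, no Flink involved.
class RunningCountSketch {

    // Emits one "driverId=count" record per input element, mimicking how a keyed
    // rolling sum produces an updated total for every incoming event.
    static List<String> runningCounts(long[] driverIds) {
        Map<Long, Long> state = new HashMap<>(); // one entry per key, never removed
        List<String> out = new ArrayList<>();
        for (long driverId : driverIds) {
            // add 1 to this driver's count, starting from 1 for a new key
            long updated = state.merge(driverId, 1L, Long::sum);
            out.add(driverId + "=" + updated);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runningCounts(new long[] {1L, 2L, 1L, 1L, 2L}));
        // prints [1=1, 2=1, 1=2, 1=3, 2=2]
    }
}
```

In the real job, rideCounts.print() produces the same kind of stream of growing per-driver totals, although with parallelism greater than 1 the interleaving of records from different drivers is not deterministic.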