Brief comparison
Type | Connector | Spark Structured Streaming (Structured Streaming Programming Guide - Spark 3.5.1 Documentation) | Flink (Apache Flink Documentation) |
---|---|---|---|
source | file source | API: readStream.format("csv")... | Flink SQL |
source | kafka source | API: readStream.format("kafka")... | Flink SQL |
source | redis source | API: readStream.format("redis")... | No stream-based source; can be used as a batch/dimension table: https://github.com/jeff-zou/flink-connector-redis |
source | jdbc source | N/A | Flink SQL CDC |
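The Spark-side `readStream.format(...)` calls in the source rows can be sketched in PySpark as below. This is a minimal sketch, not a complete job: `spark` is assumed to be an existing SparkSession, the function names are made up for illustration, and the path, schema string, broker address, and topic are hypothetical placeholders. The Kafka source also assumes the `spark-sql-kafka-0-10` package is on the classpath.

```python
# Sketch only: `spark` is an existing SparkSession passed in by the caller.
# Function names, paths, topics and broker addresses are hypothetical.


def read_csv_stream(spark, path, schema_ddl):
    """File source: by default a streaming file source needs an explicit schema."""
    return (
        spark.readStream.format("csv")
        .schema(schema_ddl)            # DDL string, e.g. "id INT, name STRING"
        .option("header", "true")      # first line of each file is a header
        .load(path)                    # directory that new files are dropped into
    )


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Kafka source: rows arrive with binary `key` and `value` columns."""
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
```

A call such as `read_kafka_stream(spark, "localhost:9092", "events")` returns a streaming DataFrame; nothing is read until a `writeStream` query is started on it.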
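Type | Connector | Spark Structured Streaming | Flink |
---|---|---|---|
sink | file | sdf.writeStream.format(...); append mode | append |
sink | kafka / upsert-kafka | sdf.writeStream.format(...); Append, Update, and Complete modes (at-least-once). The topic behaves like a key-value table without a primary key: in every mode, each write is an insert into. | Chosen automatically from the SQL semantics: (1) a simple source-to-sink ETL is append-only and can write to kafka; (2) an aggregation has update semantics and can write to upsert-kafka. |
sink | redis sink | Supports Append, Update, and Complete, but requires a custom implementation via foreach (which indirectly calls the ordinary df.write). Source: based on the Redis v5+ stream API (xadd, xread). Sink: foreachBatch, based on hset/hget. | No stream-based connector; can be used as a batch/dimension table |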
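The difference between appending every change record to a topic and reading the same records with upsert semantics can be illustrated without a cluster. The following is a pure-Python simulation, not Spark or Flink code: the function names are made up, an in-memory list stands in for a Kafka topic, and a dict of dicts stands in for Redis hashes.

```python
from collections import defaultdict


def running_counts(events):
    """Emit one (key, count) change record per event, like an aggregation in update mode."""
    counts = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]


def append_to_topic(topic, changes):
    """Plain Kafka-style sink: every change record becomes a new message (insert into)."""
    topic.extend(changes)


def latest_per_key(topic):
    """Upsert-style reading of the same messages: the last value per key wins."""
    table = {}
    for key, value in topic:
        table[key] = value
    return table


def hset_batch(store, hash_name, batch):
    """foreachBatch-style sink: like HSET, writing a field overwrites its old value."""
    fields = store.setdefault(hash_name, {})
    for key, value in batch:
        fields[key] = value


events = ["a", "b", "a", "a", "b"]
log = []
append_to_topic(log, running_counts(events))
print(log)                  # [('a', 1), ('b', 1), ('a', 2), ('a', 3), ('b', 2)]
print(latest_per_key(log))  # {'a': 3, 'b': 2}
```

Because the append-only log keeps every intermediate count, a consumer has to collapse it by key itself, which is the work an upsert-style sink or an `hset`-based write does implicitly. Writing the same batch twice with `hset_batch` leaves the hash unchanged, which is one reason a keyed overwrite fits at-least-once delivery.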