Table of Contents
3. Operations: Selection, Projection, and Aggregation
1. Creating a SparkSession
```scala
import org.apache.spark.sql.SparkSession

// SparkSession.builder() returns a SparkSession, the single entry point for Structured Streaming.
val spark = SparkSession.builder().appName("event-time-window_App").getOrCreate()
```
2. Input Sources
| Source | Options | Fault-tolerant | Notes |
| --- | --- | --- | --- |
| File source | `path`: path to the input directory, common to all file formats.<br/>`maxFilesPerTrigger`: maximum number of new files to be considered in every trigger (default: no max).<br/>`latestFirst`: whether to process the latest new files first, useful when there is a large backlog of files (default: false).<br/>`fileNameOnly`: whether to check new files based only on the filename instead of the full path (default: false). With this set to `true`, the following would be considered the same file, because their filenames, "dataset.txt", are identical: "file:///dataset.txt", "s3://a/dataset.txt", "s3n://a/b/dataset.txt", "s3a://a/b/c/dataset.txt".<br/>For file-format-specific options, see the related methods in DataStreamReader (Scala/Java/Python/R), e.g. for the "parquet" format see `DataStreamReader.parquet()`. In addition, some session configurations affect certain file formats; see the SQL Programming Guide for details, e.g. the Parquet configuration section for "parquet". | Yes | Supports glob paths, but does not support multiple comma-separated paths/globs. |
| Socket source | `host`: host to connect to, must be specified.<br/>`port`: port to connect to, must be specified. | No | |
| Rate source | `rowsPerSecond` (e.g. 100, default: 1): how many rows should be generated per second. | | |
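The sources in the table above are all created through `spark.readStream`. Below is a minimal sketch of wiring each one up; the session name `spark`, the port `9999`, the schema, and the input path `/tmp/input` are illustrative assumptions, not values from the original text.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("event-time-window_App").getOrCreate()

// Socket source: lines of UTF-8 text from a TCP server (not fault-tolerant).
val socketDF = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Rate source: generates rows with `timestamp` and `value` columns at a fixed rate.
val rateDF = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 100)
  .load()

// File source: watches a directory for new CSV files; file sources require an
// explicit schema, and `/tmp/input` is a placeholder path.
val fileDF = spark.readStream
  .format("csv")
  .option("maxFilesPerTrigger", 10)
  .schema("name STRING, value INT")
  .load("/tmp/input")
```

Each call returns an untyped streaming DataFrame; a query only starts running once a sink is attached via `writeStream` and `start()`.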