Table of Contents
3. Operations: Selection, Projection, and Aggregation
1. Creating a SparkSession
```scala
import org.apache.spark.sql.SparkSession

// SparkSession.builder() returns a SparkSession, the single entry point for Structured Streaming.
val spark = SparkSession.builder().appName("event-time-window_App").getOrCreate()
```
2. Input Sources
| Source | Options | Fault-tolerant | Notes |
| --- | --- | --- | --- |
| File source | `path`: path to the input directory, common to all file formats.<br/>`maxFilesPerTrigger`: maximum number of new files to be considered in every trigger (default: no max).<br/>`latestFirst`: whether to process the latest new files first, useful when there is a large backlog of files (default: false).<br/>`fileNameOnly`: whether to check new files based only on the filename instead of the full path (default: false). With this set to `true`, the following would be considered the same file, because their filenames, "dataset.txt", are identical: "file:///dataset.txt", "s3://a/dataset.txt", "s3n://a/b/dataset.txt", "s3a://a/b/c/dataset.txt".<br/>For file-format-specific options, see the related methods in DataStreamReader (Scala/Java/Python/R), e.g. for the "parquet" format see `DataStreamReader.parquet()`. In addition, some session configurations affect certain file formats; see the SQL Programming Guide for details, e.g. the Parquet configuration section for "parquet". | Yes | Supports glob paths, but does not support multiple comma-separated paths/globs. |
| Socket source | `host`: host to connect to, must be specified.<br/>`port`: port to connect to, must be specified. | No | |
| Rate source | `rowsPerSecond` (e.g. 100, default: 1): how many rows should be generated per second. | | |
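The sources in the table above are all created through `spark.readStream`. Below is a minimal sketch of wiring each one up; the session name `spark`, the port `9999`, the schema, and the input path `/tmp/input` are illustrative assumptions, not values from the original text.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("event-time-window_App").getOrCreate()

// Socket source: lines of UTF-8 text from a TCP server (not fault-tolerant).
val socketDF = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Rate source: generates rows with `timestamp` and `value` columns at a fixed rate.
val rateDF = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 100)
  .load()

// File source: watches a directory for new CSV files; file sources require an
// explicit schema, and `/tmp/input` is a placeholder path.
val fileDF = spark.readStream
  .format("csv")
  .option("maxFilesPerTrigger", 10)
  .schema("name STRING, value INT")
  .load("/tmp/input")
```

Each call returns an untyped streaming DataFrame; a query only starts running once a sink is attached via `writeStream` and `start()`.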