DataSource
- Collection-based
  - fromCollection(Collection)
- File-based
  - readTextFile(path)
Transformation
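- Map
- FlatMap
- MapPartition: processes a whole partition of data per function call
- Filter
- Reduce
- Aggregations
- Distinct: returns the distinct elements of the data set
- Join
- OuterJoin
- Cross
- Union
- First-n: returns the first n elements of the data set
- Sort Partition: locally sorts every partition of the data set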
- Rebalance: evenly redistributes the data across partitions to remove skew
- Hash-Partition: partitions the data set by the hash of the specified key
  - partitionByHash()
- Range-Partition: partitions the data set into ranges of the specified key
  - partitionByRange()
- Custom Partition
  - partitionCustom(partitioner, "someKey")
  - partitionCustom(partitioner, 0)
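
The three partitioning strategies listed above differ only in how a key is mapped to a partition index. The following is a plain-Java sketch of that idea, not Flink code and not Flink's actual implementation. The class `PartitionSketch`, the interface `KeyPartitioner`, and the sample bounds are hypothetical names introduced for illustration; the interface is shaped after what a partitioner passed to partitionCustom() is assumed to look like (a key and a partition count in, a partition index out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the partitioning ideas; this is not Flink code.
class PartitionSketch {

    // Hypothetical interface for a user-supplied partitioning rule.
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Hash partitioning: the key's hash code selects the target partition.
    static int hashPartition(Object key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Range partitioning: ascending upper bounds split the key space into contiguous ranges.
    static int rangePartition(int key, int[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length; // keys above every bound go to the last partition
    }

    // Custom partitioning: route each key with whatever rule the caller supplies.
    static <K> List<List<K>> distribute(List<K> keys, int numPartitions,
                                        KeyPartitioner<K> partitioner) {
        List<List<K>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (K key : keys) {
            partitions.get(partitioner.partition(key, numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(1, 2, 3, 4, 15, 25);
        int[] bounds = {10, 20}; // assumed range boundaries, giving 3 partitions

        System.out.println("hash:   " + distribute(keys, 4, PartitionSketch::hashPartition));
        System.out.println("range:  " + distribute(keys, 3, (k, n) -> rangePartition(k, bounds)));
        System.out.println("custom: " + distribute(keys, 2, (k, n) -> k % 2));
    }
}
```

Hash partitioning spreads keys without regard to order, range partitioning keeps neighbouring keys together, and a custom rule is useful when neither fits the data.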
Sink
- writeAsText()
- writeAsCsv()
- print()
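
To show how a source, a few transformations, and a sink fit together, here is a minimal sketch of a DataSet job. It assumes a Flink 1.x dependency that still ships the DataSet API is on the classpath; the class name `DataSetPipelineSketch` and the sample data are illustrative only.

```java
import java.util.Arrays;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DataSetPipelineSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // DataSource: build a DataSet from an in-memory collection.
        DataSet<String> words =
                env.fromCollection(Arrays.asList("flink", "", "batch", "dataset"));

        // Transformation: drop empty strings, then upper-case the rest.
        DataSet<String> result = words
                .filter(w -> !w.isEmpty())
                .map(w -> w.toUpperCase());

        // Sink: print the elements to standard out.
        result.print();
    }
}
```

In the DataSet API, print() triggers execution on its own, whereas writeAsText() and writeAsCsv() only declare a sink and the job runs once env.execute() is called.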