一、Flink DataSetUtils常用API
self
val self: DataSet[T]
Data Set
获取DataSet本身。
执行程序:
//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")
//2.获取input本身
val s=input.self
//3.比较对象引用
s==input
执行结果:
res133: Boolean = true
countElementsPerPartition
def countElementsPerPartition: DataSet[(Int, Long)]
Method that goes over all the elements in each partition in order to
retrieve the total number of elements.
获取DataSet的每个分片中元素的个数。
执行程序:
//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")
//2.设置分片前
val p0=input.getParallelism
val c0=input.countElementsPerPartition
c0.collect
//2.设置分片后
//设置并行度为3,实际上是将数据分片为3
input.setParallelism(3)
val p1=input.getParallelism
val c1=input.countElementsPerPartition
c1.collect
执行结果:
//设置分片前
p0: Int = 1
c0: Seq[(Int, Long)] = Buffer((0,6))
//设置分片后
p1: Int = 3
c1: Seq[(Int, Long)] = Buffer((0,2), (1,2), (2,2))
checksumHashCode
def checksumHashCode(): ChecksumHashCode
Convenience method to get the count (number of elements) of a DataSet
as well as the checksum (sum over element hashes).
获取DataSet的hashcode和元素的个数
执行程序:
//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")
//2.获取DataSet的hashcode和元素的个数
input.checksumHashCode
执行结果:
res140: org.apache.flink.api.java.Utils.ChecksumHashCode =
ChecksumHashCode 0x0000000000000195, count 6
zipWithIndex
defzipWithIndex: DataSet[(Long, T)]
Method that takes a set of subtask index, total number of elements mappings
and assigns ids to all the elements from the input data set.
元素和元素的下标进行zip操作。
执行程序:
//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")
//2.元素和元素的下标进行zip操作。
val result: DataSet[(Long, String)] = input.zipWithIndex
//3.显示结果
result.collect
执行结果:
res134: Seq[(Long, String)] = Buffer((0,A), (1,B), (2,C), (3,D), (4,E), (5,F))
flink web ui中的执行效果: