【云星数据---Apache Flink实战系列(精品版)】：Apache Flink批处理API详解与编程实战017--DateSet实用API详解017

最新推荐文章于 2019-08-04 22:52:45 发布

李国华技术博客

最新推荐文章于 2019-08-04 22:52:45 发布

阅读量1.3w

点赞数

分类专栏： cloudcomputing bigdata flink 文章标签： apache api string c语言批处理

本文链接：https://blog.csdn.net/liguohuaBigdata/article/details/78557963

版权

bigdata 同时被 3 个专栏收录

187 篇文章 2 订阅

订阅专栏

cloudcomputing

183 篇文章 0 订阅

订阅专栏

flink

86 篇文章 57 订阅

订阅专栏

一、Flink DataSetUtils常用API

self

val self: DataSet[T]

Data Set

获取DataSet本身。

执行程序：

//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")

//2.获取input本身
val s=input.self

//3.比较对象引用
s==input

执行结果：

res133: Boolean = true

countElementsPerPartition

def countElementsPerPartition: DataSet[(Int, Long)]

Method that goes over all the elements in each partition in order to 
retrieve the total number of elements.

获取DataSet的每个分片中元素的个数。

执行程序：

//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")

//2.设置分片前
val p0=input.getParallelism
val c0=input.countElementsPerPartition
c0.collect

//2.设置分片后
//设置并行度为3，实际上是将数据分片为3
input.setParallelism(3)
val p1=input.getParallelism
val c1=input.countElementsPerPartition
c1.collect

执行结果：

//设置分片前
p0: Int = 1
c0: Seq[(Int, Long)] = Buffer((0,6))

//设置分片后
p1: Int = 3
c1: Seq[(Int, Long)] = Buffer((0,2), (1,2), (2,2))

checksumHashCode

def checksumHashCode(): ChecksumHashCode

Convenience method to get the count (number of elements) of a DataSet
as well as the checksum (sum over element hashes).

获取DataSet的hashcode和元素的个数

执行程序：

//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")

//2.获取DataSet的hashcode和元素的个数
input.checksumHashCode

执行结果：

res140: org.apache.flink.api.java.Utils.ChecksumHashCode = 
ChecksumHashCode 0x0000000000000195, count 6

zipWithIndex

defzipWithIndex: DataSet[(Long, T)]

Method that takes a set of subtask index, total number of elements mappings
and assigns ids to all the elements from the input data set.

元素和元素的下标进行zip操作。

执行程序：

//1.创建一个 DataSet其元素为String类型
val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")

//2.元素和元素的下标进行zip操作。
val result: DataSet[(Long, String)] = input.zipWithIndex

//3.显示结果
result.collect

执行结果：

res134: Seq[(Long, String)] = Buffer((0,A), (1,B), (2,C), (3,D), (4,E), (5,F))

flink web ui中的执行效果：

这里写图片描述

李国华技术博客

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
【云星数据---Apache Flink实战系列(精品版)】：Apache Flink批处理API详解与编程实战017--DateSet实用API详解017

一、Flink DataSetUtils常用APIselfval self: DataSet[T]Data Set获取DataSet本身。执行程序：//1.创建一个 DataSet其元素为String类型val input: DataSet[String] = benv.fromElements("A", "B", "C", "D", "E", "F")//2.获取input本身val s=in
复制链接

扫一扫

专栏目录