Passing Functions to Spark
Spark’s API relies heavily on passing functions in the driver program to run on the cluster. In Java, functions are represented by classes implementing the interfaces in the org.apache.spark.api.java.function package. There are two ways to create such functions:
- Implement the Function interfaces in your own class, either as an anonymous inner class or a named one, and pass an instance of it to Spark.
- In Java 8, use lambda expressions to concisely define an implementation, as in the sketch following this list.
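For example, a lambda-based version of the line-length computation used below (a minimal sketch, assuming an existing `JavaSparkContext` named `sc` and an input file `data.txt`) looks like this:

```java
JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(s -> s.length());  // map each line to its length
int totalLength = lineLengths.reduce((a, b) -> a + b);      // sum the lengths
```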
While much of this guide uses lambda syntax for conciseness, it is easy to use all the same APIs in long form. For example, we could have written our code above as follows:
```java
// Imports for the Java function interfaces used below
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(new Function<String, Integer>() {
  public Integer call(String s) { return s.length(); }
});
int totalLength = lineLengths.reduce(new Function2<Integer, Integer, Integer>() {
  public Integer call(Integer a, Integer b) { return a + b; }
});
```
Or, if writing the functions inline is unwieldy:
```java
class GetLength implements Function<String, Integer> {
  public Integer call(String s) { return s.length(); }
}

class Sum implements Function2<Integer, Integer, Integer> {
  public Integer call(Integer a, Integer b) { return a + b; }
}

JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(new GetLength());
int totalLength = lineLengths.reduce(new Sum());
```
Note that anonymous inner classes in Java can also access variables in the enclosing scope as long as they are marked `final`. Spark will ship copies of these variables to each worker node as it does for other languages.
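For illustration, here is a minimal sketch of that behavior: a hypothetical final local variable `minLength` is captured by an anonymous inner class and shipped to the workers along with the function (assuming the same `sc` and `data.txt` as above):

```java
final int minLength = 10;  // must be final to be captured by the anonymous class

JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<String> longLines = lines.filter(new Function<String, Boolean>() {
  // Spark serializes this function, including a copy of minLength, to each worker
  public Boolean call(String s) { return s.length() > minLength; }
});
```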