Accumulators in Spark

weixin_34148456

Tags: big data, python

Original link: https://my.oschina.net/forrest420/blog/398412

An Accumulator is a simpler form of Accumulable in which the result type being accumulated is the same as the type of the elements being merged. Accumulators are variables that are only "added" to through an associative operation, and can therefore be supported efficiently in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric value types, and programmers can add support for new types.
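As a rough illustration of adding support for a new type, the sketch below defines an accumulator that collects distinct strings. It assumes Spark 2.0 or later, where custom accumulators extend AccumulatorV2 (the older Accumulable and AccumulatorParam types described here were deprecated in that release). The names StringSetAccumulator, StringSetAccumulatorExample, and "seenWords" are hypothetical and chosen only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// Hypothetical accumulator that collects distinct strings. Set union is
// associative and commutative, so per-task results can be merged in any order.
class StringSetAccumulator extends AccumulatorV2[String, Set[String]] {
  private val items = mutable.Set.empty[String]

  // True when nothing has been added yet.
  override def isZero: Boolean = items.isEmpty

  // Independent copy carrying the same contents.
  override def copy(): StringSetAccumulator = {
    val acc = new StringSetAccumulator
    acc.items ++= items
    acc
  }

  override def reset(): Unit = items.clear()

  // Called by tasks to add one element.
  override def add(v: String): Unit = items += v

  // Called to fold another partial result into this one.
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit =
    items ++= other.value

  // Immutable snapshot, intended to be read on the driver.
  override def value: Set[String] = items.toSet
}

object StringSetAccumulatorExample {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("accumulator-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val seen = new StringSetAccumulator
      sc.register(seen, "seenWords")
      sc.parallelize(Seq("a", "b", "a", "c")).foreach(w => seen.add(w))
      // The driver reads the merged result: a set containing a, b and c.
      println(seen.value)
    } finally {
      sc.stop()
    }
  }
}
```

The merge step is where associativity matters: Spark combines the partial sets produced by each task, and the result should not depend on the order in which they arrive.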

An accumulator is created from an initial value v by calling SparkContext#accumulator. Tasks running on the cluster can then add to it using the Accumulable#+= operator. However, they cannot read its value. Only the driver program can read the accumulator's value, using its value method.

The interpreter session below shows an accumulator being used to add up the elements of an array:

scala> val accum = sc.accumulator(0)
accum: spark.Accumulator[Int] = 0

scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
...
10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s

scala> accum.value
res2: Int = 10
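
The session above uses the original accumulator API. For readers on Spark 2.0 or later, where SparkContext#accumulator is deprecated, a roughly equivalent standalone sketch might look like the following. It assumes the built-in longAccumulator factory and a local master; the object name AccumulatorSum, the helper sumWithAccumulator, and the accumulator name "sum" are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorSum {
  // Hypothetical helper: sums the elements by adding to an accumulator
  // inside tasks, then reads the total back on the driver.
  def sumWithAccumulator(sc: SparkContext, xs: Seq[Int]): Long = {
    val accum = sc.longAccumulator("sum")
    // foreach is an action, so the tasks run here and add to the accumulator.
    sc.parallelize(xs).foreach(x => accum.add(x))
    // Only the driver reads the value.
    accum.value
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("accumulator-sum").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(sumWithAccumulator(sc, Seq(1, 2, 3, 4))) // prints 10
    } finally {
      sc.stop()
    }
  }
}
```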