[Spark Java API] Action (3): foreach, foreachPartition, lookup

foreach

Official documentation:

Applies a function f to all elements of this RDD.

Function prototype:

def foreach(f: VoidFunction[T])

**foreach iterates over the RDD and applies the function f to every element.**

Source code analysis:

    def foreach(f: T => Unit): Unit = withScope {
      val cleanF = sc.clean(f)
      sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
    }

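Example:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.VoidFunction;

    List<Integer> data = Arrays.asList(5, 1, 1, 4, 4, 2, 2);
    JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data, 3);
    // Print every element; the output is written by the task that processes it.
    javaRDD.foreach(new VoidFunction<Integer>() {
        @Override
        public void call(Integer integer) throws Exception {
            System.out.println(integer);
        }
    });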

foreachPartition

Official documentation:

Applies a function f to each partition of this RDD.

Function prototype:

def foreachPartition(f: VoidFunction[java.util.Iterator[T]])

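**foreachPartition is similar to foreach, except that f is applied once to each partition (receiving an iterator over that partition's elements) rather than once to each element.**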
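The difference in call granularity can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark's implementation: the class `PartitionCallSketch` and its helpers `slice`, `foreach`, `foreachPartition`, and `countCalls` are hypothetical names introduced here, and an "RDD" is modeled simply as a list of in-memory partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model (no Spark): an "RDD" is just a list of partitions.
class PartitionCallSketch {

    // Hypothetical helper: split data into numSlices contiguous partitions.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        List<List<T>> partitions = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            partitions.add(new ArrayList<>(data.subList(start, end)));
        }
        return partitions;
    }

    // Models foreach: f is invoked once per element.
    static <T> void foreach(List<List<T>> partitions, Consumer<T> f) {
        for (List<T> partition : partitions) {
            partition.iterator().forEachRemaining(f);
        }
    }

    // Models foreachPartition: f is invoked once per partition with its iterator.
    static <T> void foreachPartition(List<List<T>> partitions, Consumer<Iterator<T>> f) {
        for (List<T> partition : partitions) {
            f.accept(partition.iterator());
        }
    }

    // Returns {number of foreach calls, number of foreachPartition calls}.
    static int[] countCalls(List<Integer> data, int numSlices) {
        List<List<Integer>> partitions = slice(data, numSlices);
        int[] calls = new int[2];
        foreach(partitions, x -> calls[0]++);
        foreachPartition(partitions, it -> calls[1]++);
        return calls;
    }

    public static void main(String[] args) {
        int[] calls = countCalls(Arrays.asList(5, 1, 1, 4, 4, 2, 2), 3);
        System.out.println("foreach calls: " + calls[0]);          // 7, one per element
        System.out.println("foreachPartition calls: " + calls[1]); // 3, one per partition
    }
}
```

Because the function runs only once per partition, foreachPartition is the usual choice when each call needs costly setup, such as opening a connection, that should be shared by all elements of a partition.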

Source code analysis:

    def foreachPartition(f: Iterator[T] => Unit): Unit = withScope {
      val cleanF = sc.clean(f)
      sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
    }

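Example:

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.VoidFunction;

    List<Integer> data = Arrays.asList(5, 1, 1, 4, 4, 2, 2);
    JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data, 3);
    // Tag each element with its partition ID, formatted as "partitionId=value"
    JavaRDD<String> partitionRDD = javaRDD.mapPartitionsWithIndex(
            new Function2<Integer, Iterator<Integer>, Iterator<String>>() {
        @Override
        public Iterator<String> call(Integer v1, Iterator<Integer> v2) throws Exception {
            LinkedList<String> linkedList = new LinkedList<String>();
            while (v2.hasNext()) {
                linkedList.add(v1 + "=" + v2.next());
            }
            return linkedList.iterator();
        }
    }, false);
    System.out.println(partitionRDD.collect());
    // Print the elements of each partition between begin/end markers
    javaRDD.foreachPartition(new VoidFunction<Iterator<Integer>>() {
        @Override
        public void call(Iterator<Integer> integerIterator) throws Exception {
            System.out.println("___________begin_______________");
            while (integerIterator.hasNext()) {
                System.out.print(integerIterator.next() + " ");
            }
            System.out.println("\n___________end_________________");
        }
    });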

lookup

Official documentation:

Return the list of values in the RDD for key `key`. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.

Function prototype:

def lookup(key: K): JList[V]

**lookup applies to RDDs of (K, V) pairs: given a key K, it returns all the values V in the RDD associated with that key.**

Source code analysis:

    def lookup(key: K): Seq[V] = self.withScope {
      self.partitioner match {
        case Some(p) =>
          val index = p.getPartition(key)
          val process = (it: Iterator[(K, V)]) => {
            val buf = new ArrayBuffer[V]
            for (pair <- it if pair._1 == key) {
              buf += pair._2
            }
            buf
          } : Seq[V]
          val res = self.context.runJob(self, process, Array(index), false)
          res(0)
        case None =>
          self.filter(_._1 == key).map(_._2).collect()
      }
    }

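**As the source shows, if the RDD has a partitioner, lookup computes the index of the partition that key maps to and collects the values for key from that partition only. If there is no partitioner, it uses filter to keep only the pairs whose key equals key, then collects their values.**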
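The two branches can be modeled in plain Java to show why a known partitioner helps. This is only a sketch under stated assumptions, not Spark code: the class `LookupSketch`, its `samplePairs` data, and the `partitionsScanned` counter are hypothetical, and `getPartition` assumes a hash partitioner that takes the non-negative remainder of the key's hash code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Spark) of the two branches of lookup.
class LookupSketch {
    private final List<List<Map.Entry<Integer, Integer>>> partitions = new ArrayList<>();
    private final boolean hasPartitioner;
    int partitionsScanned = 0; // how many partitions the last lookup touched

    LookupSketch(List<Map.Entry<Integer, Integer>> pairs, int numPartitions,
                 boolean hasPartitioner) {
        this.hasPartitioner = hasPartitioner;
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<Integer, Integer> pair : pairs) {
            // With a partitioner the key decides the partition;
            // without one, pairs are spread round-robin (arbitrary placement).
            int index = hasPartitioner
                    ? getPartition(pair.getKey(), numPartitions)
                    : next++ % numPartitions;
            partitions.get(index).add(pair);
        }
    }

    // Assumed hash partitioning: non-negative hashCode modulo partition count.
    static int getPartition(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    List<Integer> lookup(Integer key) {
        partitionsScanned = 0;
        // Known partitioner: scan one partition. Otherwise: filter all of them.
        List<List<Map.Entry<Integer, Integer>>> toScan = hasPartitioner
                ? Collections.singletonList(partitions.get(getPartition(key, partitions.size())))
                : partitions;
        List<Integer> buf = new ArrayList<>();
        for (List<Map.Entry<Integer, Integer>> partition : toScan) {
            partitionsScanned++;
            for (Map.Entry<Integer, Integer> pair : partition) {
                if (pair.getKey().equals(key)) {
                    buf.add(pair.getValue());
                }
            }
        }
        return buf;
    }

    // Hypothetical sample data for illustration only.
    static List<Map.Entry<Integer, Integer>> samplePairs() {
        int[][] raw = {{5, 6}, {1, 3}, {1, 2}, {4, 6}, {4, 5}, {2, 4}, {2, 5}};
        List<Map.Entry<Integer, Integer>> pairs = new ArrayList<>();
        for (int[] kv : raw) {
            pairs.add(new SimpleEntry<>(kv[0], kv[1]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        LookupSketch partitioned = new LookupSketch(samplePairs(), 3, true);
        System.out.println(partitioned.lookup(4) + " scanned " + partitioned.partitionsScanned);
        LookupSketch unpartitioned = new LookupSketch(samplePairs(), 3, false);
        System.out.println(unpartitioned.lookup(4) + " scanned " + unpartitioned.partitionsScanned);
    }
}
```

In this model both variants return the same values for key 4, but the partitioned one touches a single partition while the unpartitioned one has to scan all three.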

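Example:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    List<Integer> data = Arrays.asList(5, 1, 1, 4, 4, 2, 2);
    JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data, 3);
    JavaPairRDD<Integer, Integer> javaPairRDD = javaRDD.mapToPair(
            new PairFunction<Integer, Integer, Integer>() {
        // Note: each task works on its own copy of this function,
        // so the counter restarts in every partition.
        int i = 0;
        @Override
        public Tuple2<Integer, Integer> call(Integer integer) throws Exception {
            i++;
            return new Tuple2<Integer, Integer>(integer, i + integer);
        }
    });
    System.out.println(javaPairRDD.collect());
    // Print all values stored under key 4
    System.out.println("lookup------------" + javaPairRDD.lookup(4));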
