Learning Spark: converting between JavaRDD and JavaPairRDD in Java

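1. Motivation: I was writing a Java program that reads from HBase and registers the result as a table. The read returns a JavaPairRDD, but the examples I found online all convert a JavaRDD to a DataFrame, so I had to work out for myself how to turn the JavaPairRDD into a JavaRDD.
2. Method
  JavaRDD => JavaPairRDD: use the mapToPair function
  JavaPairRDD => JavaRDD: use the map function
3. Before anything else, run it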

package com.lcc.spark.rdd.test;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

/**
 * Converts a JavaRDD to a JavaPairRDD with mapToPair, then back to a JavaRDD with map.
 */
public class App {  
    public static void main(String[] args) {  
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Simple Application");  
        JavaSparkContext sc = new JavaSparkContext(conf);  

        // build a JavaRDD from a local list
        JavaRDD<String> line1 = sc.parallelize(Arrays.asList("1 aa", "2 bb", "4 cc", "3 dd")); 

        line1.foreach(new VoidFunction<String>(){

            @Override
            public void call(String num) throws Exception {
                // TODO Auto-generated method stub
                System.out.println("numbers;"+num);
            }
        });

        JavaPairRDD<String, String> prdd = line1.mapToPair(new PairFunction<String, String, String>() {
            @Override
            public Tuple2<String, String> call(String x) throws Exception {
                // split once: "1 aa" becomes key "1" and value "aa"
                String[] parts = x.split(" ");
                return new Tuple2<>(parts[0], parts[1]);
            }
        });
        System.out.println("111111111111mapToPair:");  
        prdd.foreach(new VoidFunction<Tuple2<String, String>>() {  
            public void call(Tuple2<String, String> x) throws Exception {  
                System.out.println(x);  
            }  
        });  

        /* JavaRDD => JavaPairRDD: use mapToPair
           JavaPairRDD => JavaRDD: use map */

        System.out.println("===============1=========");  
        JavaRDD<String> javaprdd = prdd.map(new Function<Tuple2<String, String>, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public String call(Tuple2<String, String> arg0) {
                System.out.println("arg0======================" + arg0);
                System.out.println("arg0======================" + arg0._1);
                return arg0._1 + " " + arg0._2;
            }
        });

        System.out.println("===============2=========");  

        javaprdd.foreach(new VoidFunction<String>(){

            @Override
            public void call(String num) throws Exception {
                // TODO Auto-generated method stub
                System.out.println("numbers;"+num);
            }
        });

    }  
}  

The output is as follows:

numbers;1 aa
numbers;2 bb
numbers;4 cc
numbers;3 dd

111111111111mapToPair:
(1,aa)
(2,bb)
(4,cc)
(3,dd)

===============1=========
===============2=========
arg0======================(1,aa)
arg0======================1
numbers;1 aa
arg0======================(2,bb)
arg0======================2
numbers;2 bb
arg0======================(4,cc)
arg0======================4
numbers;4 cc
arg0======================(3,dd)
arg0======================3
numbers;3 dd

4. Analysis

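Template for mapToPair. Note that PairFunction takes the input type first, then the key type and the value type:

JavaPairRDD<String, String> prdd = line1.mapToPair(new PairFunction<element type of the input JavaRDD, key type of the resulting JavaPairRDD, value type of the resulting JavaPairRDD>() {
            public Tuple2<key type of the resulting JavaPairRDD, value type of the resulting JavaPairRDD> call(element type of the input JavaRDD x) throws Exception {
                return new Tuple2<>(x.split(" ")[0], x.split(" ")[1]);
            }
        });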

With concrete types:

JavaPairRDD<String, String> prdd = line1.mapToPair(new PairFunction<String, String, String>() {
            public Tuple2<String, String> call(String x) throws Exception {
                return new Tuple2<>(x.split(" ")[0], x.split(" ")[1]);
            }
        });

Template for map:

JavaRDD<element type of the resulting JavaRDD> javaprdd = prdd.map(new Function<Tuple2<key type of the JavaPairRDD, value type of the JavaPairRDD>, element type of the resulting JavaRDD>() {
            private static final long serialVersionUID = 1L;
            @Override
            public String call(Tuple2<String, String> arg0) {
                System.out.println("arg0======================" + arg0);
                System.out.println("arg0======================" + arg0._1);
                return arg0._1 + " " + arg0._2;
            }
        });

        System.out.println("===============1=========");  

JavaRDD<String> javaprdd = prdd.map(new Function<Tuple2<String, String>, String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public String call(Tuple2<String, String> arg0) {
                System.out.println("arg0======================" + arg0);
                System.out.println("arg0======================" + arg0._1);
                return arg0._1 + " " + arg0._2;
            }
        });

        System.out.println("===============2=========");  

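In the output, the two marker lines

===============1=========
===============2=========

are printed before anything from the body of the map function. I was stuck on this for a long time. The map function was written correctly, but at first I did not print the resulting RDD afterwards, so the body of map never seemed to run. There was no error either, which left me puzzled. Only after I added a foreach that prints the result did I see that the body does run. That is when I remembered that Spark transformations are lazy: map only records the computation, and nothing executes until an action such as foreach needs the result. That is why the two marker lines appear first, and only then: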
arg0======================(1,aa)
arg0======================1
numbers;1 aa
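
The same ordering can be reproduced without a cluster. The sketch below is only an analogy: it uses java.util.stream, not Spark, because Java streams also defer intermediate operations such as map until a terminal operation runs. The class name LazyOrderDemo and the marker strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: plain Java streams, not Spark. Hypothetical class name.
class LazyOrderDemo {

    // Records the order in which things happen and returns that log.
    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Intermediate operation: declared here, but the lambda does not run yet.
        Stream<String> mapped = Arrays.asList("1 aa", "2 bb").stream()
                .map(s -> {
                    log.add("map:" + s);
                    return s.toUpperCase();
                });

        // These play the role of the two marker printlns in the Spark program.
        log.add("marker-1");
        log.add("marker-2");

        // Terminal operation: only now is the map lambda invoked.
        List<String> result = mapped.collect(Collectors.toList());
        log.add("result:" + result);
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run()) {
            System.out.println(entry);
        }
    }
}
```

Both markers land in the log before any "map:" entry, which mirrors what happened in the Spark run: the map body executed only once foreach asked for the data.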

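5. In summary: whatever mapToPair does to turn a JavaRDD into a JavaPairRDD, map can undo by doing the reverse. The two conversions are inverses of each other.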
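
To make the "inverse" claim concrete, here is a minimal sketch of just the two function bodies, written in plain Java so it runs without Spark. It is not the program above: java.util.Map.Entry stands in for scala.Tuple2, and the names PairRoundTrip, toPair and toLine are hypothetical. In a Spark job the same logic would sit inside the functions passed to mapToPair and map.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper; Map.Entry is a stand-in for scala.Tuple2.
class PairRoundTrip {

    // "1 aa" -> ("1", "aa"). This is the logic a mapToPair function would hold.
    static Map.Entry<String, String> toPair(String line) {
        // Limit 2 keeps everything after the first space in the value,
        // so a value that itself contains spaces survives the round trip.
        String[] parts = line.split(" ", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected \"key value\", got: " + line);
        }
        return new SimpleImmutableEntry<>(parts[0], parts[1]);
    }

    // ("1", "aa") -> "1 aa". This is the logic a map function would hold.
    static String toLine(Map.Entry<String, String> pair) {
        return pair.getKey() + " " + pair.getValue();
    }

    public static void main(String[] args) {
        String original = "1 aa";
        String restored = toLine(toPair(original));
        System.out.println(original.equals(restored));
    }
}
```

The round trip only restores the original line when the key contains no space, since the first space is the separator. Splitting with a limit of 2 is a choice made for this sketch; the program above splits on every space and keeps only the first two fields.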
