Spark JavaRDD, JavaPairRDD and Dataset: conversion and printing

Contents:

1. Convert a List to a JavaRDD, and print the JavaRDD

2. Convert a List to a JavaRDD, convert the JavaRDD to a JavaPairRDD, and print the JavaPairRDD

3. Convert a JavaRDD<String> to a JavaRDD<Row>

1. First convert the List to a JavaRDD, then print the JavaRDD with collect() and foreach

/**
 * @author Yu Wanlong
 */

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class ReadTextToRDD {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Read Text to RDD")
                .setMaster("local[2]")
                .set("spark.executor.memory", "2g");
        // start a spark context
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // build a List
        List<String> list = Arrays.asList("a:1", "a:2", "b:1", "b:1", "c:1", "d:1");
        // List to JavaRDD
        JavaRDD<String> javaRDD = jsc.parallelize(list);
        // print the JavaRDD with collect()
        for (String str : javaRDD.collect()) {
            System.out.println(str);
        }
        // print the JavaRDD with foreach
        javaRDD.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                System.out.println(s);
            }
        });
    }
}

a:1
a:2
b:1
b:1
c:1
d:1
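Two caveats worth noting: collect() pulls every element back to the driver first (fine for a small RDD like this, risky for large ones), while foreach runs on the executors, so on a real cluster its output lands in the executor logs rather than the driver console. On Java 8+ the anonymous VoidFunction can also be written as a lambda; a minimal sketch, reusing the javaRDD variable from the example above:

```java
// Driver-side print: collect() materializes the whole RDD on the driver.
for (String s : javaRDD.collect()) {
    System.out.println(s);
}

// Executor-side print: the anonymous VoidFunction shrinks to a lambda.
javaRDD.foreach(s -> System.out.println(s));
```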

2. Convert a List to a JavaRDD, convert the JavaRDD to a JavaPairRDD, and print the JavaPairRDD

/**
 * @author Yu Wanlong
 */

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class ReadTextToRDD {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Read Text to RDD")
                .setMaster("local[2]")
                .set("spark.executor.memory", "2g");
        // start a spark context
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // build a List
        List<String> list = Arrays.asList("a:1", "a:2", "b:1", "b:1", "c:1", "d:1");
        // List to JavaRDD
        JavaRDD<String> javaRDD = jsc.parallelize(list);
        // JavaRDD to JavaPairRDD
        JavaPairRDD<String, Integer> javaPairRDD = javaRDD.mapToPair(
                new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) throws Exception {
                        String[] ss = s.split(":");
                        return new Tuple2<>(ss[0], Integer.parseInt(ss[1]));
                    }
                });
        // print the JavaPairRDD with collect()
        for (Tuple2<String, Integer> str : javaPairRDD.collect()) {
            System.out.println(str.toString());
        }
    }
}

(a,1)

(a,2)

(b,1)

(b,1)

(c,1)

(d,1)

In the conversion from JavaRDD<String> to JavaPairRDD<String, Integer>, the key points are:

First: the function passed to mapToPair is a PairFunction<String, String, Integer>; its three type parameters are the input element type, the key type, and the value type.

Second: since a JavaPairRDD stores its data in key-value form, Tuple2<String, Integer> is the key-value pair type that must be returned.

Third: in String s, String is the element type of the source JavaRDD and s is the element's value.

Fourth: return new Tuple2<>(ss[0], Integer.parseInt(ss[1])) builds the key-value pair that is returned.

Summary: converting a JavaRDD into a JavaPairRDD is essentially the process of reshaping each row of data into key-value form; key-value operations on the resulting JavaPairRDD can then run much more efficiently.
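To make the efficiency point concrete, the per-key aggregation that reduceByKey would perform on such a JavaPairRDD can be sketched in plain Java with no Spark dependency (the class name PairSketch is made up for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairSketch {
    // Reshape "key:value" strings into pairs and sum the values per key,
    // mirroring mapToPair(...) followed by reduceByKey((a, b) -> a + b).
    static Map<String, Integer> sumByKey(List<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] ss = line.split(":");
            result.merge(ss[0], Integer.parseInt(ss[1]), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a:1", "a:2", "b:1", "b:1", "c:1", "d:1");
        System.out.println(sumByKey(list)); // {a=3, b=2, c=1, d=1}
    }
}
```

In Spark the same reduction runs partition-locally before any shuffle, which is where the efficiency gain over row-at-a-time processing comes from.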

3. Convert a JavaRDD<String> to a JavaRDD<Row>

/**
 * @author Yu Wanlong
 */

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class ReadTextToRDD {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Read Text to RDD")
                .setMaster("local[2]")
                .set("spark.executor.memory", "2g");
        // start a spark context
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // build a List
        List<String> list = Arrays.asList("a:1", "a:2", "b:1", "b:1", "c:1", "d:1");
        // List to JavaRDD
        JavaRDD<String> javaRDD = jsc.parallelize(list);
        // JavaRDD<String> to JavaRDD<Row>
        JavaRDD<Row> javaRDDRow = javaRDD.map(new Function<String, Row>() {
            @Override
            public Row call(String s) throws Exception {
                String[] ss = s.split(":");
                return RowFactory.create(ss[0], ss[1]);
            }
        });
        // print the JavaRDD<Row>
        for (Row str : javaRDDRow.collect()) {
            System.out.println(str.toString());
        }
    }
}

[a,1]

[a,2]

[b,1]

[b,1]

[c,1]

[d,1]
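The title also mentions Dataset: a JavaRDD<Row> like the one above is the usual stepping stone to a Dataset<Row> (DataFrame) via an explicit schema. A minimal sketch, assuming Spark 2.x and reusing the sparkConf and javaRDDRow variables from the example; the column names key and value are made up for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Both fields are strings, because RowFactory.create(ss[0], ss[1]) stored strings.
StructType schema = DataTypes.createStructType(new StructField[]{
        DataTypes.createStructField("key", DataTypes.StringType, false),
        DataTypes.createStructField("value", DataTypes.StringType, false)
});

SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
// JavaRDD<Row> + schema -> Dataset<Row>
Dataset<Row> dataset = spark.createDataFrame(javaRDDRow, schema);
dataset.show();
```

Going the other way, dataset.javaRDD() returns the underlying JavaRDD<Row> again.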
