Spark Code 1: RDDparallelizeSaveAsFile

More code at: https://github.com/xubo245/SparkLearning
 

Main functionality:

1. Generate n random integers in parallel, count the occurrences of each value, sort by count, and save the result to HDFS.

2. Time the compute phase and the save phase separately.


Code:

package LocalSpark

/**
  * Created by xubo on 2016/3/3.
  */
import org.apache.spark._
import scala.util.Random
import java.text.SimpleDateFormat
import java.util.Date
import scala.math._

object RDDparallelizeSaveAsFile {
  def main(args: Array[String]) {
    // Use setMaster("local") instead to run in local mode.
    val conf = new SparkConf().setAppName("RDDparallelize").setMaster("spark://Master:7077")
    val spark = new SparkContext(conf)

    var startTime = System.currentTimeMillis()
    // Number of random values to generate; clamp to Int.MaxValue.
    var ar1 = if (args.length > 0) args(0).toInt else 10000000
    ar1 = min(ar1, Int.MaxValue)
    // Generate ar1 random ints in [0, 1000), count each value, sort by count.
    val data = spark.parallelize(1 to ar1)
      .map(_ => new Random().nextInt(1000))
      .map(num => (num, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2)
    var endTime = System.currentTimeMillis()
    println("compute:" + (endTime - startTime) + "ms")

    // Timestamped HDFS output directory.
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    val soutput = "hdfs://Master:9000/output/" + iString
    println(soutput)
    startTime = System.currentTimeMillis()
    data.saveAsTextFile(soutput)
    endTime = System.currentTimeMillis()
    println("saveAsTextFile:" + (endTime - startTime) + "ms")
    spark.stop()
  }
}
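As an aside, the pipeline above allocates a new Random for every element. A minimal sketch of a cheaper variant, assuming the same SparkContext and ar1 as above: create one Random per partition with mapPartitions instead.

val data = spark.parallelize(1 to ar1)
  .mapPartitions { iter =>
    val rnd = new Random() // one Random per partition, not per element
    iter.map(_ => (rnd.nextInt(1000), 1))
  }
  .reduceByKey(_ + _)
  .sortBy(_._2)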

Submit script:

#!/usr/bin/env bash
spark-submit --name RDDparallelizeSaveAsFile \
  --class LocalSpark.RDDparallelizeSaveAsFile \
  --master spark://Master:7077 \
  --executor-memory 512M \
  --total-executor-cores 22 \
  scala2.jar 2147483645
Master here is a hostname; replace it with the real IP address when you run the script.
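For the local-mode runs shown below, note that the code above hardcodes the master via setMaster, so switching modes means editing the commented setMaster("local") line and resubmitting. Alternatively, omit setMaster in the code and pick the mode at submit time; a hypothetical local-mode variant of the script (the post does not show the actual local script, and local[*] is my assumption):

#!/usr/bin/env bash
# Hypothetical local-mode submit; local[*] runs Spark in-process,
# one worker thread per CPU core.
spark-submit --name RDDparallelizeSaveAsFile \
  --class LocalSpark.RDDparallelizeSaveAsFile \
  --master "local[*]" \
  scala2.jar 10000000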


Note: Scala's Int.MaxValue is 2147483647; the tests below use n = 2147483645, just under that limit (the length of the Range 1 to n must fit in an Int).
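A quick sanity check in the Scala REPL (standard-library values, for reference):

scala> Int.MaxValue
res0: Int = 2147483647

scala> scala.math.min(2147483645, Int.MaxValue)
res1: Int = 2147483645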


When n = 2147483645:

Test results:

Cluster mode:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom2.sh 
compute:270435ms                                                                
hdfs://Master:9000/output/20160303203717406
saveAsTextFile:3658ms                                                           

Local mode:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom2.sh 
compute:380ms
hdfs://Master:9000/output/20160303205225605
saveAsTextFile:472576ms

Second run:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom2.sh 
compute:462ms
hdfs://Master:9000/output/20160303210423094
saveAsTextFile:494303ms


Clearly local mode takes much longer overall, but the two phases also split very differently: in cluster mode almost all the time is in compute, while in local mode compute looks nearly free and saveAsTextFile takes hundreds of seconds. The likely explanation: the code was changed between the runs, and RDD transformations are lazy, so nothing actually executes until an action is called; the "compute" timer can therefore measure only the construction of the lineage, with the real work deferred into saveAsTextFile.
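A minimal sketch of the fix, assuming the same SparkContext and ar1 as in the code above (this is essentially what the modified code later in the post does): call an action such as count() before reading the timer, so the transformations execute inside the timed region.

var t0 = System.currentTimeMillis()
val data = spark.parallelize(1 to ar1)
  .map(_ => new Random().nextInt(1000))
  .map(num => (num, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2)
println(data.count()) // action: forces the lazy pipeline to run now
println("compute:" + (System.currentTimeMillis() - t0) + "ms")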
A sample of the output file (each line is (randomValue, count); with 2147483645 values spread uniformly over 1000 keys, each count is close to 2147483645 / 1000 ≈ 2.15 million):

(571,2143167)
(544,2143383)
(919,2143604)
(832,2143630)
(184,2143638)
(26,2143763)
(228,2143871)
(360,2143921)
(76,2143931)
(92,2144076)
(820,2144118)
(286,2144162)
(831,2144198)

When n = 10000000:

Cluster mode:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom.sh 
compute:5423ms                                                                  
hdfs://Master:9000/output/20160303204414173
saveAsTextFile:3530ms  
Local mode:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom.sh 
compute:442ms
hdfs://Master:9000/output/20160303205045778
saveAsTextFile:5112ms




Modified code:

package LocalSpark

/**
  * Created by xubo on 2016/3/3.
  */
import org.apache.spark._
import scala.util.Random
import java.text.SimpleDateFormat
import java.util.Date
import scala.math._

object RDDparallelizeSaveAsFile {
  def main(args: Array[String]) {
    // Use setMaster("local") instead to run in local mode.
    val conf = new SparkConf().setAppName("RDDparallelize").setMaster("spark://Master:7077")
    val spark = new SparkContext(conf)

    var startTime = System.currentTimeMillis()
    var ar1 = if (args.length > 0) args(0).toInt else 10000000
    println("length:" + ar1)
    ar1 = min(ar1, Int.MaxValue)
    val data = spark.parallelize(1 to ar1)
      .map(_ => new Random().nextInt(1000))
      .map(num => (num, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2)
    // count() is an action: it forces the lazy pipeline to execute here,
    // so the "compute" timer now measures the real work.
    println(data.count())
    var endTime = System.currentTimeMillis()
    println("compute:" + (endTime - startTime) + "ms")

    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    val soutput = "hdfs://Master:9000/output/" + iString
    println(soutput)
    startTime = System.currentTimeMillis()
    data.saveAsTextFile(soutput)
    endTime = System.currentTimeMillis()
    println("saveAsTextFile:" + (endTime - startTime) + "ms")
    spark.stop()
  }
}
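One observation on the results below (my reading, not stated in the original post): saveAsTextFile stays fast even though count() and saveAsTextFile are two separate jobs over the same lineage, because Spark reuses the shuffle files written by reduceByKey/sortBy and skips the already-computed stages in the second job. To make the reuse explicit, the RDD could also be cached; a sketch:

val data = spark.parallelize(1 to ar1)
  .map(_ => new Random().nextInt(1000))
  .map(num => (num, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2)
  .cache()                     // keep computed partitions in memory
println(data.count())          // first action: computes and fills the cache
data.saveAsTextFile(soutput)   // second action: served largely from the cache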




Run results:

hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom.sh 
length:10000000
1000
compute:4361ms
hdfs://Master:9000/output/20160303211526342
saveAsTextFile:1229ms
hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom2.sh 
length:2147483645
1000
compute:464324ms
hdfs://Master:9000/output/20160303212342077
saveAsTextFile:1237ms
hadoop@Master:~/cloud/testByXubo/spark/testRandom$ ./submitJobTestRandom2.sh 
length:2147483645
1000                                                                            
compute:274269ms
hdfs://Master:9000/output/20160303214238474
saveAsTextFile:2662ms 

