Spark 1.5.2: Building a jar in Eclipse and Submitting It to a Cluster

Environment:

Windows 7

Ubuntu

Spark 1.5.2


1. WordCountSpark.scala code:

import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCountSpark {
  def main(args: Array[String]) {
    if (args.length != 3) {
      println("usage is WordCountSpark <master> <input> <output>")
      return
    }
    // Build the SparkContext directly from the arguments: master URL, app name,
    // Spark home, and the jar(s) to ship to the executors.
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile(args(1))
    // Split each line on whitespace, map every word to (word, 1), and sum the counts.
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1)).reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
  }
}
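The code above passes System.getenv("SPARK_HOME") and System.getenv("SPARK_TEST_JAR") to the SparkContext constructor, so exporting both variables before submitting keeps those fields populated. A small sketch, with paths assumed from the install used later in this post:

    # Assumed install path from this post; adjust to your own environment.
    export SPARK_HOME=/home/hadoop/cloud/spark-1.5.2
    # The jar that step 3 uploads to the bin directory.
    export SPARK_TEST_JAR=$SPARK_HOME/bin/WordCountSpark.jar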


2. Submit script submitJob.sh:

    #!/usr/bin/env bash
    ./spark-submit --name WordCountSpark \
        --class WordCountSpark \
        --master spark://219.219.220.149:7077 \
        --executor-memory 512M \
        --total-executor-cores 1 WordCountSpark.jar local /input/* /output/201601262158
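Everything after the jar name is passed to WordCountSpark.main, so here <master> is local, <input> is /input/*, and <output> is /output/201601262158; since the code builds its SparkContext from args(0), that trailing local is the master the application itself asks for. For a quick check without the standalone master, a sketch (the output path is only an example and must not already exist):

    ./spark-submit --class WordCountSpark \
        --master local[2] \
        WordCountSpark.jar local[2] /input/* /output/test_local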

Directory: /home/hadoop/cloud/spark-1.5.2/bin

3. Build WordCountSpark.scala into a jar and upload it with rz to /home/hadoop/cloud/spark-1.5.2/bin (a command-line build is sketched below).
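If you would rather build the jar from the command line than export it from Eclipse, a minimal sketch (the assembly jar file name is an assumption; use whatever is actually in $SPARK_HOME/lib):

    # Compile against the Spark assembly jar shipped with the 1.5.2 distribution.
    scalac -classpath /home/hadoop/cloud/spark-1.5.2/lib/spark-assembly-1.5.2-hadoop2.6.0.jar \
        WordCountSpark.scala
    # Package the generated classes into the jar that submitJob.sh expects.
    jar cf WordCountSpark.jar WordCountSpark*.class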

Put the input data under /input on HDFS:
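A sketch of staging the data, assuming the hadoop client is on the PATH; words.txt is just an example file name:

    # Create the input directory and upload a sample text file.
    hadoop fs -mkdir -p /input
    hadoop fs -put words.txt /input/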



4. Run:

hadoop@Master:~/cloud/spark-1.5.2/bin$ ./submitJob.sh 

Result:
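saveAsTextFile writes the (word, count) pairs as part files under the output directory; a sketch for inspecting them (standard part-file layout assumed):

    hadoop fs -ls /output/201601262158
    hadoop fs -cat /output/201601262158/part-* | head -n 20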




Run log:



References:

[1] http://bit1129.iteye.com/blog/2172164

[2] http://blog.csdn.net/ggz631047367/article/details/50185181




The same steps also work for a second version of the code:


import org.apache.spark._
import org.apache.spark.SparkContext._

object SparkWordCount {
  def main(args: Array[String]) {

    if (args.length < 2) {
      System.err.println("Usage: SparkWordCount <input> <output>")
      System.exit(1)
    }

    // Define the Spark runtime configuration
    /*
      Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
      Most of the time, you would create a SparkConf object with `new SparkConf()`, which will load
      values from any `spark.*` Java system properties set in your application as well. In this case,
      parameters you set directly on the `SparkConf` object take priority over system properties.
     */
    val conf = new SparkConf()
    conf.setAppName("SparkWordCount")

    // Create the Spark context

    /*
       Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
       cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
       Only one SparkContext may be active per JVM.  You must `stop()` the active SparkContext before
       creating a new one.  This limitation may eventually be removed; see SPARK-2243 for more details.
  
       @param config a Spark Config object describing the application configuration. Any settings in this config overrides the default configs as well as system properties.
    */

    val sc = new SparkContext(conf)

    // Reference the text on HDFS (lazy; nothing is actually read yet), building a MappedRDD
    val rdd = sc.textFile(args(0))

    // If this line fails with "value reduceByKey is not a member of
    // org.apache.spark.rdd.RDD[(String, Int)]", add: import org.apache.spark.SparkContext._
    // Count each word, swap to (count, word) so sortByKey(false) sorts by count
    // descending, then swap back to (word, count) before saving.
    rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
      .saveAsTextFile(args(1))

    sc.stop()
  }
}

Script:

    #!/usr/bin/env bash
    ./spark-submit --name SparkWordCount \
        --class SparkWordCount \
        --master spark://219.219.220.149:7077 \
        --executor-memory 512M \
        --total-executor-cores 1 SparkWordCount.jar /input/* /output/201601262211



Run:

hadoop@Master:~/cloud/spark-1.5.2/bin$ ./submitJob_SparkWordCount.sh 
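Because this version sorts by count before saving, the most frequent words come first in the output; a quick check (standard part-file layout assumed):

    hadoop fs -cat /output/201601262211/part-* | head -n 10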




