Spark 1.5.2: Building a jar in Eclipse and Submitting It to a Cluster
Environment:
Windows 7
Ubuntu
Spark 1.5.2
1. WordCountSpark.scala code:
import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCountSpark {
  def main(args: Array[String]) {
    if (args.length != 3) {
      println("usage is org.test.WordCount <master> <input> <output>")
      return
    }
    // Older SparkContext constructor: master, app name, Spark home, and jars to ship to executors
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile(args(1))
    // Split each line on whitespace, emit (word, 1) pairs, and sum the counts per word
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1)).reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
    sc.stop()
  }
}
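Before packaging the jar, it can help to sanity-check the same pipeline in local mode. The following sketch is not from the original post: the object name WordCountLocalCheck, the local[2] master, and the inline sample lines are assumptions used only for illustration.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal local-mode check of the word-count pipeline (illustrative only).
object WordCountLocalCheck {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("WordCountLocalCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val counts = sc.parallelize(Seq("hello spark", "hello hdfs"))   // hypothetical sample lines
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)   // expect (hello,2), (spark,1), (hdfs,1)
    sc.stop()
  }
}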
2. Submission script submitJob.sh:
#!/usr/bin/env bash
./spark-submit --name WordCountSpark \
--class WordCountSpark \
--master spark://219.219.220.149:7077 \
--executor-memory 512M \
--total-executor-cores 1 WordCountSpark.jar local /input/* /output/201601262158
Directory: /home/hadoop/cloud/spark-1.5.2/bin
The three values after the jar name are the program arguments: args(0) = local (the master string handed to the SparkContext constructor), args(1) = /input/* (input path), args(2) = /output/201601262158 (output path).
3. Build WordCountSpark.scala into a jar in Eclipse and upload it with rz to /home/hadoop/cloud/spark-1.5.2/bin;
Put the input data under /input on HDFS:
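If you prefer staging the data from code rather than with shell commands, the Hadoop FileSystem API can copy a local file into HDFS. This is only an illustrative sketch: the object name PutInput and the local file words.txt are assumptions, and it expects the cluster's Hadoop configuration (fs.defaultFS) to be on the classpath.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative: copy a local file into HDFS /input (words.txt is a hypothetical file).
object PutInput {
  def main(args: Array[String]) {
    val fs = FileSystem.get(new Configuration())   // picks up core-site.xml from the classpath
    fs.mkdirs(new Path("/input"))
    fs.copyFromLocalFile(new Path("words.txt"), new Path("/input/words.txt"))
    fs.close()
  }
}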
4. Run:
hadoop@Master:~/cloud/spark-1.5.2/bin$ ./submitJob.sh
Result: (run log screenshot not reproduced here)
References:
[1] http://bit1129.iteye.com/blog/2172164
[2] http://blog.csdn.net/ggz631047367/article/details/50185181
Similarly, the second program can be built and submitted with the same steps (a note on its sort step follows the code):
import org.apache.spark._
import org.apache.spark.SparkContext._

object SparkWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }
    // Define the runtime configuration for the Spark application
    /*
    Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
    Most of the time, you would create a SparkConf object with `new SparkConf()`, which will load
    values from any `spark.*` Java system properties set in your application as well. In this case,
    parameters you set directly on the `SparkConf` object take priority over system properties.
    */
    val conf = new SparkConf()
    conf.setAppName("SparkWordCount")
    // Define the Spark context
    /*
    Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
    cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
    Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
    creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
    @param config a Spark Config object describing the application configuration. Any settings in
    this config overrides the default configs as well as system properties.
    */
    val sc = new SparkContext(conf)
    // Get the text from HDFS (lazily, nothing is read yet), building a MappedRDD
    val rdd = sc.textFile(args(0))
    // If this line reports "value reduceByKey is not a member of org.apache.spark.rdd.RDD[(String, Int)]",
    // add "import org.apache.spark.SparkContext._"
    rdd.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .map(x => (x._2, x._1))     // swap to (count, word) so sortByKey can order by count
      .sortByKey(false)           // descending by count
      .map(x => (x._2, x._1))     // swap back to (word, count)
      .saveAsTextFile(args(1))
    sc.stop()
  }
}
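The swap-sortByKey-swap sequence above orders words by descending count. As a side note (not part of the original post), the same ordering can be expressed with RDD.sortBy, available since Spark 1.0; the object name SparkWordCountSortBy below is a hypothetical name used only for this sketch.

import org.apache.spark.{SparkConf, SparkContext}

// Same word count, but sorted with RDD.sortBy instead of the swap/sortByKey/swap steps.
// args(0) and args(1) are the input and output paths, as in SparkWordCount above.
object SparkWordCountSortBy {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("SparkWordCountSortBy"))
    val counts = sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.sortBy(_._2, ascending = false).saveAsTextFile(args(1))   // descending by count
    sc.stop()
  }
}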
Script (submitJob_SparkWordCount.sh):
#!/usr/bin/env bash
./spark-submit --name SparkWordCount \
--class SparkWordCount \
--master spark://219.219.220.149:7077 \
--executor-memory 512M \
--total-executor-cores 1 SparkWordCount.jar /input/* /output/201601262211
Run:
hadoop@Master:~/cloud/spark-1.5.2/bin$ ./submitJob_SparkWordCount.sh