Spark Application Development

Application development
1. Add the Spark jars to the project's lib directory and put them on the project's classpath.
The project depends on spark-core (the _2.10 suffix is the Scala version the Spark 1.0.2 build was compiled against):
<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.0.2</version>
</dependency>

If the job also reads from or writes to HDFS, add the hadoop-client dependency as well:
<dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.2.0</version>
</dependency>


Alternatively:
Add the jars under /usr/local/myspark/spark/spark-1.0.2-bin-hadoop2/lib to the project's lib directory and put them on the project's classpath.
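
For reference, a minimal sketch of a pom.xml that pulls in both of the dependencies above; the groupId, artifactId and version of the example project are assumptions, not from the original setup:

<project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <!-- assumed coordinates for the example project -->
      <groupId>org.test</groupId>
      <artifactId>myspark-wordcount</artifactId>
      <version>1.0</version>

      <dependencies>
            <dependency>
                  <groupId>org.apache.spark</groupId>
                  <artifactId>spark-core_2.10</artifactId>
                  <version>1.0.2</version>
            </dependency>
            <dependency>
                  <groupId>org.apache.hadoop</groupId>
                  <artifactId>hadoop-client</artifactId>
                  <version>2.2.0</version>
            </dependency>
      </dependencies>
</project>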




2. Code example
package org.test.myspark;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class SparkWordCount {

    /**
     * Word count example.
     */
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark_wordcount").setMaster("yarn-cluster");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Read the input file as an RDD of lines
        JavaRDD<String> lines = jsc.textFile("hdfs://master:9000/wordcount_input/file2");
        // Split each line into individual words
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) throws Exception {
                System.out.println("line=" + s);
                String[] linewords = s.split(" ");
                for (String lw : linewords) {
                    System.out.println("words=" + lw);
                }
                return Arrays.asList(linewords);
            }
        });
        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> wordonepairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<String, Integer>(s, 1);
            }
        });
        // Sum the counts for each word
        JavaPairRDD<String, Integer> wordcounts = wordonepairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) throws Exception {
                return a + b;
            }
        });
        // collect() is the action that triggers the job; fetch the results to the driver and print them
        List<Tuple2<String, Integer>> results = wordcounts.collect();
        for (Tuple2<String, Integer> tuple : results) {
            System.out.println(tuple._1 + ":" + tuple._2);
        }
        // Stop the context before the application exits
        jsc.stop();
    }

}
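
In yarn-cluster mode the driver does not run on the submitting machine, so the results printed after collect() only show up in the application's logs. A variant that is often more convenient is to skip collect() and write the counts straight back to HDFS. Below is a minimal sketch of that variant, assuming an output path of hdfs://master:9000/wordcount_output (the path and the class name SparkWordCountToHdfs are illustrative); it also leaves setMaster() out so that the --master passed to spark-submit decides where the job runs:

package org.test.myspark;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class SparkWordCountToHdfs {

    public static void main(String[] args) {
        // No setMaster() here: let spark-submit's --master decide (yarn-cluster, local[*], ...)
        SparkConf conf = new SparkConf().setAppName("spark_wordcount_to_hdfs");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<String> lines = jsc.textFile("hdfs://master:9000/wordcount_input/file2");
        // Split lines into words
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) {
                return Arrays.asList(s.split(" "));
            }
        });
        // (word, 1) pairs, then sum per word
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String s) {
                    return new Tuple2<String, Integer>(s, 1);
                }
            })
            .reduceByKey(new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer a, Integer b) {
                    return a + b;
                }
            });

        // Write the (word, count) pairs to HDFS instead of collecting them to the driver
        counts.saveAsTextFile("hdfs://master:9000/wordcount_output");
        jsc.stop();
    }

}

Note that the output directory must not already exist when the job runs, or saveAsTextFile will fail.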



3. Packaging
Copy the project's compiled classes to C:\Users\dingzhf\Desktop\logs, then build the jar from the command line:
cmd
>cd C:\Users\dingzhf\Desktop\logs\classes
>jar -cvf wordcount.jar .


Copy wordcount.jar to the /opt directory on 10.41.2.82.
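
If the project is set up as a Maven project as in step 1, the jar can also be built with mvn package instead of the manual jar command; the exact jar name under target/ depends on the artifactId and version in pom.xml, and the user, host and paths below are only illustrative:

>mvn package
>scp target/myspark-wordcount-1.0.jar root@10.41.2.82:/opt/wordcount.jar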

4. Running
Run the following command on 10.41.2.82:

/usr/local/myspark/spark/spark-1.0.2-bin-hadoop2/bin/spark-submit --class org.test.myspark.SparkWordCount --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 /opt/wordcount.jar

Here --class names the main class inside the jar and --master yarn-cluster submits the driver to run inside the YARN cluster; the remaining options size the job: 3 executors with 2g of memory and 1 core each, plus a 4g driver.
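
If the HDFS-output variant sketched after the code example in step 2 were used instead, only the --class argument would change (class name and output path as assumed in that sketch):

/usr/local/myspark/spark/spark-1.0.2-bin-hadoop2/bin/spark-submit --class org.test.myspark.SparkWordCountToHdfs --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 /opt/wordcount.jar

and the counts would then be read from hdfs://master:9000/wordcount_output rather than from the logs.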

View the result in the YARN web UI:
http://master:8088/proxy/application_1409622175934_0004/A

Click logs on the application page to see the driver's stdout, which contains the printed word counts:
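
Alternatively, if YARN log aggregation is enabled on the cluster, the same driver output can be fetched from the command line once the application has finished (using the application id shown in the URL above):

>yarn logs -applicationId application_1409622175934_0004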



