Developing Spark cluster computations with Eclipse and Maven

1. Following the previous articles, set up a Spark on YARN cluster, so that both Hadoop and Spark are installed and working

/usr/local/hadoop/sbin/start-all.sh

This starts Hadoop and YARN. Running jps afterwards should list the daemons:

6661 NameNode
7163 ResourceManager
7300 NodeManager
7012 SecondaryNameNode
3119 
7512 Jps
6795 DataNode

2. Open Eclipse and create a Maven project

3. Edit pom.xml to add the Spark jar dependency

 <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
      <scope>provided</scope>
 </dependency>
4. Choose Run As > Maven install

This downloads the dependency jars.

5. Call Spark from the main class in App.java

package com.fei.simple_project;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

/**
 * Counts the lines of README.md that contain "a"
 * and the lines that contain "b".
 */
public class App 
{
    public static void main( String[] args )
    {
        String logFile = "README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
          public Boolean call(String s) { return s.contains("a"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
          public Boolean call(String s) { return s.contains("b"); }
        }).count();
    	
        System.out.println( "Hello World!" );
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
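
To sanity-check the counts without a cluster, the same filter-then-count logic can be sketched in plain Java. This is only an illustration and not part of the original project: the class name LocalLineCount, the helper countMatching, and the sample lines are all made up here, and no Spark classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the original project: a plain-Java
// equivalent of the filter-then-count steps in App.java, without Spark.
class LocalLineCount {

    // Counts the lines that satisfy the predicate, mirroring filter(...).count().
    static long countMatching(List<String> lines, Predicate<String> predicate) {
        return lines.stream().filter(predicate).count();
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of README.md.
        List<String> lines = Arrays.asList("apache spark", "big data", "hello");

        long numAs = countMatching(lines, s -> s.contains("a"));
        long numBs = countMatching(lines, s -> s.contains("b"));

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
```

Reading the real README.md into the list (for example with java.nio.file.Files.readAllLines) should give the same numbers that the Spark job prints.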


6. Choose Run As > Maven install again

This generates the jar in the target directory.

7. Run spark.sh, which invokes spark-submit:

/usr/local/spark/bin/spark-submit --class "com.fei.simple_project.App" --master local[4] /home/tizen/share/working-dir/spark/simple-project/target/simple-project-0.0.1-SNAPSHOT.jar
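
Note that --master local[4] runs the job in a single local process with four worker threads, not on the YARN cluster. The sketch below assembles the same argument list in Java so the master can be swapped. The class SubmitCommand and its build method are hypothetical names introduced here, and using "yarn" as the master is an assumption that additionally requires HADOOP_CONF_DIR to point at the cluster configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
class SubmitCommand {

    // Builds the spark-submit argument list; master is e.g. "local[4]" or "yarn".
    static List<String> build(String master, String mainClass, String jarPath) {
        return new ArrayList<>(Arrays.asList(
                "/usr/local/spark/bin/spark-submit",
                "--class", mainClass,
                "--master", master,
                jarPath));
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "local[4]",
                "com.fei.simple_project.App",
                "/home/tizen/share/working-dir/spark/simple-project/target/simple-project-0.0.1-SNAPSHOT.jar");

        // Print the command instead of running it, so the sketch has no side effects.
        System.out.println(String.join(" ", cmd));

        // To actually launch it, something like this could be used:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```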

8. Check the result

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode:9000/user/tizen/README.md
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)

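The relative path README.md was resolved against the user's HDFS home directory, /user/tizen, so the error means we have not yet uploaded README.md to the cluster.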
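
The path in the exception shows the relative name README.md being resolved against an HDFS home directory. As a rough analogy only (Hadoop does this with its own Path and FileSystem classes, not with java.net.URI), the following sketch shows how a relative name picks up the base location while a name with an explicit scheme is left alone. The class PathResolution and the local path /home/tizen/README.md are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration, not part of the original project and not Hadoop's own code.
class PathResolution {

    // Resolves a path against a base directory URI (the base must end with "/").
    static URI resolve(String baseDir, String path) {
        return URI.create(baseDir).resolve(path);
    }

    public static void main(String[] args) {
        // Base taken from the exception message above.
        String home = "hdfs://namenode:9000/user/tizen/";

        // A relative name lands under the HDFS home directory.
        System.out.println(resolve(home, "README.md"));

        // A name with its own scheme is kept as-is (made-up local path).
        System.out.println(resolve(home, "file:///home/tizen/README.md"));
    }
}
```

By the same reasoning, passing a fully qualified URI as logFile is one way to make explicit which file system the job should read from.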

9. Upload README.md to HDFS

hdfs dfs -put README.md README.md

10. Run spark.sh again

16/01/23 21:49:23 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/01/23 21:49:23 INFO DAGScheduler: ResultStage 1 (count at App.java:25) finished in 0.047 s
16/01/23 21:49:23 INFO DAGScheduler: Job 1 finished: count at App.java:25, took 0.155340 s
Hello World!
Lines with a: 58, lines with b: 26
16/01/23 21:49:23 INFO SparkContext: Invoking stop() from shutdown hook
16/01/23 21:49:23 INFO SparkUI: Stopped Spark web UI at http://192.168.0.101:4040
16/01/23 21:49:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

The line counts now appear in the output.

While the application is running, the Spark web UI can be viewed at:

http://192.168.0.101:4040
