Spark on Windows 11: Local Environment Setup and WordCount

【README】

Software installation checklist (if you only debug locally, never submit to a cluster, and don't write Scala, you can skip these installs entirely):

  1. Scala;
  2. Hadoop;
  3. winutils;
  4. Spark;

【1】 Local environment setup (very simple)

1) Add the Spark Maven dependency (see: https://mvnrepository.com/artifact/org.apache.spark/spark-core )

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.5.0</version>
</dependency>

2) Add the VM option -Dspark.master=local

(calling new SparkConf().setMaster("local") in code has the same effect)

3) WordCount example 1:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import scala.Tuple2;

public class SparkExample01 {
    private static final String FILE_PATH = "src/main/resources/helloworld.txt";

    public static void main(String[] args) {
        // run Spark in-process on the local machine; no cluster needed
        SparkConf sparkCfg = new SparkConf().setMaster("local").setAppName("sparkCfg");
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkCfg);
        JavaRDD<String> lineRdd = javaSparkContext.textFile(FILE_PATH);
        // split each line on spaces into individual words
        JavaRDD<String> wordRdd = lineRdd.flatMap(line -> Arrays.stream(line.split(" ")).iterator());
        // map each word to (word, 1), then sum the 1s per word
        JavaPairRDD<String, Integer> wordCountRdd = wordRdd.mapToPair(word -> new Tuple2<>(word, 1)).reduceByKey(Integer::sum);
        wordCountRdd.foreach(wordCount -> System.out.println(wordCount._1() + "-" + wordCount._2()));
        javaSparkContext.close();
    }
}
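The flatMap → mapToPair → reduceByKey pipeline above is logically the same as a plain `java.util.stream` word count. This sketch (class name `LocalWordCount` is mine, not from the post) needs no Spark context, so it is handy for checking what counts the RDD version should produce:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LocalWordCount {
    // Same logic as the RDD pipeline: split lines into words (flatMap),
    // then group identical words and count them (mapToPair + reduceByKey).
    public static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // e.g. {world=1, hello=2, spark=1} (map order is unspecified)
        System.out.println(count(List.of("hello world", "hello spark")));
    }
}
```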


【2】 WordCount example 2

import java.io.File;
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import scala.Tuple2;

public class SparkWordCounter {

    private static final String inputFileName = "D:\\temp\\spark\\input.txt";
    private static final String outputFileName = "D:\\temp\\sparkoutput\\";

    public static void main(String[] args) {
        // point Spark's Hadoop layer at the local winutils install (Windows only)
        System.setProperty("hadoop.home.dir", "D://software_install_dir//winutils-master//hadoop-3.2.0//");
        // remove the output directory from a previous run; note that File.delete()
        // only succeeds on an empty directory -- saveAsTextFile fails if it still exists
        new File(outputFileName).delete();
        wordCount();
    }

    public static void wordCount() {
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("sparkCore3.0");
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        JavaRDD<String> fileRdd = javaSparkContext.textFile(inputFileName);
        // split each line on spaces into individual words
        JavaRDD<String> fileWordRdd = fileRdd.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        // (word, 1) pairs reduced to per-word counts
        JavaPairRDD<String, Integer> fileWordPairRdd = fileWordRdd.mapToPair(word -> new Tuple2<>(word, 1)).reduceByKey((x, y) -> x + y);
        System.out.println("[count] " + fileWordPairRdd.count());
        // writes part-xxxxx files into the output directory
        fileWordPairRdd.saveAsTextFile(outputFileName);
        javaSparkContext.close();
    }
}
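One caveat about the cleanup step: `new File(outputFileName).delete()` removes the output directory only when it is empty, but after a successful run it contains part files, so the next run's `saveAsTextFile` fails because the directory already exists. A minimal recursive-delete sketch using `java.nio.file` (class name `OutputDirCleaner` and the path are illustrative, not from the post):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputDirCleaner {
    // Recursively delete a directory tree: walk it, then delete
    // deepest paths first so children go before their parents.
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        deleteRecursively(Paths.get("D:\\temp\\sparkoutput"));
    }
}
```

Calling `deleteRecursively(Paths.get(outputFileName))` before `wordCount()` makes the job rerunnable.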
