Converting a Spark RDD to a String: Implementing a WordCount Program in Spark

Latest recommended article published on 2022-07-22 13:17:03

Yrgo

Article tags: converting a Spark RDD to a String

Article link: https://blog.csdn.net/weixin_36141019/article/details/112499822

Without further ado, here is the code.

The pom.xml file

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.cn.spark</groupId>
  <artifactId>spark</artifactId>
  <version>1.0</version>
  <properties>
    <scala.version>2.11.8</scala.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.3.3</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.cn.spark.wordcount.wordCount</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

wordCount.scala

package com.cn.spark.wordcount

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

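object wordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with 2 worker threads
    val conf = new SparkConf().setAppName("test01").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Only show warnings and errors in the console
    sc.setLogLevel("WARN")

    // Read the input file; each RDD element is one line of text
    val data: RDD[String] = sc.textFile("/Users/mac/IdeaProjects/spark/src/resources/name.text")
    // Split every line on spaces and flatten the result into individual words
    val words: RDD[String] = data.flatMap(_.split(" "))
    // Pair each word with an initial count of 1
    val wordAndOne: RDD[(String, Int)] = words.map(word => (word, 1))
    // Sum the counts for each distinct word
    val result: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // Sort by count, highest first
    val sortedRDD: RDD[(String, Int)] = result.sortBy(_._2, ascending = false)
    // Bring the results back to the driver and print them
    val finalResult: Array[(String, Int)] = sortedRDD.collect()
    finalResult.foreach(println)

    // Release the resources held by the SparkContext
    sc.stop()
  }
}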
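
The steps in the program above (split into words, pair each word with 1, sum per word, sort by count) can be traced on an ordinary Scala collection without starting Spark. The sketch below is only an illustration under stated assumptions: `WordCountSketch`, `countWords`, `render`, and the sample input are hypothetical names and data that do not appear in the original post, and `groupBy` plus `sum` stands in for `reduceByKey`, which exists only on RDDs. It also shows one possible way to turn the counted pairs into a single String with `mkString`, which is what the article's tag refers to.

```scala
// Plain-Scala sketch of the word-count steps; no Spark dependency required.
// All names here are hypothetical and chosen for illustration only.
object WordCountSketch {

  // Counts words in the given lines.
  // Result is sorted by descending count; ties are broken alphabetically.
  def countWords(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens caused by leading whitespace
      .map(word => (word, 1))     // pair every word with a count of 1
      .groupBy(_._1)              // local stand-in for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }

  // Renders the counted pairs as one String: "word<TAB>count", one pair per line.
  def render(counts: Seq[(String, Int)]): String =
    counts.map { case (word, count) => s"$word\t$count" }.mkString("\n")

  def main(args: Array[String]): Unit = {
    val sample = Seq("hello spark", "hello scala", "hello") // made-up input
    println(render(countWords(sample)))
  }
}
```

In the Spark program, the same rendering could presumably be applied to the collected result, for example `WordCountSketch.render(sortedRDD.collect())`. Note that `collect()` pulls every element to the driver, so this approach only suits results small enough to fit in driver memory.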

Summary

The code in this post is written in Scala. Follow along so you don't miss future posts!