Spark Case Study: WordCount

Create a Maven Project

Add the Scala Plugin

Spark is developed in Scala, so Scala is also the language we will use for development in this tutorial. The Spark version used here is 3.0.0, which is compiled against Scala 2.12 by default, so we will stick with that Scala version throughout. Before starting, make sure the Scala plugin is installed in your IDEA development environment.

Add Dependencies

Modify the POM file of the Maven project to add the Spark framework dependency. This tutorial is based on Spark 3.0, so take care to match the corresponding versions when following along.
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- This plugin compiles Scala code into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind the Scala compiler to Maven's compile phases
                             (compile covers main sources, testCompile covers tests) -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.2.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

WordCount Code

To get a direct feel for what the Spark framework does, we will now implement WordCount, the most common teaching example in big data courses.
package com.muzili.applications

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object wordcount {
  def main(args: Array[String]): Unit = {

    // Create the Spark configuration object
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the Spark context (the connection object)
    val sc: SparkContext = new SparkContext(sparkConf)
    // Read the file data
    val fileRDD: RDD[String] = sc.textFile("C:\\Users\\muzili\\Desktop\\word.txt")
    // Split the file contents into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(" "))
    // Transform the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Group the transformed data by word and aggregate the counts
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Collect the aggregated results into driver memory
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    // Print the results
    word2Count.foreach(println)
    // Close the Spark connection
    sc.stop()
  }
}
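For reference, the same pipeline can also be written as one chained expression. This is a minimal sketch equivalent to the step-by-step version above (the object name wordcountChained is hypothetical; it assumes the same local word.txt path):

package com.muzili.applications

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical name, chosen here for illustration only
object wordcountChained {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("WordCount"))
    // Read, split, pair, reduce, collect, and print in a single chain
    sc.textFile("C:\\Users\\muzili\\Desktop\\word.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    sc.stop()
  }
}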
Then create a file named word.txt on the desktop with the following content:
hello scala
hello spark
hello hadoop
hello flink
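As a quick sanity check, the same counting logic can be run with plain Scala collections, no Spark required. This sketch hard-codes the four lines above to show the result we should expect (the object name wordcountCheck is hypothetical):

// Hypothetical helper for verifying the expected counts locally
object wordcountCheck {
  def main(args: Array[String]): Unit = {
    // The four lines of word.txt, hard-coded for a quick check
    val lines = Seq("hello scala", "hello spark", "hello hadoop", "hello flink")
    val counts = lines
      .flatMap(_.split(" "))                 // split into words
      .groupBy(identity)                     // group identical words
      .map { case (w, ws) => (w, ws.size) }  // count each group
    counts.foreach(println)
    // Expected: (hello,4), (scala,1), (spark,1), (hadoop,1), (flink,1), in some order
  }
}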
During execution a large amount of log output is produced (see "Log Output 1" below). To view the program's results more clearly, you can create a log4j.properties file in the project's resources directory and add the following logging configuration:
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent
# UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
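
Alternatively, the log level can be lowered programmatically right after the context is created, using SparkContext.setLogLevel. This is a small sketch (the object name quietWordcount is hypothetical) showing where the call would go; it only affects the running application:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical name, for illustration only
object quietWordcount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("WordCount"))
    // Suppress everything below ERROR for this application only
    sc.setLogLevel("ERROR")
    // ... same WordCount logic as above ...
    sc.stop()
  }
}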

Log Output 1:

D:\developer_tools\Java\jdk1.8.0_251\bin\java.exe "-javaagent:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\lib\idea_rt.jar=51489:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\bin" -Dfile.encoding=UTF-8 -classpath D:\developer_tools\Java\jdk1.8.0_251\jre\lib\charsets.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\deploy.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\access-bridge-64.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\cldrdata.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\dnsns.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\jaccess.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\jfxrt.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\localedata.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\nashorn.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunec.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunjce_provider.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunmscapi.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunpkcs11.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\zipfs.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\javaws.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jce.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jfr.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jfxswt.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jsse.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\management-agent.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\plugin.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\resources.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\rt.jar;D:\code\code01\spark_test\spark_core\target\classes;D:\developer_tools\Scala\lib\scala-library.jar;D:\developer_tools\Scala\lib\scala-reflect.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-core_2.11\2.4.4\spark-core_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\com\thoughtworks\paranamer\paranamer\2.8\paranamer-2.8.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro\1.8.2\avro-1.8.2.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-compress\1.8.1\commons-compress-1.8.1.jar;D:\developer_tools\Maven\repository\org\tukaani\xz\1.5\xz-1.5.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro-mapred\1.8.2\avro-mapred-1.8.2-hadoop2.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro-ipc\1.8.2\avro-ipc-1.8.2.jar;D:\developer_tools\Maven\repository\commons-codec\commons-codec\1.9\commons-codec-1.9.jar;D:\developer_tools\Maven\repository\com\twitter\chill_2.11\0.9.3\chill_2.11-0.9.3.jar;D:\developer_tools\Maven\repository\com\esotericsoftware\kryo-shaded\4.0.2\kryo-shaded-4.0.2.jar;D:\developer_tools\Maven\repository\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;D:\developer_tools\Maven\repository\org\objenesis\objenesis\2.5.1\objenesis-2.5.1.jar;D:\developer_tools\Maven\repository\com\twitter\chill-java\0.9.3\chill-java-0.9.3.jar;D:\developer_tools\Maven\repository\org\apache\xbean\xbean-asm6-shaded\4.8\xbean-asm6-shaded-4.8.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-client\2.6.5\hadoop-client-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-common\2.6.5\hadoop-common-2.6.5.jar;D:\developer_tools\Maven\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;D:\developer_tools\Maven\repository\xmlen
c\xmlenc\0.52\xmlenc-0.52.jar;D:\developer_tools\Maven\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;D:\developer_tools\Maven\repository\commons-io\commons-io\2.4\commons-io-2.4.jar;D:\developer_tools\Maven\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;D:\developer_tools\Maven\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;D:\developer_tools\Maven\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;D:\developer_tools\Maven\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;D:\developer_tools\Maven\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;D:\developer_tools\Maven\repository\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;D:\developer_tools\Maven\repository\com\google\code\gson\gson\2.2.4\gson-2.2.4.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-auth\2.6.5\hadoop-auth-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\httpcomponents\httpclient\4.2.5\httpclient-4.2.5.jar;D:\developer_tools\Maven\repository\org\apache\httpcomponents\httpcore\4.2.4\httpcore-4.2.4.jar;D:\developer_tools\Maven\repository\org\apache\directory\server\apacheds-kerberos-codec\2.0.0-M15\apacheds-kerberos-codec-2.0.0-M15.jar;D:\developer_tools\Maven\repository\org\apache\directory\server\apacheds-i18n\2.0.0-M15\apacheds-i18n-2.0.0-M15.jar;D:\developer_tools\Maven\repository\org\apache\directory\api\api-asn1-api\1.0.0-M20\api-asn1-api-1.0.0-M20.jar;D:\developer_tools\Maven\repository\org\apache\directory\api\api-util\1.0.0-M20\api-util-1.0.0-M20.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-client\2.6.0\curator-client-2.6.0.jar;D:\developer_tools\Maven\repository\org\htrace\htrace-core\3.0.4\htrace-core-3.0.4.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-hdfs\2.6.5\hadoop-hdfs-2.6.5.jar;D:\developer_tools\Maven\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;D:\developer_tools\Maven\repository\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;D:\developer_tools\Maven\repository\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.6.5\hadoop-mapreduce-client-app-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.6.5\hadoop-mapreduce-client-common-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-client\2.6.5\hadoop-yarn-client-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-server-common\2.6.5\hadoop-yarn-server-common-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.6.5\hadoop-mapreduce-client-shuffle-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-api\2.6.5\hadoop-yarn-api-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.6.5\hadoop-mapreduce-client-core-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-common\2.6.5\hadoop-yarn-common-2.6.5.jar;D:\developer_tools\Maven\repository\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;D:\developer_tools\Maven\repository\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-jaxrs\1.9.13\jackson-jaxrs-1.9.13.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-xc\1.9.13\jackson-xc-1.9.13.jar;D:\developer
_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.6.5\hadoop-mapreduce-client-jobclient-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-annotations\2.6.5\hadoop-annotations-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-launcher_2.11\2.4.4\spark-launcher_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-kvstore_2.11\2.4.4\spark-kvstore_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-core\2.6.7\jackson-core-2.6.7.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-annotations\2.6.7\jackson-annotations-2.6.7.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-network-common_2.11\2.4.4\spark-network-common_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-network-shuffle_2.11\2.4.4\spark-network-shuffle_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-unsafe_2.11\2.4.4\spark-unsafe_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\javax\activation\activation\1.1.1\activation-1.1.1.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-recipes\2.6.0\curator-recipes-2.6.0.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-framework\2.6.0\curator-framework-2.6.0.jar;D:\developer_tools\Maven\repository\com\google\guava\guava\16.0.1\guava-16.0.1.jar;D:\developer_tools\Maven\repository\org\apache\zookeeper\zookeeper\3.4.6\zookeeper-3.4.6.jar;D:\developer_tools\Maven\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-lang3\3.5\commons-lang3-3.5.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;D:\developer_tools\Maven\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;D:\developer_tools\Maven\repository\org\slf4j\slf4j-api\1.7.16\slf4j-api-1.7.16.jar;D:\developer_tools\Maven\repository\org\slf4j\jul-to-slf4j\1.7.16\jul-to-slf4j-1.7.16.jar;D:\developer_tools\Maven\repository\org\slf4j\jcl-over-slf4j\1.7.16\jcl-over-slf4j-1.7.16.jar;D:\developer_tools\Maven\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;D:\developer_tools\Maven\repository\org\slf4j\slf4j-log4j12\1.7.16\slf4j-log4j12-1.7.16.jar;D:\developer_tools\Maven\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;D:\developer_tools\Maven\repository\org\xerial\snappy\snappy-java\1.1.7.3\snappy-java-1.1.7.3.jar;D:\developer_tools\Maven\repository\org\lz4\lz4-java\1.4.0\lz4-java-1.4.0.jar;D:\developer_tools\Maven\repository\com\github\luben\zstd-jni\1.3.2-2\zstd-jni-1.3.2-2.jar;D:\developer_tools\Maven\repository\org\roaringbitmap\RoaringBitmap\0.7.45\RoaringBitmap-0.7.45.jar;D:\developer_tools\Maven\repository\org\roaringbitmap\shims\0.7.45\shims-0.7.45.jar;D:\developer_tools\Maven\repository\commons-net\commons-net\3.1\commons-net-3.1.jar;D:\developer_tools\Maven\repository\org\scala-lang\scala-library\2.11.12\scala-library-2.11.12.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-jackson_2.11\3.5.3\json4s-jackson_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-core_2.11\3.5.3\json4s-core_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-ast_2.11\3.5.3\json4s-ast_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-scalap_2.11\3.5.3\json4s-scalap_2.11-3.5.3.jar;D:\developer_to
ols\Maven\repository\org\scala-lang\modules\scala-xml_2.11\1.0.6\scala-xml_2.11-1.0.6.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-client\2.22.2\jersey-client-2.22.2.jar;D:\developer_tools\Maven\repository\javax\ws\rs\javax.ws.rs-api\2.0.1\javax.ws.rs-api-2.0.1.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-api\2.4.0-b34\hk2-api-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-utils\2.4.0-b34\hk2-utils-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\external\aopalliance-repackaged\2.4.0-b34\aopalliance-repackaged-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\external\javax.inject\2.4.0-b34\javax.inject-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-locator\2.4.0-b34\hk2-locator-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\javassist\javassist\3.18.1-GA\javassist-3.18.1-GA.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-common\2.22.2\jersey-common-2.22.2.jar;D:\developer_tools\Maven\repository\javax\annotation\javax.annotation-api\1.2\javax.annotation-api-1.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\bundles\repackaged\jersey-guava\2.22.2\jersey-guava-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\osgi-resource-locator\1.0.1\osgi-resource-locator-1.0.1.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-server\2.22.2\jersey-server-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\media\jersey-media-jaxb\2.22.2\jersey-media-jaxb-2.22.2.jar;D:\developer_tools\Maven\repository\javax\validation\validation-api\1.1.0.Final\validation-api-1.1.0.Final.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\containers\jersey-container-servlet\2.22.2\jersey-container-servlet-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\containers\jersey-container-servlet-core\2.22.2\jersey-container-servlet-core-2.22.2.jar;D:\developer_tools\Maven\repository\io\netty\netty-all\4.1.17.Final\netty-all-4.1.17.Final.jar;D:\developer_tools\Maven\repository\io\netty\netty\3.9.9.Final\netty-3.9.9.Final.jar;D:\developer_tools\Maven\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-core\3.1.5\metrics-core-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-jvm\3.1.5\metrics-jvm-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-json\3.1.5\metrics-json-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-graphite\3.1.5\metrics-graphite-3.1.5.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-databind\2.6.7.1\jackson-databind-2.6.7.1.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\module\jackson-module-scala_2.11\2.6.7.1\jackson-module-scala_2.11-2.6.7.1.jar;D:\developer_tools\Maven\repository\org\scala-lang\scala-reflect\2.11.8\scala-reflect-2.11.8.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\module\jackson-module-paranamer\2.7.9\jackson-module-paranamer-2.7.9.jar;D:\developer_tools\Maven\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;D:\developer_tools\Maven\repository\oro\oro\2.0.8\oro-2.0.8.jar;D:\developer_tools\Maven\repository\net\razorvine\pyrolite\4.13\pyrolite-4.13.jar;D:\developer_tools\Maven\repository\net\sf\py4j\py4j\0.10.7\py4j-0.10.7.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-tags_2.11\2.4.4\spark-tags_2.11-2.4.4.jar;D:
\developer_tools\Maven\repository\org\apache\commons\commons-crypto\1.0.0\commons-crypto-1.0.0.jar;D:\developer_tools\Maven\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar com.sibat.applications.wordcount
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/13 17:44:17 INFO SparkContext: Running Spark version 2.4.4
21/10/13 17:44:17 INFO SparkContext: Submitted application: WordCount
21/10/13 17:44:17 INFO SecurityManager: Changing view acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing view acls groups to: 
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls groups to: 
21/10/13 17:44:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(muzili); groups with view permissions: Set(); users  with modify permissions: Set(muzili); groups with modify permissions: Set()
21/10/13 17:44:18 INFO Utils: Successfully started service 'sparkDriver' on port 51527.
21/10/13 17:44:19 INFO SparkEnv: Registering MapOutputTracker
21/10/13 17:44:19 INFO SparkEnv: Registering BlockManagerMaster
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/13 17:44:19 INFO DiskBlockManager: Created local directory at C:\Users\muzili\AppData\Local\Temp\blockmgr-a6f2f260-7970-4a16-82b5-93d659f2c49f
21/10/13 17:44:19 INFO MemoryStore: MemoryStore started with capacity 1975.8 MB
21/10/13 17:44:19 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/13 17:44:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/13 17:44:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-R0NFMTAH:4040
21/10/13 17:44:19 INFO Executor: Starting executor ID driver on host localhost
21/10/13 17:44:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51568.
21/10/13 17:44:19 INFO NettyBlockTransferService: Server created on LAPTOP-R0NFMTAH:51568
21/10/13 17:44:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/10/13 17:44:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Registering block manager LAPTOP-R0NFMTAH:51568 with 1975.8 MB RAM, BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 20.4 KB, free: 1975.8 MB)
21/10/13 17:44:19 INFO SparkContext: Created broadcast 0 from textFile at wordcount.scala:14
21/10/13 17:44:19 INFO FileInputFormat: Total input paths to process : 1
21/10/13 17:44:19 INFO SparkContext: Starting job: collect at wordcount.scala:22
21/10/13 17:44:20 INFO DAGScheduler: Registering RDD 3 (map at wordcount.scala:18)
21/10/13 17:44:20 INFO DAGScheduler: Got job 0 (collect at wordcount.scala:22) with 2 output partitions
21/10/13 17:44:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at wordcount.scala:22)
21/10/13 17:44:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2.9 KB, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:0+25
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:25+26
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 165 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 181 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/10/13 17:44:20 INFO DAGScheduler: ShuffleMapStage 0 (map at wordcount.scala:18) finished in 0.256 s
21/10/13 17:44:20 INFO DAGScheduler: looking for newly runnable stages
21/10/13 17:44:20 INFO DAGScheduler: running: Set()
21/10/13 17:44:20 INFO DAGScheduler: waiting: Set(ResultStage 1)
21/10/13 17:44:20 INFO DAGScheduler: failed: Set()
21/10/13 17:44:20 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2029.0 B, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1284 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1261 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 60 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 58 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
21/10/13 17:44:20 INFO DAGScheduler: ResultStage 1 (collect at wordcount.scala:22) finished in 0.070 s
21/10/13 17:44:20 INFO DAGScheduler: Job 0 finished: collect at wordcount.scala:22, took 0.561218 s
21/10/13 17:44:20 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-R0NFMTAH:4040
(scala,1)
(flink,1)
(hello,4)
(spark,1)
(hadoop,1)
21/10/13 17:44:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/13 17:44:20 INFO MemoryStore: MemoryStore cleared
21/10/13 17:44:20 INFO BlockManager: BlockManager stopped
21/10/13 17:44:20 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/13 17:44:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/13 17:44:20 INFO SparkContext: Successfully stopped SparkContext
21/10/13 17:44:20 INFO ShutdownHookManager: Shutdown hook called
21/10/13 17:44:20 INFO ShutdownHookManager: Deleting directory C:\Users\muzili\AppData\Local\Temp\spark-74be267d-556a-40a4-9253-55fc0a910290
