Creating a Maven Project
Adding the Scala Plugin
Spark is developed in the Scala language, so Scala is also the language used for development throughout this courseware. The Spark version we are using is 3.0.0, which is compiled against Scala 2.12 by default, so we will stick with that Scala version in the development that follows. Before starting, make sure the IDEA development environment has the Scala plugin installed.
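If you want to confirm which Scala version your project actually picks up at runtime, a quick sanity check is to print it from a small program; scala.util.Properties is part of the Scala standard library. A minimal sketch:

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints something like "version 2.12.10"; the exact patch version may differ.
    println(scala.util.Properties.versionString)
  }
}

The major.minor part (2.12) must match the suffix of the spark-core artifact (spark-core_2.12); otherwise the job may fail with binary-incompatibility errors.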
Adding Dependencies
Modify the POM file of the Maven project to add the Spark framework dependency. This courseware is based on Spark 3.0; make sure you use the matching versions.
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- This plugin compiles Scala code into class files -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <!-- Bind to Maven's compile phase -->
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
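With this configuration, running mvn package should produce, in addition to the regular jar, a jar-with-dependencies jar under the target directory that bundles all declared dependencies into a single self-contained artifact. For the local-mode example below, packaging is not required; running directly from IDEA is enough.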
WordCount Code
To get a direct feel for what the Spark framework can do, let's implement WordCount, the most common teaching example in big data courses.
package com.muzili.applications

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object wordcount {

  def main(args: Array[String]): Unit = {

    // Create the Spark configuration object
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")

    // Create the Spark context (the connection object)
    val sc: SparkContext = new SparkContext(sparkConf)

    // Read the file data
    val fileRDD: RDD[String] = sc.textFile("C:\\Users\\muzili\\Desktop\\word.txt")

    // Split the file contents into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(" "))

    // Transform the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))

    // Group by word and aggregate the counts
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)

    // Collect the aggregated results into driver memory
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()

    // Print the results
    word2Count.foreach(println)

    // Close the Spark connection
    sc.stop()
  }
}
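Since each step simply returns a new RDD, the same pipeline can also be written as one chain of transformations; the following is equivalent to the step-by-step version above:

// Equivalent chained form of the WordCount pipeline
sc.textFile("C:\\Users\\muzili\\Desktop\\word.txt")
  .flatMap(_.split(" "))   // split lines into words
  .map((_, 1))             // word => (word, 1)
  .reduceByKey(_ + _)      // sum the counts per word
  .collect()               // bring the results to the driver
  .foreach(println)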
Then create a file named word.txt on the desktop:
hello scala
hello spark
hello hadoop
hello flink
During execution, a large amount of log output is produced (see "Log Output 1" below). To see the program's results more clearly, you can create a log4j.properties file in the project's resources directory and add the following logging configuration:
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
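Alternatively (or in addition), the log level can be set programmatically right after the context is created; SparkContext.setLogLevel is part of the public API and affects only the current application:

// Suppress logs below ERROR for this application only.
// Valid levels: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN.
sc.setLogLevel("ERROR")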
Log Output 1:
D:\developer_tools\Java\jdk1.8.0_251\bin\java.exe "-javaagent:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\lib\idea_rt.jar=51489:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\bin" -Dfile.encoding=UTF-8 -classpath D:\developer_tools\Java\jdk1.8.0_251\jre\lib\charsets.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\deploy.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\access-bridge-64.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\cldrdata.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\dnsns.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\jaccess.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\jfxrt.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\localedata.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\nashorn.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunec.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunjce_provider.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunmscapi.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\sunpkcs11.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\ext\zipfs.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\javaws.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jce.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jfr.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jfxswt.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\jsse.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\management-agent.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\plugin.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\resources.jar;D:\developer_tools\Java\jdk1.8.0_251\jre\lib\rt.jar;D:\code\code01\spark_test\spark_core\target\classes;D:\developer_tools\Scala\lib\scala-library.jar;D:\developer_tools\Scala\lib\scala-reflect.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-core_2.11\2.4.4\spark-core_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\com\thoughtworks\paranamer\paranamer\2.8\paranamer-2.8.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro\1.8.2\avro-1.8.2.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-compress\1.8.1\commons-compress-1.8.1.jar;D:\developer_tools\Maven\repository\org\tukaani\xz\1.5\xz-1.5.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro-mapred\1.8.2\avro-mapred-1.8.2-hadoop2.jar;D:\developer_tools\Maven\repository\org\apache\avro\avro-ipc\1.8.2\avro-ipc-1.8.2.jar;D:\developer_tools\Maven\repository\commons-codec\commons-codec\1.9\commons-codec-1.9.jar;D:\developer_tools\Maven\repository\com\twitter\chill_2.11\0.9.3\chill_2.11-0.9.3.jar;D:\developer_tools\Maven\repository\com\esotericsoftware\kryo-shaded\4.0.2\kryo-shaded-4.0.2.jar;D:\developer_tools\Maven\repository\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;D:\developer_tools\Maven\repository\org\objenesis\objenesis\2.5.1\objenesis-2.5.1.jar;D:\developer_tools\Maven\repository\com\twitter\chill-java\0.9.3\chill-java-0.9.3.jar;D:\developer_tools\Maven\repository\org\apache\xbean\xbean-asm6-shaded\4.8\xbean-asm6-shaded-4.8.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-client\2.6.5\hadoop-client-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-common\2.6.5\hadoop-common-2.6.5.jar;D:\developer_tools\Maven\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;D:\developer_tools\Maven\repository\xmlen
c\xmlenc\0.52\xmlenc-0.52.jar;D:\developer_tools\Maven\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;D:\developer_tools\Maven\repository\commons-io\commons-io\2.4\commons-io-2.4.jar;D:\developer_tools\Maven\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;D:\developer_tools\Maven\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;D:\developer_tools\Maven\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;D:\developer_tools\Maven\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;D:\developer_tools\Maven\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;D:\developer_tools\Maven\repository\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;D:\developer_tools\Maven\repository\com\google\code\gson\gson\2.2.4\gson-2.2.4.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-auth\2.6.5\hadoop-auth-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\httpcomponents\httpclient\4.2.5\httpclient-4.2.5.jar;D:\developer_tools\Maven\repository\org\apache\httpcomponents\httpcore\4.2.4\httpcore-4.2.4.jar;D:\developer_tools\Maven\repository\org\apache\directory\server\apacheds-kerberos-codec\2.0.0-M15\apacheds-kerberos-codec-2.0.0-M15.jar;D:\developer_tools\Maven\repository\org\apache\directory\server\apacheds-i18n\2.0.0-M15\apacheds-i18n-2.0.0-M15.jar;D:\developer_tools\Maven\repository\org\apache\directory\api\api-asn1-api\1.0.0-M20\api-asn1-api-1.0.0-M20.jar;D:\developer_tools\Maven\repository\org\apache\directory\api\api-util\1.0.0-M20\api-util-1.0.0-M20.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-client\2.6.0\curator-client-2.6.0.jar;D:\developer_tools\Maven\repository\org\htrace\htrace-core\3.0.4\htrace-core-3.0.4.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-hdfs\2.6.5\hadoop-hdfs-2.6.5.jar;D:\developer_tools\Maven\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;D:\developer_tools\Maven\repository\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;D:\developer_tools\Maven\repository\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.6.5\hadoop-mapreduce-client-app-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.6.5\hadoop-mapreduce-client-common-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-client\2.6.5\hadoop-yarn-client-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-server-common\2.6.5\hadoop-yarn-server-common-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.6.5\hadoop-mapreduce-client-shuffle-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-api\2.6.5\hadoop-yarn-api-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.6.5\hadoop-mapreduce-client-core-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-yarn-common\2.6.5\hadoop-yarn-common-2.6.5.jar;D:\developer_tools\Maven\repository\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;D:\developer_tools\Maven\repository\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-jaxrs\1.9.13\jackson-jaxrs-1.9.13.jar;D:\developer_tools\Maven\repository\org\codehaus\jackson\jackson-xc\1.9.13\jackson-xc-1.9.13.jar;D:\developer
_tools\Maven\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.6.5\hadoop-mapreduce-client-jobclient-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\hadoop\hadoop-annotations\2.6.5\hadoop-annotations-2.6.5.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-launcher_2.11\2.4.4\spark-launcher_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-kvstore_2.11\2.4.4\spark-kvstore_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-core\2.6.7\jackson-core-2.6.7.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-annotations\2.6.7\jackson-annotations-2.6.7.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-network-common_2.11\2.4.4\spark-network-common_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-network-shuffle_2.11\2.4.4\spark-network-shuffle_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-unsafe_2.11\2.4.4\spark-unsafe_2.11-2.4.4.jar;D:\developer_tools\Maven\repository\javax\activation\activation\1.1.1\activation-1.1.1.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-recipes\2.6.0\curator-recipes-2.6.0.jar;D:\developer_tools\Maven\repository\org\apache\curator\curator-framework\2.6.0\curator-framework-2.6.0.jar;D:\developer_tools\Maven\repository\com\google\guava\guava\16.0.1\guava-16.0.1.jar;D:\developer_tools\Maven\repository\org\apache\zookeeper\zookeeper\3.4.6\zookeeper-3.4.6.jar;D:\developer_tools\Maven\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-lang3\3.5\commons-lang3-3.5.jar;D:\developer_tools\Maven\repository\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;D:\developer_tools\Maven\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;D:\developer_tools\Maven\repository\org\slf4j\slf4j-api\1.7.16\slf4j-api-1.7.16.jar;D:\developer_tools\Maven\repository\org\slf4j\jul-to-slf4j\1.7.16\jul-to-slf4j-1.7.16.jar;D:\developer_tools\Maven\repository\org\slf4j\jcl-over-slf4j\1.7.16\jcl-over-slf4j-1.7.16.jar;D:\developer_tools\Maven\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;D:\developer_tools\Maven\repository\org\slf4j\slf4j-log4j12\1.7.16\slf4j-log4j12-1.7.16.jar;D:\developer_tools\Maven\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;D:\developer_tools\Maven\repository\org\xerial\snappy\snappy-java\1.1.7.3\snappy-java-1.1.7.3.jar;D:\developer_tools\Maven\repository\org\lz4\lz4-java\1.4.0\lz4-java-1.4.0.jar;D:\developer_tools\Maven\repository\com\github\luben\zstd-jni\1.3.2-2\zstd-jni-1.3.2-2.jar;D:\developer_tools\Maven\repository\org\roaringbitmap\RoaringBitmap\0.7.45\RoaringBitmap-0.7.45.jar;D:\developer_tools\Maven\repository\org\roaringbitmap\shims\0.7.45\shims-0.7.45.jar;D:\developer_tools\Maven\repository\commons-net\commons-net\3.1\commons-net-3.1.jar;D:\developer_tools\Maven\repository\org\scala-lang\scala-library\2.11.12\scala-library-2.11.12.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-jackson_2.11\3.5.3\json4s-jackson_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-core_2.11\3.5.3\json4s-core_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-ast_2.11\3.5.3\json4s-ast_2.11-3.5.3.jar;D:\developer_tools\Maven\repository\org\json4s\json4s-scalap_2.11\3.5.3\json4s-scalap_2.11-3.5.3.jar;D:\developer_to
ols\Maven\repository\org\scala-lang\modules\scala-xml_2.11\1.0.6\scala-xml_2.11-1.0.6.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-client\2.22.2\jersey-client-2.22.2.jar;D:\developer_tools\Maven\repository\javax\ws\rs\javax.ws.rs-api\2.0.1\javax.ws.rs-api-2.0.1.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-api\2.4.0-b34\hk2-api-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-utils\2.4.0-b34\hk2-utils-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\external\aopalliance-repackaged\2.4.0-b34\aopalliance-repackaged-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\external\javax.inject\2.4.0-b34\javax.inject-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\hk2-locator\2.4.0-b34\hk2-locator-2.4.0-b34.jar;D:\developer_tools\Maven\repository\org\javassist\javassist\3.18.1-GA\javassist-3.18.1-GA.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-common\2.22.2\jersey-common-2.22.2.jar;D:\developer_tools\Maven\repository\javax\annotation\javax.annotation-api\1.2\javax.annotation-api-1.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\bundles\repackaged\jersey-guava\2.22.2\jersey-guava-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\hk2\osgi-resource-locator\1.0.1\osgi-resource-locator-1.0.1.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\core\jersey-server\2.22.2\jersey-server-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\media\jersey-media-jaxb\2.22.2\jersey-media-jaxb-2.22.2.jar;D:\developer_tools\Maven\repository\javax\validation\validation-api\1.1.0.Final\validation-api-1.1.0.Final.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\containers\jersey-container-servlet\2.22.2\jersey-container-servlet-2.22.2.jar;D:\developer_tools\Maven\repository\org\glassfish\jersey\containers\jersey-container-servlet-core\2.22.2\jersey-container-servlet-core-2.22.2.jar;D:\developer_tools\Maven\repository\io\netty\netty-all\4.1.17.Final\netty-all-4.1.17.Final.jar;D:\developer_tools\Maven\repository\io\netty\netty\3.9.9.Final\netty-3.9.9.Final.jar;D:\developer_tools\Maven\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-core\3.1.5\metrics-core-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-jvm\3.1.5\metrics-jvm-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-json\3.1.5\metrics-json-3.1.5.jar;D:\developer_tools\Maven\repository\io\dropwizard\metrics\metrics-graphite\3.1.5\metrics-graphite-3.1.5.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\core\jackson-databind\2.6.7.1\jackson-databind-2.6.7.1.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\module\jackson-module-scala_2.11\2.6.7.1\jackson-module-scala_2.11-2.6.7.1.jar;D:\developer_tools\Maven\repository\org\scala-lang\scala-reflect\2.11.8\scala-reflect-2.11.8.jar;D:\developer_tools\Maven\repository\com\fasterxml\jackson\module\jackson-module-paranamer\2.7.9\jackson-module-paranamer-2.7.9.jar;D:\developer_tools\Maven\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;D:\developer_tools\Maven\repository\oro\oro\2.0.8\oro-2.0.8.jar;D:\developer_tools\Maven\repository\net\razorvine\pyrolite\4.13\pyrolite-4.13.jar;D:\developer_tools\Maven\repository\net\sf\py4j\py4j\0.10.7\py4j-0.10.7.jar;D:\developer_tools\Maven\repository\org\apache\spark\spark-tags_2.11\2.4.4\spark-tags_2.11-2.4.4.jar;D:
\developer_tools\Maven\repository\org\apache\commons\commons-crypto\1.0.0\commons-crypto-1.0.0.jar;D:\developer_tools\Maven\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar com.sibat.applications.wordcount
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/13 17:44:17 INFO SparkContext: Running Spark version 2.4.4
21/10/13 17:44:17 INFO SparkContext: Submitted application: WordCount
21/10/13 17:44:17 INFO SecurityManager: Changing view acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing view acls groups to:
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls groups to:
21/10/13 17:44:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(muzili); groups with view permissions: Set(); users with modify permissions: Set(muzili); groups with modify permissions: Set()
21/10/13 17:44:18 INFO Utils: Successfully started service 'sparkDriver' on port 51527.
21/10/13 17:44:19 INFO SparkEnv: Registering MapOutputTracker
21/10/13 17:44:19 INFO SparkEnv: Registering BlockManagerMaster
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/13 17:44:19 INFO DiskBlockManager: Created local directory at C:\Users\muzili\AppData\Local\Temp\blockmgr-a6f2f260-7970-4a16-82b5-93d659f2c49f
21/10/13 17:44:19 INFO MemoryStore: MemoryStore started with capacity 1975.8 MB
21/10/13 17:44:19 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/13 17:44:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/13 17:44:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-R0NFMTAH:4040
21/10/13 17:44:19 INFO Executor: Starting executor ID driver on host localhost
21/10/13 17:44:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51568.
21/10/13 17:44:19 INFO NettyBlockTransferService: Server created on LAPTOP-R0NFMTAH:51568
21/10/13 17:44:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/10/13 17:44:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Registering block manager LAPTOP-R0NFMTAH:51568 with 1975.8 MB RAM, BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 20.4 KB, free: 1975.8 MB)
21/10/13 17:44:19 INFO SparkContext: Created broadcast 0 from textFile at wordcount.scala:14
21/10/13 17:44:19 INFO FileInputFormat: Total input paths to process : 1
21/10/13 17:44:19 INFO SparkContext: Starting job: collect at wordcount.scala:22
21/10/13 17:44:20 INFO DAGScheduler: Registering RDD 3 (map at wordcount.scala:18)
21/10/13 17:44:20 INFO DAGScheduler: Got job 0 (collect at wordcount.scala:22) with 2 output partitions
21/10/13 17:44:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at wordcount.scala:22)
21/10/13 17:44:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2.9 KB, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:0+25
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:25+26
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 165 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 181 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/10/13 17:44:20 INFO DAGScheduler: ShuffleMapStage 0 (map at wordcount.scala:18) finished in 0.256 s
21/10/13 17:44:20 INFO DAGScheduler: looking for newly runnable stages
21/10/13 17:44:20 INFO DAGScheduler: running: Set()
21/10/13 17:44:20 INFO DAGScheduler: waiting: Set(ResultStage 1)
21/10/13 17:44:20 INFO DAGScheduler: failed: Set()
21/10/13 17:44:20 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2029.0 B, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1284 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1261 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 60 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 58 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/10/13 17:44:20 INFO DAGScheduler: ResultStage 1 (collect at wordcount.scala:22) finished in 0.070 s
21/10/13 17:44:20 INFO DAGScheduler: Job 0 finished: collect at wordcount.scala:22, took 0.561218 s
21/10/13 17:44:20 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-R0NFMTAH:4040
(scala,1)
(flink,1)
(hello,4)
(spark,1)
(hadoop,1)
21/10/13 17:44:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/13 17:44:20 INFO MemoryStore: MemoryStore cleared
21/10/13 17:44:20 INFO BlockManager: BlockManager stopped
21/10/13 17:44:20 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/13 17:44:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/13 17:44:20 INFO SparkContext: Successfully stopped SparkContext
21/10/13 17:44:20 INFO ShutdownHookManager: Shutdown hook called
21/10/13 17:44:20 INFO ShutdownHookManager: Deleting directory C:\Users\muzili\AppData\Local\Temp\spark-74be267d-556a-40a4-9253-55fc0a910290