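Preface
When you first pick up Spark you inevitably have to read some Scala, and when I first picked up Scala I thought it was seriously handy. WordCount is Spark's introductory demo: it shows off Spark's parallel processing by counting how many times each word appears in a file. Four lines of Scala and it's done. Nice, so let's keep going!
Let's start with environment setup. There are plenty of posts about it online, and plenty of pitfalls too. Here are a few tips from my own experience setting up a Spark environment:
Spark environment setup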
- CentOS7
- hadoop-2.7.5.tar.gz
- spark-2.2.0-bin-hadoop2.7.tgz
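Hadoop installation and configuration, Spark environment setup
See my notes: http://note.youdao.com/noteshare?id=c5ade4f303edbcf73c870abb3baf6c35&sub=207F5266857D47A18403CC261A1A792C
Getting started with Spark project development: WordCounter
Environment
- IntelliJ IDEA Maven project
- Spark 2.2.0
- local mode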
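To make the local-mode setup concrete, here is a rough sketch of what a word count could look like against the Spark 2.2.0 RDD API. It is not necessarily the exact code used in this project: the object name `WordCounter`, the helper `countWords`, the default input path `input.txt`, and the `local[*]` master string are all assumptions made for illustration, and it assumes `spark-core` is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCounter {
  // Counts how often each whitespace-separated word occurs in the file at `path`.
  // `countWords` is a hypothetical helper name chosen for this sketch.
  def countWords(sc: SparkContext, path: String): Map[String, Int] =
    sc.textFile(path)               // one record per line
      .flatMap(_.split("\\s+"))     // split each line into words
      .filter(_.nonEmpty)           // drop empty tokens from leading whitespace
      .map(word => (word, 1))       // pair every word with a count of 1
      .reduceByKey(_ + _)           // sum the counts per word
      .collect()                    // bring the results back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical default input path; pass a real file as the first argument.
    val path = args.headOption.getOrElse("input.txt")
    // "local[*]" runs Spark inside this JVM using all available cores (assumed setting).
    val conf = new SparkConf().setAppName("WordCounter").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try countWords(sc, path).foreach { case (word, n) => println(s"$word\t$n") }
    finally sc.stop()
  }
}
```

Collecting into a `Map` is only reasonable for small inputs like a demo file; for anything large you would more likely write the result out with `saveAsTextFile` instead.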
pom.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>spark-wc-maven</groupId>
    <artifactId>wordCounter</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-reflect</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.12</version>
        </dependency>
        <dependency>
<groupId></