1. Configure JDK 8
- Detailed steps:
http://jingyan.baidu.com/article/ab69b270c01a4d2ca7189f8c.html
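After installing, it is worth confirming which JDK the JVM actually picks up, since Eclipse and Maven compile against it. A minimal check (the class name JdkCheck is just for this sketch):

```java
public class JdkCheck {
    public static void main(String[] args) {
        // Spark 2.1 targets Java 8; print the running JVM's version and
        // location so you can confirm the JDK 8 install is the one in use.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```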
2. Configure Maven
- Install Maven:
http://jingyan.baidu.com/article/d8072ac45d3660ec94cefd51.html
- Configure Maven in Eclipse:
http://jingyan.baidu.com/article/db55b609a994114ba20a2f56.html
3. Configure Hadoop
- A full Hadoop installation is not required; downloading the plugin package is enough.
- Get the Hadoop 2.6 plugin package for 64-bit Windows and unpack it to a target folder. That target folder is your Hadoop Home.
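On Windows, Hadoop's shell layer looks for winutils.exe at `<hadoop.home.dir>/bin/winutils.exe`; if it is missing, local file operations in Spark fail. The sketch below (class and method names are mine, and "E:/drawsky/bin" is just the example folder used later in this guide) builds the path that will be probed, so you can verify your unpacked layout matches:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HadoopHomeCheck {

    // Hadoop resolves winutils.exe as <hadoop.home.dir>/bin/winutils.exe;
    // this builds the exact path that will be probed for a given home folder.
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    public static void main(String[] args) {
        // Substitute the folder you unpacked the plugin package into.
        Path expected = winutilsPath("E:/drawsky/bin");
        System.out.println("Spark will look for: " + expected);
    }
}
```

If the printed file does not exist on disk, adjust hadoop.home.dir so that a bin folder containing winutils.exe sits directly beneath it.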
4. Create a Maven Project
- Create a new Maven project:
http://jingyan.baidu.com/article/375c8e19b5014c25f2a22912.html
- Pull in the Spark packages: add the following elements under the dependencies element of pom.xml (the _2.11 suffix is the Scala version these Spark artifacts are built against):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
5. Test Code
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;

public class SparkTest {

    public static void main(String[] args) {
        // Point Spark at the unpacked plugin folder (Hadoop Home); this must
        // be set as a JVM system property, not as a Spark config entry.
        System.setProperty("hadoop.home.dir", "E:/drawsky/bin");
        SparkSession spark = SparkSession.builder()
                .appName("test")
                .master("local[*]")
                .config("spark.sql.warehouse.dir", "E:/drawsky/SparkTest/spark-warehouse")
                .getOrCreate();
        countWords(spark);
        spark.stop();
    }

    private static void countWords(SparkSession spark) {
        JavaRDD<String> lines = spark.sparkContext()
                .textFile("G:/elasticsearch-5.3.2/LICENSE.txt", 1)
                .toJavaRDD();
        JavaPairRDD<String, Integer> counts = lines
                // Split each line into words.
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                // Group identical words together.
                .groupBy(word -> word)
                // Count the words in each group.
                .mapValues(group -> {
                    Iterator<String> it = group.iterator();
                    int count = 0;
                    while (it.hasNext()) {
                        it.next();
                        count++;
                    }
                    return count;
                });
        Map<String, Integer> result = counts.collectAsMap();
        System.out.println(result);
    }
}
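The split -> group -> count pipeline above does not depend on anything Spark-specific; the same logic can be sketched with java.util.stream, which is a quick way to sanity-check the expected output without starting a Spark session (the class WordCountSketch and its countWords method are illustrative names, not part of the project above):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Same pipeline as the RDD version: split the text into words,
    // group identical words, and count each group's size.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.split(" "))
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints a map of word -> occurrence count.
        System.out.println(countWords("to be or not to be"));
    }
}
```

For large inputs the Spark version would usually use mapToPair plus reduceByKey instead of groupBy, which avoids shuffling whole groups of duplicate words, but the groupBy form matches the test code above more directly.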