1. Download the release package
http://spark.apache.org/downloads.html
2. Extract the archive
tar -zxvf spark-3.0.1-bin-hadoop2.7.tgz
3. Configure the master (in the conf directory)
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following to the file:
export SPARK_MASTER_HOST=192.168.5.150
4. Configure slaves
There is only one host for now, so conf/slaves keeps its default entry, localhost.
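For a multi-node cluster, conf/slaves (copied from slaves.template) lists one worker host per line. A minimal sketch, with hypothetical worker hostnames:

```
# conf/slaves -- one worker host per line; lines starting with # are ignored
worker1
worker2
```

With a single machine, the default localhost entry is sufficient.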
5. Configure the JDK
cd sbin
vi spark-config.sh
Add the following to the file:
export JAVA_HOME=/home/jdk1.8
6. Start the cluster
./start-all.sh
7. Open the Spark web UI
http://192.168.5.150:8080
8. Run a smoke test
Add the dependency to pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.1</version>
    <scope>provided</scope>
</dependency>
Test code:
package com.zte.mars.model;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTest {
    public static void main(String[] args) {
        // Needed only when running from Windows; see note 1 below.
        System.setProperty("hadoop.home.dir", "D:\\winutils-master\\winutils-master\\hadoop-2.7.6");
        SparkSession spark = SparkSession
                .builder()
                .master("spark://192.168.5.150:7077")
                .appName("Java Spark basic example")
                .config("spark.some.config.option", "some-value")
                .getOrCreate();
        Dataset<Row> df = spark.read().json("resources/people.json");
        df.show();
        spark.stop();
    }
}
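The test reads resources/people.json relative to the working directory. The content below is the sample file that ships with Spark under examples/src/main/resources/people.json; place a copy at resources/people.json. Note that spark.read().json() expects one JSON object per line, not a single JSON array:

```
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
```

df.show() then prints a three-row table with age and name columns (age is null for Michael).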
Notes:
1. If you see the error "Could not locate executable null\bin\winutils.exe in the Hadoop binaries",
download the matching version from https://github.com/cdarlint/winutils and set the variable in code:
System.setProperty("hadoop.home.dir", "D:\\winutils-master\\winutils-master\\hadoop-2.7.6");
2. If you see the error "Can only call getServletHandlers on a running MetricsSystem", the Spark version declared in pom.xml does not match the version of the cluster you built.
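For note 2, the client version comes from pom.xml and the cluster version from spark.version() on a SparkSession. A pure-Java sketch (no Spark dependency) of a major.minor compatibility check; the compatible helper and the version strings are illustrative assumptions:

```java
public class VersionCheck {
    // Hypothetical helper: true when two Spark version strings share the same
    // major.minor, e.g. "3.0.1" vs "3.0.2" -> true, "3.0.1" vs "2.4.7" -> false.
    static boolean compatible(String client, String cluster) {
        String[] a = client.split("\\.");
        String[] b = cluster.split("\\.");
        return a.length >= 2 && b.length >= 2
                && a[0].equals(b[0]) && a[1].equals(b[1]);
    }

    public static void main(String[] args) {
        // In a real job, cluster would be spark.version() and client the pom value.
        System.out.println(compatible("3.0.1", "3.0.1")); // true
        System.out.println(compatible("3.0.1", "2.4.7")); // false
    }
}
```

In practice, the simplest fix is to keep the pom version identical to the version of the tarball you extracted in step 2.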