Download Spark (the prebuilt package that bundles Hadoop and Scala):
https://mirrors.aliyun.com/apache/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2-scala2.13.tgz
Pick the version you need from the mirror directory:
https://mirrors.aliyun.com/apache/spark/spark-3.2.4/
After downloading, extract it to /usr/local/spark on the Linux host.
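A minimal sketch of the download and extraction steps (assuming wget and tar are available; adjust the filename and paths if you chose a different version):
wget https://mirrors.aliyun.com/apache/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2-scala2.13.tgz
tar -zxvf spark-3.2.4-bin-hadoop3.2-scala2.13.tgz
mv spark-3.2.4-bin-hadoop3.2-scala2.13 /usr/local/spark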
Running in single-node standalone cluster mode
Start:
/usr/local/spark/sbin/start-all.sh
Check the log:
cat /usr/local/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
From the log you can see the web UI is on port 8081 and the master port is 7077.
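To double-check that the daemons are actually running, you can also use jps (shipped with the JDK); the standalone Master and Worker processes should appear in its output:
jps
# the output should include a Master and a Worker process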
Stop:
/usr/local/spark/sbin/stop-all.sh
If you are prompted for a password, it is because the startup script may need to start services across machines over SSH, so passwordless SSH access is required:
# the current user here is root
ssh-keygen -t rsa
# press Enter through all the prompts
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
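Depending on the SSH configuration, you may also need to restrict the key file permissions and verify that passwordless login now works (a quick check, assuming sshd is running locally):
chmod 600 /root/.ssh/authorized_keys
ssh localhost    # should log in without prompting for a password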
Master host
By default the master binds to 127.0.0.1. To expose it to external clients, copy conf/spark-env.sh.template to conf/spark-env.sh and add:
export SPARK_MASTER_HOST=0.0.0.0
Then restart the services and the master can accept external connections.
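Putting those steps together (paths follow the install location used earlier):
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
echo 'export SPARK_MASTER_HOST=0.0.0.0' >> /usr/local/spark/conf/spark-env.sh
/usr/local/spark/sbin/stop-all.sh
/usr/local/spark/sbin/start-all.sh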
Maven project
Spring Boot POM. Note that the _2.13 suffix on the Spark artifacts must match the Scala version of the Spark build downloaded above (scala2.13), and ideally the Spark dependency version should match the cluster's as well.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.2.1</version>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
</dependency>
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.12</version>
</dependency>
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class Test {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM using all available cores
        SparkConf sparkConf = new SparkConf()
                .setAppName("app")
                .setMaster("local[*]");
        // build a SparkSession on top of the JavaSparkContext
        SparkSession sparkSession = SparkSession
                .builder()
                .sparkContext(new JavaSparkContext(sparkConf).sc())
                .getOrCreate();
        // read a local text file and print its lines
        sparkSession.read().text("file:///D:/Java/IdeaWorkplace/seek-employeement/application.properties").show();
        sparkSession.close();
    }
}
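Once the master is exposed as described above, the same kind of program can target the standalone cluster instead of local mode. A minimal sketch (the class name RemoteTest and the IP 192.168.1.100 are placeholders; the client's Spark dependency version should match the cluster's):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class RemoteTest {
    public static void main(String[] args) {
        // point at the standalone master exposed on port 7077 (replace the IP with your server's address)
        SparkConf sparkConf = new SparkConf()
                .setAppName("app")
                .setMaster("spark://192.168.1.100:7077");
        SparkSession sparkSession = SparkSession
                .builder()
                .sparkContext(new JavaSparkContext(sparkConf).sc())
                .getOrCreate();
        // trivial check that the session is connected: print the Spark version it reports
        System.out.println("Connected, Spark version: " + sparkSession.version());
        sparkSession.close();
    }
}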