Installing Spark 3.0.0
(1) Go to the directory containing the installation package
[root@hurys22 conf]# cd /opt/install/
[root@hurys22 install]# ls
(2) Extract the package
[root@hurys22 install]# tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/soft
[root@hurys22 install]# cd /opt/soft
[root@hurys22 soft]# ls
hadoop313 hbase205 hive312 jdk180 scala211 spark-3.0.0-bin-hadoop3.2 sqoop146 zepplin090 zookeeper357
(3) Rename the directory
[root@hurys22 soft]# mv spark-3.0.0-bin-hadoop3.2 spark300
[root@hurys22 soft]# ls
hadoop313 hbase205 hive312 jdk180 scala211 spark300 sqoop146 zepplin090 zookeeper357
(4) Check the installation path
[root@hurys22 soft]# cd /opt/soft/spark300/
[root@hurys22 spark300]# pwd
/opt/soft/spark300
[root@hurys22 spark300]# ls
bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
(5) Configure the environment variables
[root@hurys22 spark300]# vi /etc/profile
#spark
export SPARK_HOME=/opt/soft/spark300
export PATH=$PATH:$SPARK_HOME/bin
(6) Source the profile
[root@hurys22 spark300]# source /etc/profile
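If this setup is ever re-run, the same two lines get appended to /etc/profile again. A minimal sketch of an idempotent append (written against a demo file so it is safe to try; in practice the target would be /etc/profile):

```shell
# Append the Spark variables only if they are not already present.
PROFILE=profile.demo   # demo target; in practice this would be /etc/profile
touch "$PROFILE"
add_spark_env() {
  if ! grep -q 'export SPARK_HOME=' "$PROFILE"; then
    {
      echo '#spark'
      echo 'export SPARK_HOME=/opt/soft/spark300'
      echo 'export PATH=$PATH:$SPARK_HOME/bin'
    } >> "$PROFILE"
  fi
}
add_spark_env
add_spark_env   # second call is a no-op
grep -c 'export SPARK_HOME=' "$PROFILE"   # prints 1, not 2
```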
(7) Create the /directory folder on HDFS (Hadoop must be running)
[root@hurys22 hadoop]# start-all.sh
[root@hurys22 hadoop]# hdfs dfs -mkdir /directory
(8) Edit the configuration files
[root@hurys22 spark300]# cd ./conf/
[root@hurys22 conf]# ls
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template
log4j.properties.template slaves.template spark-env.sh.template
File 1: spark-env.sh
Step 1: copy the template and rename it
[root@hurys22 conf]# cp spark-env.sh.template spark-env.sh
[root@hurys22 conf]# ls
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template spark-env.sh.template
log4j.properties.template slaves.template spark-env.sh
Step 2: check the JDK installation path
[root@hurys22 conf]# echo $JAVA_HOME
/usr/local/java
Step 3: check the Hadoop configuration directory
[root@hurys22 conf]# cd /opt/soft/hadoop313/etc/hadoop/
[root@hurys22 hadoop]# pwd
/opt/soft/hadoop313/etc/hadoop
Step 4: edit the file
[root@hurys22 conf]# vi ./spark-env.sh
export JAVA_HOME=/usr/local/java
YARN_CONF_DIR=/opt/soft/hadoop313/etc/hadoop
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://hurys22:8020/directory
-Dspark.history.retainedApplications=30"
SPARK_MASTER_HOST=hurys22
SPARK_MASTER_PORT=7077
File 2: spark-defaults.conf
[root@hurys22 conf]# cp spark-defaults.conf.template spark-defaults.conf
[root@hurys22 conf]# vi spark-defaults.conf
# spark.master                     spark://master:7077
# spark.master                     yarn
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://hurys22:8020/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.yarn.historyServer.address=hurys22:18080
spark.history.ui.port=18080
File 3: slaves
[root@hurys22 conf]# cp slaves.template slaves
[root@hurys22 conf]# ls
fairscheduler.xml.template metrics.properties.template slaves.template spark-env.sh
log4j.properties.template slaves spark-defaults.conf.template spark-env.sh.template
[root@hurys22 conf]# vi slaves
# A Spark Worker will be started on each of the machines listed below.
hurys22
Add the following to the Hadoop configuration.
File 4: yarn-site.xml (already added during the earlier Hadoop installation)
[root@hurys22 hadoop]# vi yarn-site.xml
<!-- Whether to run a thread that checks the physical memory each task uses and kills any task that exceeds its allocation. Default: true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to run a thread that checks the virtual memory each task uses and kills any task that exceeds its allocation. Default: true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
[root@hurys22 conf]# spark-shell
2022-05-11 17:03:02,238 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hurys22:4040
Spark context available as 'sc' (master = local[*], app id = local-1652259789674).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.0
/_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val a=Array(1,2,3,4)
a: Array[Int] = Array(1, 2, 3, 4)
scala> a(1)
res0: Int = 2
Visit http://192.168.0.22:4040/ to view the Web UI.
I. Local mode (Hadoop does not need to be running)
(1) Go to the directory containing the installation package
[root@hurys22 conf]# cd /opt/install/
[root@hurys22 install]# ls
(2) Extract the package
[root@hurys22 install]# tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/soft
[root@hurys22 install]# cd /opt/soft
[root@hurys22 soft]# ls
hadoop313 hbase205 hive312 jdk180 scala211 spark-3.0.0-bin-hadoop3.2 sqoop146 zepplin090 zookeeper357
(3) Rename the directory
[root@hurys22 soft]# mv spark-3.0.0-bin-hadoop3.2 spark300
[root@hurys22 soft]# ls
hadoop313 hbase205 hive312 jdk180 scala211 spark300 sqoop146 zepplin090 zookeeper357
(4) Check the installation path
[root@hurys22 soft]# cd /opt/soft/spark300/
[root@hurys22 spark300]# pwd
/opt/soft/spark300
[root@hurys22 spark300]# ls
bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
(5) Configure the environment variables
[root@hurys22 spark300]# vi /etc/profile
#spark
export SPARK_HOME=/opt/soft/spark300
export PATH=$PATH:$SPARK_HOME/bin
(6) Source the profile
[root@hurys22 spark300]# source /etc/profile
(7) Start spark-shell locally
[root@hurys22 spark300]# cd ./bin/
[root@hurys22 bin]# spark-shell
After exiting the shell, submit an application
[root@hurys22 bin]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[2] \
/opt/soft/spark300/examples/jars/spark-examples_2.12-3.0.0.jar \
10
1) --class: the main class of the program to run; replace it with your own application's class.
2) --master local[2]: the deploy mode, local by default; the number is how many virtual CPU cores to allocate.
3) spark-examples_2.12-3.0.0.jar: the jar containing the application class; in practice, point this at your own jar.
4) The trailing 10 is the program's entry argument, here setting the number of tasks for the application.
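The four points above can be folded into a small wrapper, so the class, master, and jar are easy to swap for your own application (a sketch; the variable names are illustrative):

```shell
#!/bin/bash
# Build the spark-submit invocation from variables instead of typing it inline.
MAIN_CLASS=org.apache.spark.examples.SparkPi   # replace with your own main class
MASTER='local[2]'                              # 2 virtual CPU cores in local mode
APP_JAR=/opt/soft/spark300/examples/jars/spark-examples_2.12-3.0.0.jar
APP_ARGS=10                                    # entry argument: number of tasks

CMD="spark-submit --class $MAIN_CLASS --master $MASTER $APP_JAR $APP_ARGS"
echo "$CMD"    # review the command first; run it with: eval "$CMD"
```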
II. YARN mode (building on the local-mode install by editing configuration files; the yarn-site.xml changes were already added during the Hadoop installation)
(1) Edit the Hadoop configuration file yarn-site.xml
[root@hurys22 hadoop]# vi yarn-site.xml
<!-- Whether to run a thread that checks the physical memory each task uses and kills any task that exceeds its allocation. Default: true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to run a thread that checks the virtual memory each task uses and kills any task that exceeds its allocation. Default: true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
(2) Start Hadoop
[root@hurys22 hadoop]# start-all.sh
[root@hurys22 hadoop]# mr-jobhistory-daemon.sh start historyserver
(3) Submit the application
[root@hurys22 bin]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
/opt/soft/spark300/examples/jars/spark-examples_2.12-3.0.0.jar \
10
tracking URL: http://hurys22:8088/proxy/application_1652690106178_0001/
Visit http://192.168.0.22:8088/cluster in a browser.
(4) Configure the history server
First, create the /directory folder
[root@hurys22 hadoop]# hadoop fs -mkdir /directory
Next, edit the configuration file spark-defaults.conf
[root@hurys22 conf]# vi spark-defaults.conf
# spark.master                     spark://master:7077
# spark.master                     yarn
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://hurys22:8020/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.yarn.historyServer.address=hurys22:18080
spark.history.ui.port=18080
Then, edit the configuration file spark-env.sh
[root@hurys22 conf]# vi spark-env.sh
export JAVA_HOME=/opt/soft/jdk180
YARN_CONF_DIR=/opt/soft/hadoop313/etc/hadoop
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://hurys22:8020/directory
-Dspark.history.retainedApplications=30"
Finally, start the history server
[root@hurys22 conf]# cd ..
[root@hurys22 spark300]# sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/soft/spark300/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out
(5) Check with jps
[root@hurys22 bin]# jps
6080 NameNode
6800 NodeManager
6674 ResourceManager
6388 SecondaryNameNode
7893 JobHistoryServer
8262 HistoryServer
8697 Jps
6203 DataNode
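With this many daemons involved, it is easy to miss one. A small sketch of a check that compares `jps` output against the daemons listed above (the function name is illustrative; shown here against a sample listing):

```shell
# Report which required daemons are absent from a given `jps` listing.
required="NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer HistoryServer"
check_daemons() {    # $1 = output of `jps`
  missing=""
  for d in $required; do
    # -w matches whole words, so "NameNode" does not match inside "SecondaryNameNode"
    printf '%s\n' "$1" | grep -qw "$d" || missing="$missing $d"
  done
  echo "missing:$missing"
}
# Real use: check_daemons "$(jps)"
check_daemons "6080 NameNode
6203 DataNode"
# prints: missing: SecondaryNameNode ResourceManager NodeManager JobHistoryServer HistoryServer
```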
(6) Submit the application again
[root@hurys22 bin]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
/opt/soft/spark300/examples/jars/spark-examples_2.12-3.0.0.jar \
10
Visit the history server page at http://192.168.0.22:18080/.
III. Standalone mode (building on the local-mode install by editing configuration files)
(1) Edit the slaves file
[root@hurys22 conf]# vi slaves
hurys22
(2) Edit the configuration file spark-env.sh
[root@hurys22 conf]# vi spark-env.sh
export JAVA_HOME=/opt/soft/jdk180
SPARK_MASTER_HOST=hurys22
SPARK_MASTER_PORT=7077
(3) Start the Spark master and worker
[root@hurys22 spark300]# sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/soft/spark300/logs/spark-root-org.apache.spark.deploy.master.Master-1-hurys22.out
hurys22: starting org.apache.spark.deploy.worker.Worker, logging to /opt/soft/spark300/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hurys22.out
(4) Check with jps
[root@hurys22 spark300]# jps
3383 Jps
3325 Worker
3246 Master
(5) Start Hadoop
[root@hurys22 hadoop]# start-all.sh
[root@hurys22 hadoop]# mr-jobhistory-daemon.sh start historyserver
(6) Configure the history server
First, create the /directory folder
[root@hurys22 hadoop]# hadoop fs -mkdir /directory
Next, edit the configuration file spark-defaults.conf
[root@hurys22 conf]# vi spark-defaults.conf
# spark.master                     spark://master:7077
# spark.master                     yarn
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://hurys22:8020/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
Then, edit the configuration file spark-env.sh
[root@hurys22 conf]# vi spark-env.sh
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://hurys22:8020/directory
-Dspark.history.retainedApplications=30"
⚫ Parameter 1: the Web UI is served on port 18080.
⚫ Parameter 2: the storage path for the history server's logs.
⚫ Parameter 3: the number of Applications whose history is retained; once exceeded, the oldest application info is evicted. This is the number of applications held in memory, not the number shown on the page.
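For reference, the three `-D` options in SPARK_HISTORY_OPTS correspond to ordinary Spark properties, so the same settings could equivalently be written in spark-defaults.conf (a sketch of that form):

```
spark.history.ui.port              18080
spark.history.fs.logDirectory      hdfs://hurys22:8020/directory
spark.history.retainedApplications 30
```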
Finally, start the history server
[root@hurys22 conf]# cd ..
[root@hurys22 spark300]# sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/soft/spark300/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out
View the Master resource-monitoring Web UI at http://192.168.0.22:8080/.
(7) Submit the application again
[root@hurys22 bin]# spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hurys22:7077 \
/opt/soft/spark300/examples/jars/spark-examples_2.12-3.0.0.jar \
10
View the history service at http://192.168.0.22:18080/.
(8) Inspect the Spark logs
[root@hurys22 spark300]# cd ./logs/
[root@hurys22 logs]# ls
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out.1
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out.2
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out.3
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out.4
spark-root-org.apache.spark.deploy.history.HistoryServer-1-hurys22.out.5
spark-root-org.apache.spark.deploy.master.Master-1-hurys22.out
spark-root-org.apache.spark.deploy.worker.Worker-1-hurys22.out
[root@hurys22 logs]# cat spark-root-org.apache.spark.deploy.worker.Worker-1-hurys22.out