Before I knew it, the Lantern Festival (the 15th of the first lunar month) has come and gone. It's supposed to be a day for family reunion, but where are office workers like us supposed to go to reunite? (feeling blue)
Blue or not, life goes on, so:
Setting up a fully distributed Spark environment
HostName | NameNode | DataNode | JournalNode | Zookeeper | Master (new) | Worker (new)
---|---|---|---|---|---|---
wpixel01 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | 
wpixel02 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
wpixel03 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
wpixel04 | ✔️ | ✔️ | ✔️ | | | ✔️
Master nodes: 10.211.55.111 / 112 / 113
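This walkthrough assumes every node can resolve the wpixel0x hostnames. A minimal sketch of the implied /etc/hosts mapping (hostnames and IPs taken from the table and the master/worker lists; the 10.211.55.111–114 assignment is inferred from the text):

```shell
# Emit the hostname/IP mapping implied above:
# wpixel01..wpixel04 -> 10.211.55.111..10.211.55.114.
# Append the output to /etc/hosts on each node (as root).
for i in 1 2 3 4; do
  printf '10.211.55.11%d wpixel0%d\n' "$i" "$i"
done
```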
1. Extract to the current directory (any directory works)
[root@wpixel01 www]# tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
2. Edit the configuration file spark-env.sh
# spark-env.sh is in the conf directory
[root@wpixel01 www]# cd spark-2.2.0-bin-hadoop2.7/conf/
[root@wpixel01 conf]# ll
total 32
-rw-r--r--. 1 500 500 996 Jul 1 2017 docker.properties.template
-rw-r--r--. 1 500 500 1105 Jul 1 2017 fairscheduler.xml.template
-rw-r--r--. 1 500 500 2025 Jul 1 2017 log4j.properties.template
-rw-r--r--. 1 500 500 7313 Jul 1 2017 metrics.properties.template
-rw-r--r--. 1 500 500 865 Jul 1 2017 slaves.template
-rw-r--r--. 1 500 500 1292 Jul 1 2017 spark-defaults.conf.template
-rwxr-xr-x. 1 500 500 3699 Jul 1 2017 spark-env.sh.template
What to change
# first rename spark-env.sh.template to spark-env.sh
[root@wpixel01 conf]# mv spark-env.sh.template spark-env.sh
There are two ways to implement Spark HA:
Option 1: single-point recovery based on a file system
(*) Single point: there is still only one master node
(*) Single-point recovery means that after the master goes down, it can be manually restored to its previous state (e.g. worker registrations and application info)
(*) Suitable only for development and testing
(*) Basic idea: designate a recovery directory; during normal operation, Spark writes the state needed for recovery into that directory
(*) Core parameters:
spark.deploy.recoveryMode: the recovery mode, either FILESYSTEM or ZOOKEEPER
spark.deploy.recoveryDirectory: the recovery directory
Configuration (note: both options set SPARK_DAEMON_JAVA_OPTS in spark-env.sh, so apply only one of them; this cluster uses Option 2)
# edit the file
[root@wpixel01 conf]# vi spark-env.sh
# add the following parameters
export JAVA_HOME=/home/www/jdk1.8.0_101
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/home/www/spark-2.2.0-bin-hadoop2.7/recovery"
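It's worth making sure the recovery directory exists before starting the master; a quick sketch using the path from the config above:

```shell
# Create the directory referenced by spark.deploy.recoveryDirectory;
# -p creates parents as needed and is a no-op if it already exists.
mkdir -p /home/www/spark-2.2.0-bin-hadoop2.7/recovery
```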
Option 2: ZooKeeper-based Standby Masters
(*) Core parameters:
spark.deploy.recoveryMode: the recovery mode, either FILESYSTEM or ZOOKEEPER
spark.deploy.zookeeper.url: the ZooKeeper quorum address
spark.deploy.zookeeper.dir: the directory in ZooKeeper where Spark stores its state
Configuration
[root@wpixel01 conf]# vi spark-env.sh
# add the following parameters
export JAVA_HOME=/home/www/jdk1.8.0_101
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=wpixel01:2181,wpixel02:2181,wpixel03:2181 -Dspark.deploy.zookeeper.dir=/sparkHA"
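spark.deploy.zookeeper.url is a comma-separated host:port list, so it's easy to sanity-check from the shell before starting Spark. A small sketch that splits the value into one quorum member per line (you might follow each up with something like `nc -z host port` to confirm the port is reachable):

```shell
ZK_URL="wpixel01:2181,wpixel02:2181,wpixel03:2181"
# Split the quorum string on commas, one member per line.
echo "$ZK_URL" | tr ',' '\n'
```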
3. Edit the configuration file: slaves
[root@wpixel01 conf]# mv slaves.template slaves
[root@wpixel01 conf]# vi slaves
# list the worker hostnames, one per line
wpixel02
wpixel03
wpixel04
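Equivalently, the slaves file can be written in one go with a here-document, which avoids editor typos — a sketch assuming the same conf directory as above:

```shell
# Write the worker list; start-all.sh reads conf/slaves and
# launches one Worker per listed host over ssh.
cat > /home/www/spark-2.2.0-bin-hadoop2.7/conf/slaves <<'EOF'
wpixel02
wpixel03
wpixel04
EOF
```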
4. Worker nodes: 10.211.55.112 / 113 / 114
Copy the configured Spark directory to every worker node
[root@wpixel01 www]# scp -r spark-2.2.0-bin-hadoop2.7/ root@wpixel02:/home/www/
[root@wpixel01 www]# scp -r spark-2.2.0-bin-hadoop2.7/ root@wpixel03:/home/www/
[root@wpixel01 www]# scp -r spark-2.2.0-bin-hadoop2.7/ root@wpixel04:/home/www/
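The three scp commands are identical except for the hostname, so a loop does the same job — a sketch, assuming passwordless ssh for root is already set up between the nodes:

```shell
# Copy the configured Spark directory to every worker node.
for host in wpixel02 wpixel03 wpixel04; do
  scp -r spark-2.2.0-bin-hadoop2.7/ "root@${host}:/home/www/"
done
```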
5. Startup
Step 1: start the ZooKeeper cluster first
[root@wpixel01 www]# zkServer.sh start
[root@wpixel02 www]# zkServer.sh start
[root@wpixel03 www]# zkServer.sh start
Step 2: start Spark
# start the whole cluster from the master node
[root@wpixel01 sbin]# ./start-all.sh
Check with jps
#----------------------------- node 1 is running the Master
[root@wpixel01 sbin]# jps
2581 Master
2539 QuorumPeerMain
2654 Jps
#----------------------------- node 2 is running a Worker
[root@wpixel02 sbin]# jps
2466 Worker
2594 Jps
2419 QuorumPeerMain
#----------------------------- node 3 is running a Worker
[root@wpixel03 sbin]# jps
2354 Worker
2313 QuorumPeerMain
2477 Jps
#----------------------------- node 4 is running a Worker
[root@wpixel04 ~]# jps
2309 Jps
2264 Worker
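Instead of logging in to each machine, the same check can be run from wpixel01 over ssh — a sketch assuming passwordless ssh; the grep keeps only Spark's daemons (Master/Worker) out of the jps listing:

```shell
# Show which Spark daemons run on each node.
for host in wpixel01 wpixel02 wpixel03 wpixel04; do
  echo "== ${host} =="
  ssh "root@${host}" jps | grep -E 'Master|Worker'
done
```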
Step 3: start the standby masters
Start additional Master instances to make Spark highly available
# start one master instance on each of wpixel02 and wpixel03
[root@wpixel02 sbin]# ./start-master.sh
[root@wpixel03 sbin]# ./start-master.sh
Step 4: check the cluster status through the web UI
Testing HA
Now let's just brutally kill the Master process:
[root@wpixel01 sbin]# jps
2819 Jps
2539 QuorumPeerMain
2751 Master
[root@wpixel01 sbin]# kill -9 2751
Now wpixel01:8080 is no longer reachable, and wpixel02 has taken over as the active master.
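Reading the PID off the jps output by hand works, but awk can select it directly — a sketch of the same "kill the active master" step in one line:

```shell
# Find the Master's PID in jps output and kill it.
# `-r` (GNU xargs) skips the kill when no Master is running.
jps | awk '$2 == "Master" { print $1 }' | xargs -r kill -9
```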
6. The spark-shell command
Mode 1: local mode: spark-shell
The log shows (master = local[*]):
[root@wpixel01 spark-2.2.0-bin-hadoop2.7]# ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/03 15:10:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/03 15:10:43 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/03/03 15:10:43 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/03/03 15:10:44 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.211.55.111:4040
Spark context available as 'sc' (master = local[*], app id = local-1520061031877).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
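A quick smoke test once the prompt appears: a tiny job using the sc handle from the log above, piped in from the shell (runnable only where this Spark install is present). Since 1+2+…+100 = 5050, the job should print 5050:

```shell
# Run a one-line job in local mode; 1+2+...+100 = 5050.
echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | ./bin/spark-shell 2>/dev/null
```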
Mode 2: cluster mode: spark-shell --master spark://wpixel02:7077 (note: that is two dashes before master)
The log shows (master = spark://wpixel02:7077):
[root@wpixel02 spark-2.2.0-bin-hadoop2.7]# bin/spark-shell --master spark://wpixel02:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/03 15:30:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/03 15:31:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/03/03 15:31:23 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/03/03 15:31:24 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.211.55.112:4040
Spark context available as 'sc' (master = spark://wpixel02:7077, app id = app-20180303153032-0000).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala>