1. Environment Preparation and Version Overview
1. Linux System Version
```
CentOS release 6.8 (Final)
Linux version 2.6.32-642.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 10 17:27:01 UTC 2016
```

ISO image: CentOS-6.8-x86_64-minimal.iso
2. JDK Version
```
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
```

Environment variables: open the profile with `vim /etc/profile`, append the following lines, then run `source /etc/profile`:

```
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/rt.jar
```
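The profile edit can also be scripted instead of done interactively in vim. A minimal sketch, appending to a local stand-in file (`./profile.test`, an assumption so the sketch is safe to run) rather than the real /etc/profile; the JDK path is the one from the listing above:

```shell
# Append the JDK variables to a profile file non-interactively.
# PROFILE is a stand-in for /etc/profile so the sketch is safe to run.
PROFILE=./profile.test
cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/rt.jar
EOF
grep -c '^export' "$PROFILE"   # count the appended export lines
```

The quoted `'EOF'` delimiter matters: it stops the shell from expanding `$PATH` and `$JAVA_HOME` at write time, so the literal variable references land in the profile.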
3. Hadoop and Spark Versions
hadoop-2.7.7
spark-2.4.0-bin-hadoop2.7
2. Spark Standalone Mode HA
1. Node Layout
Note: /etc/hosts configuration:

```
192.168.1.211 z1
192.168.1.212 z2
192.168.1.213 z3
192.168.1.214 z4
```
2. Installation
- spark-env.sh configuration
```
#export SPARK_MASTER_HOST=z1
export SPARK_MASTER_PORT=7077
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
# High availability
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=z1:2181,z2:2181,z3:2181 -Dspark.deploy.zookeeper.dir=/spark"
#export SPARK_WORKER_MEMORY=2g
#export SPARK_EXECUTOR_MEMORY=2g
#export SPARK_DRIVER_MEMORY=2g
#export SPARK_WORKER_CORES=1
```
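The `spark.deploy.zookeeper.url` value has to track the quorum membership. A small sketch that derives it from a host list, so adding or removing a ZooKeeper node means editing one variable; the host names follow the z1/z2/z3 layout above, and 2181 is ZooKeeper's default client port:

```shell
# Derive the ZooKeeper URL for SPARK_DAEMON_JAVA_OPTS from a host list,
# so that changing the quorum means editing one variable.
ZK_HOSTS="z1 z2 z3"
ZK_URL=$(for h in $ZK_HOSTS; do printf '%s:2181,' "$h"; done)
ZK_URL=${ZK_URL%,}                       # drop the trailing comma
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=$ZK_URL -Dspark.deploy.zookeeper.dir=/spark"
echo "$ZK_URL"
```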
- slaves configuration
```
z2
z3
z4
```
- History Server configuration (important)
Add these properties to conf/spark-defaults.conf:

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://bigdata/user/spark/historyLog
spark.history.fs.logDirectory    hdfs://bigdata/user/spark/historyLog
```
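The HDFS log directory has to exist before the first job is logged, and the history server daemon is started separately. A sketch of the two steps, printed as a plan rather than executed (the paths come from the configuration above; `bigdata` is assumed to be the HDFS nameservice):

```shell
# Print the preparation steps for the history server: create the HDFS
# log directory, then start the daemon from Spark's sbin directory.
history_plan() {
    echo "hdfs dfs -mkdir -p hdfs://bigdata/user/spark/historyLog"
    echo "sbin/start-history-server.sh"
}
history_plan
```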
3. Startup
Step 1: start the three ZooKeeper instances; in each one's bin directory, run `./zkServer.sh start`.
Step 2: on host z1, in Spark's sbin directory, run `./start-all.sh`.
Step 3: pick another node (host z2) and start a master there as the standby, by running `./start-master.sh`.
Step 4: open http://z1:8080 in a browser to reach the Spark web UI monitoring page.
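The four steps above can be sketched as one script. The install paths (`ZK_BIN`, `SPARK_SBIN`) are assumptions, and the sketch only prints the plan in order rather than running the commands over ssh:

```shell
# Print the startup order: ZooKeeper quorum first, then the primary
# master plus workers on z1, then the standby master on z2.
ZK_BIN=/home/software/zookeeper/bin          # assumed install path
SPARK_SBIN=/home/software/spark/sbin         # assumed install path

startup_plan() {
    for zk in z1 z2 z3; do                   # step 1: ZooKeeper quorum
        echo "[$zk] $ZK_BIN/zkServer.sh start"
    done
    echo "[z1] $SPARK_SBIN/start-all.sh"     # step 2: master + workers
    echo "[z2] $SPARK_SBIN/start-master.sh"  # step 3: standby master
    echo "check http://z1:8080"              # step 4: web UI
}
startup_plan
```

The ordering matters: the standby master can only register with ZooKeeper once the quorum is up, which is why the loop runs first.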
4. Submitting Jobs
Client mode:

```
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://z1:7077 \
  --deploy-mode client \
  examples/jars/spark-examples_2.11-2.4.0.jar 10000
```

Cluster mode:

```
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://z1:7077 \
  --deploy-mode cluster \
  examples/jars/spark-examples_2.11-2.4.0.jar 10000
```

When submitting in cluster mode, make sure the jar is present on every node, otherwise the job will fail. It is recommended to upload the jar to HDFS and submit it from there:

```
bin/spark-submit --class cn.com.spark.GroupTest \
  --master spark://z1:7077 \
  --deploy-mode cluster \
  hdfs://z2:8020/spark/examples/simple-spark-master-1.0-SNAPSHOT-jar-with-dependencies.jar
```
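One way to keep the client, cluster, and HDFS variants consistent is to assemble the command line from variables, so the jar location is swapped in one place. A sketch using the SparkPi example above; the HDFS jar path here is a hypothetical stand-in, not a path from the cluster:

```shell
# Assemble the cluster-mode submit command from variables so the jar
# location (local path vs. HDFS URL) can be changed in one place.
MASTER=spark://z1:7077
CLASS=org.apache.spark.examples.SparkPi
JAR=hdfs://z2:8020/spark/examples/spark-examples_2.11-2.4.0.jar   # hypothetical HDFS location
ARGS=10000
SUBMIT="bin/spark-submit --class $CLASS --master $MASTER --deploy-mode cluster $JAR $ARGS"
echo "$SUBMIT"
```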
5. Configuration Options
For full configuration details, see the official documentation (options may differ between versions): http://spark.apache.org/docs/latest/configuration.html
6. Overriding Configuration in Code
In some cases, you may want to avoid hard-coding certain configurations in a SparkConf. For instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties:

```
val conf = new SparkConf().set("spark.hadoop.abc.def", "xyz")
```
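The same properties can also be supplied at submit time with spark-submit's `--conf` flag instead of being set in code; one `--conf` flag sets one property. A sketch, using the placeholder property name from the text above:

```shell
# Build --conf arguments for spark-submit; each --conf key=value pair is
# equivalent to a SparkConf.set call in application code.
CONF_ARGS="--conf spark.hadoop.abc.def=xyz --conf spark.eventLog.enabled=true"
echo "bin/spark-submit $CONF_ARGS --class org.apache.spark.examples.SparkPi ..."
```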