Spark Cluster Setup
Install the JDK
Unpack and install Spark
Edit the configuration files (no high-availability setup is needed for single-machine testing)
cd spark-2.3.1/conf
cp slaves.template slaves
vi slaves
node02
node03
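The two lines above list the worker hostnames, one per line, in the slaves file. A minimal sketch of producing that file from a hostname list (node02/node03 are the example hosts from this guide; run inside spark-2.3.1/conf):

```shell
# Write one worker hostname per line into the slaves file,
# matching what "vi slaves" produces by hand above.
printf '%s\n' node02 node03 > slaves
cat slaves
```

Each non-blank line becomes one Worker started by start-all.sh.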
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export SPARK_MASTER_HOST=node01
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3g
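The four exports above can also be appended to spark-env.sh non-interactively; a sketch, assuming the same example values (node01 as master host, 2 cores and 3g per worker):

```shell
# Append the standalone-cluster settings to spark-env.sh
# (creates the file if it does not exist yet).
cat >> spark-env.sh <<'EOF'
export SPARK_MASTER_HOST=node01
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3g
EOF
grep SPARK_ spark-env.sh
```

SPARK_WORKER_CORES/SPARK_WORKER_MEMORY cap what each Worker offers to the cluster, not what a single application gets.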
Start the cluster
cd /spark-2.3.1/sbin
./start-all.sh
Spark WebUI
http://<master-ip>:8080 (default port of the standalone Master web UI)
Submitting Spark jobs (SparkPi as the example)
Submitting in standalone-client mode
./spark-submit --master spark://ip:7077 --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
./spark-submit --master spark://ip:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
Submitting in standalone-cluster mode
./spark-submit --master spark://ip:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
Submitting jobs on YARN
Disable the virtual-memory check
vim /opt/hadoop-2.6.5/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
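The property above disables the NodeManager's virtual-memory check, which otherwise tends to kill Spark containers on small test machines. A sketch of a minimal yarn-site.xml carrying just that property (written to the current directory for illustration; the real file lives under $HADOOP_HOME/etc/hadoop):

```shell
# Minimal yarn-site.xml with the vmem check turned off.
cat > yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
EOF
cat yarn-site.xml
```

After editing the real file, restart the NodeManagers so the change takes effect.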
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Once this is set, Spark can locate the YARN configuration; Hadoop must be installed first.
yarn-client mode
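Before submitting, it is worth confirming that HADOOP_CONF_DIR actually resolves to the directory holding yarn-site.xml. A sketch, assuming the Hadoop path used earlier in this guide (/opt/hadoop-2.6.5):

```shell
# HADOOP_HOME is assumed to match the install path from this guide.
export HADOOP_HOME=/opt/hadoop-2.6.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
echo "$HADOOP_CONF_DIR"
```

spark-submit reads yarn-site.xml (and core-site.xml) from this directory to find the ResourceManager.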
./spark-submit --master yarn --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
./spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
./spark-submit --master yarn-client --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100   # deprecated form since Spark 2.0
yarn-cluster mode
./spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100   # deprecated form since Spark 2.0
./spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
Job-submission flow, using yarn-client as the example
Spark architecture: https://blog.csdn.net/happiless/article/details/107305373