Initialization
export SPARK_HOME=/XXXX
Separating dependencies when packaging a Spark application
mvn dependency:copy-dependencies -DoutputDirectory=libs
Compress libs into libs.zip and upload it to the server, or to /spark-yarn/jars on HDFS
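The copy-dependencies goal above can also be bound into the Maven build so the libs directory is produced automatically at package time. A minimal pom.xml sketch (the plugin version shown is an assumption; pin it to whatever your build already uses):

```xml
<!-- Illustrative maven-dependency-plugin binding (config fragment);
     the version number is an assumption, not taken from the original build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.1.2</version>
  <executions>
    <execution>
      <id>copy-deps</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <!-- Same effect as -DoutputDirectory=libs on the command line -->
        <outputDirectory>${project.build.directory}/libs</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this binding, `mvn package` produces both the application jar (without dependencies) and the libs directory in one step.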
Configuring dependency jars for local mode
mkdir -p /tmp/spark/jars
cd /tmp/spark/jars
# Upload libs.zip (if the rz command is not found, install it first: yum install -y lrzsz)
rz
unzip libs.zip
Edit spark-defaults.conf and add the following:
#spark local jars
spark.executor.extraClassPath=/tmp/spark/jars/libs/*
spark.driver.extraClassPath=/tmp/spark/jars/libs/*
Running a test in local mode
${SPARK_HOME}/bin/spark-submit \
  --master local \
  --class com.xxx.spark.dw.dwd.DWDDebugCommonModel \
  --driver-cores 1 \
  --driver-memory 2G \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 4G \
  --name test \
  /opt/software/pkgs/xxx.jar
Configuring dependency jars for client or cluster mode
For a quick test of client mode, change the HDFS path to local:/tmp/spark/jars/libs
Upload the jars to HDFS
hdfs dfs -put /tmp/spark/jars/libs/* /spark-yarn/jars/
Edit spark-defaults.conf and set the value of spark.yarn.jars to:
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/jars/*,local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/hive/*,local:/tmp/spark/jars/libs/*
Notes:
Local filesystem paths use the format local:/xx
HDFS paths use the format hdfs://master:8020/xx
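The two path schemes can be mixed in the same property. An illustrative spark-defaults.conf fragment (the namenode address master:8020 is an assumption; use your own cluster's namenode):

```
# Illustrative config fragment: CDH runtime jars stay on every node's
# local disk, while the application dependencies are read from HDFS.
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/jars/*,hdfs://master:8020/spark-yarn/jars/*
```

Reading the dependencies from HDFS avoids having to copy /tmp/spark/jars/libs to every NodeManager, at the cost of a download on container startup.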
Submitting a job to YARN
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client|cluster \
  --class com.iflytek.spark.dw.dwd.DWDDebugCommonModel \
  --driver-cores 1 \
  --driver-memory 2G \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 4G \
  --name test \
  /opt/software/pkgs/dataoperation.jar
client mode
the driver runs on the client machine (for debugging)
cluster mode
the driver runs inside the cluster (for production)
Troubleshooting
If submission fails in some of the local, client, or cluster modes with java.lang.NoSuchMethodError:
comment out the extraClassPath entries in spark-defaults.conf so that only spark.yarn.jars remains, and make sure the dependencies were not also copied into directories such as spark/jars
#spark local jars
#spark.executor.extraClassPath=/tmp/spark/jars/libs/*
#spark.driver.extraClassPath=/tmp/spark/jars/libs/*
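NoSuchMethodError usually means two different versions of the same library ended up on the classpath. A small shell sketch (directory arguments are up to you; the version-stripping pattern is a heuristic, not exhaustive) that lists jar base names appearing more than once across the given directories:

```shell
# List artifact base names that occur in more than one of the given
# directories -- duplicate versions of the same jar are a common cause
# of java.lang.NoSuchMethodError at runtime.
dup_jars() {
  find "$@" -name '*.jar' 2>/dev/null \
    | sed 's#.*/##; s#-[0-9][0-9.]*\([.-][A-Za-z0-9]*\)*\.jar$##' \
    | sort | uniq -d
}
# Example (paths from this document):
#   dup_jars /tmp/spark/jars/libs "${SPARK_HOME}/jars"
```

If, say, two versions of guava show up, remove one of them (typically the copy in your application's libs) before resubmitting.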
Garbled Chinese characters in Spark or YARN
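A commonly used remedy (an assumption here, not verified against every CDH/YARN version) is to force UTF-8 as the JVM default encoding on both the driver and the executors:

```
# Illustrative spark-defaults.conf fragment (assumed fix): force UTF-8
# so Chinese log output and string handling are not garbled by a
# non-UTF-8 platform default encoding on the cluster nodes.
spark.driver.extraJavaOptions=-Dfile.encoding=UTF-8
spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8
```

The same options can also be passed per job via --conf on the spark-submit command line instead of editing spark-defaults.conf.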