Submitting a job to Oozie requires two configuration files:
- job.properties
- workflow.xml
# Submit to Oozie, passing job.properties via -config; this file lives on the local filesystem
oozie job -oozie http://node1:11000/oozie -config /tmp/spark-oozie/job.properties -run
# /tmp/spark-oozie/job.properties
nameNode=hdfs://nameservice1
jobTracker=yarnrm
queueName=default
examplesRoot=examples
# In job.properties, specify the path of workflow.xml in HDFS
oozie.wf.application.path=${nameNode}/user/${user.name}/oozie/apps/spark-workflow.xml
outputDir=map-reduce
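Oozie resolves the ${nameNode} and ${user.name} placeholders via EL substitution when the job is submitted (user.name comes from the submitting user). A minimal sketch of how that expansion works — resolve is a hypothetical helper, not Oozie code:

```python
import re

def resolve(value, props):
    """Expand ${var} references in a property value (simplified Oozie EL)."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  value)

props = {
    "nameNode": "hdfs://nameservice1",
    "user.name": "root",  # filled in by Oozie from the submitting user
}
path = resolve("${nameNode}/user/${user.name}/oozie/apps/spark-workflow.xml", props)
print(path)  # hdfs://nameservice1/user/root/oozie/apps/spark-workflow.xml
```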
<workflow-app xmlns='uri:oozie:workflow:0.5' name='angora-wf'>
    <start to='angora'/>
    <action name='angora'>
        <spark xmlns='uri:oozie:spark-action:0.1'>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>xxx</name>
            <class>com.bynear.main.MainEntranceDriver</class>
            <jar>/user/root/oozie/apps/angora_new.jar</jar>
            <spark-opts>
                --driver-memory 2G
                --executor-memory 6G
                --num-executors 20
                --executor-cores 2
                --conf spark.yarn.jar=/user/root/oozie/apps/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
                --conf spark.default.parallelism=2000
                --conf spark.rpc.askTimeout=300
                --conf spark.rpc.lookupTimeout=300
                --conf spark.network.timeout=300
                --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseConcMarkSweepGC -Xms7000m -Xmx7000m"
            </spark-opts>
            <arg>angora</arg>
            <arg>/user/bailin/conf/angora_s1mme_mro_join_config_prod_bynear_two_new.json</arg>
            <arg>20170702-10</arg>
        </spark>
        <ok to='end'/>
        <error to='fail'/>
    </action>
    <kill name='fail'>
        <message>spark action fail</message>
    </kill>
    <end name='end'/>
</workflow-app>
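Before running oozie validate, a quick local well-formedness check catches basic XML mistakes (unclosed tags, bad quoting). A sketch using only the Python standard library — note this checks well-formedness only, while oozie validate also checks the workflow against its schema:

```python
import xml.etree.ElementTree as ET

def check_workflow(xml_text):
    """Parse a workflow document; raises ET.ParseError if it is not well-formed."""
    root = ET.fromstring(xml_text)
    return root.tag  # namespace-qualified root tag

# Trimmed workflow used purely for illustration
xml_text = """<workflow-app xmlns='uri:oozie:workflow:0.5' name='angora-wf'>
    <start to='end'/>
    <end name='end'/>
</workflow-app>"""
print(check_workflow(xml_text))  # {uri:oozie:workflow:0.5}workflow-app
```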
# Validate workflow.xml; this is very useful
oozie validate -oozie http://node1:11000/oozie /tmp/spark-oozie/spark-workflow.xml
Error summary:
- Oozie URL is not available: add -oozie http://{host}:{port}/oozie to the command line.
- Class org.apache.oozie.action.hadoop.SparkMain not found: add the following to oozie-site:
<property>
    <name>oozie.use.system.libpath</name>
    <value>true</value>
</property>
- Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [101]: change the submit mode to cluster and fix the file permissions.
- java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext.setAMContainerResourceRequests: find the matching spark-assembly jar, put it in HDFS, then append to spark-opts: --conf spark.yarn.jar=hdfs://nameservice1/user/root/oozie/apps/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
- Call From *** to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused: set jobTracker to the same value as yarn.resourcemanager.address in yarn-site.xml.
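To find the right jobTracker value, read yarn.resourcemanager.address out of yarn-site.xml. A sketch with a sample document inlined — in practice, read the real file (e.g. under /etc/hadoop/conf); the yarnrm:8032 value below is only an example:

```python
import xml.etree.ElementTree as ET

def get_hadoop_property(xml_text, wanted):
    """Return the value of a named property from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == wanted:
            return prop.findtext("value")
    return None

yarn_site = """<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>yarnrm:8032</value>
    </property>
</configuration>"""
print(get_hadoop_property(yarn_site, "yarn.resourcemanager.address"))  # yarnrm:8032
```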