However, I prefer the approach described in that article: use Java to invoke the spark-submit.sh shell script to submit the job, and parse the applicationId out of spark-submit.sh's console output. The Hadoop API approach requires a properly configured environment, and different Hadoop versions require pulling in different dependency packages.
How to call a shell command from Java:
To invoke a shell command from Java, use
Process p = Runtime.getRuntime().exec(String[] cmd);
Runtime.exec() spawns a native process and returns an instance of a Process subclass, which can be used to control the process and obtain information about it.
Because a subprocess created by Runtime.exec has no terminal or console of its own, its standard I/O streams (stdin, stdout, stderr) are all exposed to the parent process through the methods
p.getOutputStream(), p.getInputStream(), p.getErrorStream()
respectively. The caller must use these streams to feed input to the subprocess and to read its output.
For example: Runtime.getRuntime().exec("ls")
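As a minimal sketch of reading a child process's stdout through p.getInputStream() (the class and method names here are illustrative, not from the article), with a simple echo standing in for a real command:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecDemo {
    // Reads the child's stdout line by line via p.getInputStream().
    static String readStdout(Process p) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Spawn a trivial subprocess and read what it prints.
        Process p = Runtime.getRuntime().exec(new String[]{"echo", "hello"});
        System.out.print(readStdout(p));
        System.out.println("exit=" + p.waitFor());
    }
}
```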
Another issue to be aware of is that Runtime.getRuntime().exec() can stall (block).
The caller of Runtime.getRuntime().exec() has to drain stdout and stderr itself, and there is no way to know in advance whether the subprocess will write to stderr or to stdout first. If you read the wrong stream first, the read can block forever.
For example: suppose you read stdout first, but the subprocess writes to stderr first. The subprocess can fill its stderr pipe buffer and stop until that buffer is drained; stdout then never produces any data, and your read blocks indefinitely.
Solution:
Drain stdout and stderr concurrently, each in its own thread.
Reference code:

import java.util.*;
import java.io.*;

// Drains one stream of the child process on its own thread,
// printing each line with a type prefix.
class StreamGobbler extends Thread {
    InputStream is;
    String type;

    StreamGobbler(InputStream is, String type) {
        this.is = is;
        this.type = type;
    }

    public void run() {
        try {
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;
            while ((line = br.readLine()) != null)
                System.out.println(type + ">" + line);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

public class ExecRunner {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("USAGE: java ExecRunner \"<command>\"");
            System.exit(1);
        }

        try {
            String osName = System.getProperty("os.name");
            String[] cmd = new String[3];
            if (osName.equals("Windows NT")) {
                cmd[0] = "cmd.exe";
                cmd[1] = "/C";
                cmd[2] = args[0];
            } else if (osName.equals("Windows 95")) {
                cmd[0] = "command.com";
                cmd[1] = "/C";
                cmd[2] = args[0];
            } else {
                // Split the command on spaces (the original referenced an
                // undeclared variable "command"; args[0] is what was meant).
                StringTokenizer st = new StringTokenizer(args[0], " ");
                cmd = new String[st.countTokens()];
                int token = 0;
                while (st.hasMoreTokens()) {
                    cmd[token++] = st.nextToken();
                }
            }

            Runtime rt = Runtime.getRuntime();
            System.out.println("Execing " + String.join(" ", cmd));
            Process proc = rt.exec(cmd);

            // drain stderr on its own thread
            StreamGobbler errorGobbler =
                    new StreamGobbler(proc.getErrorStream(), "ERROR");

            // drain stdout on its own thread
            StreamGobbler outputGobbler =
                    new StreamGobbler(proc.getInputStream(), "OUTPUT");

            // kick them off
            errorGobbler.start();
            outputGobbler.start();

            // wait for the process to finish and report its exit value
            int exitVal = proc.waitFor();
            System.out.println("ExitValue: " + exitVal);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
Invoking spark-submit.sh from Java
The spark-submit wrapper script, submit_test.sh:
#!/bin/sh
jarspath=''
for file in `ls /home/dx/djj/sparkjars/*.jar`
do
jarspath=${file},$jarspath
done
jarspath=${jarspath%?}
echo $jarspath
/home1/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--class com.dx.test.BroadcastTest \
--properties-file ./conf/spark-properties-mrs.conf \
--jars $jarspath \
--num-executors 10 \
--executor-memory 3G \
--executor-cores 1 \
--driver-memory 2G \
--driver-java-options "-XX:+TraceClassPaths" \
./test.jar $1 $2 $3 $4
Note: when testing the YARN submission modes, adjust the --deploy-mode parameter accordingly:
cluster mode: --deploy-mode cluster \
client mode:  --deploy-mode client \
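Putting the pieces together, a sketch along these lines can invoke submit_test.sh while draining stdout and stderr on separate threads, just as the StreamGobbler example above does. The class name SubmitRunner, the use of sh, and the script path and arguments in main are illustrative assumptions, not from the article:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class SubmitRunner {
    // Drains one stream on its own thread so neither stdout nor stderr
    // can fill up and block the child process.
    static Thread drain(InputStream is, List<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
                String line;
                while ((line = br.readLine()) != null) {
                    synchronized (sink) { sink.add(line); }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        t.start();
        return t;
    }

    // Runs a command and returns every line it printed to stdout or stderr.
    public static List<String> run(String... cmd) throws Exception {
        Process p = Runtime.getRuntime().exec(cmd);
        List<String> lines = new ArrayList<>();
        Thread out = drain(p.getInputStream(), lines);
        Thread err = drain(p.getErrorStream(), lines);
        out.join();
        err.join();
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical invocation of the submit script with four app arguments.
        for (String line : run("sh", "./submit_test.sh", "a1", "a2", "a3", "a4")) {
            System.out.println(line);
        }
    }
}
```

The collected lines are exactly the spark-submit output that the next section filters for the applicationId.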
To obtain the applicationId from spark-submit, we must filter it out of spark-submit's printed output (that is, out of the Process object's stdout and stderr). If you have submitted Spark jobs with spark-submit.sh before, you will have noticed that the applicationId is printed to the console during execution.
In YARN client mode (--deploy-mode client), the applicationId appears at this point in the spark-submit.sh output:
19/04/02 11:38:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@215a34b4{/static,null,AVAILABLE,@Spark}
19/04/02 11:38:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2e380628{/,null,AVAILABLE,@Spark}
19/04/02 11:38:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1eaf1e62{/api,null,AVAILABLE,@Spark}
19/04/02 11:38:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@652ab8d9{/jobs/job/kill,null,AVAILABLE,@Spark}
19/04/02 11:38:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51e0301d{/stages/stage/kill,null,AVAILABLE,@Spark}
19/04/02 11:38:31 INFO client.RMProxy: Connecting to ResourceManager at vm10.60.0.11.com.cn/10.60.0.11:8032
[Opened /usr/java/jdk1.8.0_152/jre/lib/jce.jar]
[Opened /usr/java/jdk1.8.0_152/jre/lib/charsets.jar]
19/04/02 11:40:24 INFO impl.YarnClientImpl: Submitted application application_1548381669007_0829
In YARN cluster mode (--deploy-mode cluster), the applicationId appears at this point in the spark-submit.sh output:
19/04/02 11:40:22 INFO yarn.Client: Application report for application_1548381669007_0828 (state: ACCEPTED)
19/04/02 11:40:23 INFO yarn.Client: Application report for application_1548381669007_0828 (state: ACCEPTED)
19/04/02 11:40:24 INFO yarn.Client: Application report for application_1548381669007_0828 (state: ACCEPTED)
19/04/02 11:40:25 INFO yarn.Client: Application report for application_1548381669007_0828 (state: ACCEPTED)
19/04/02 11:40:26 INFO yarn.Client: Application re
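In both deploy modes the applicationId has the shape application_<clusterTimestamp>_<sequence>, so it can be filtered out of the captured output lines with a regular expression. A minimal sketch (the class name AppIdExtractor is an assumption, not from the article):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppIdExtractor {
    // YARN application ids look like application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Returns the first application id found in one line of spark-submit
    // output, or null if the line contains none.
    public static String extract(String line) {
        Matcher m = APP_ID.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String clientLine = "19/04/02 11:40:24 INFO impl.YarnClientImpl: "
                + "Submitted application application_1548381669007_0829";
        String clusterLine = "19/04/02 11:40:22 INFO yarn.Client: "
                + "Application report for application_1548381669007_0828 (state: ACCEPTED)";
        System.out.println(extract(clientLine));   // application_1548381669007_0829
        System.out.println(extract(clusterLine));  // application_1548381669007_0828
    }
}
```

Applying extract() to every line the StreamGobbler threads capture, and keeping the first non-null result, yields the applicationId in either deploy mode.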