1. SparkLauncher Overview
SparkLauncher supports two modes:
(1) new SparkLauncher().launch(): starts a Process directly, with the same effect as submitting through spark-submit.
(2) new SparkLauncher().startApplication(listeners): returns a SparkAppHandle and optionally registers one or more listeners.
Advantages:
Output redirection is built in (both stdout and stderr, with support for writing to a file). You can register custom listeners that are invoked whenever the application's info or state changes. The returned SparkAppHandle also supports stopping, killing, disconnecting, getting the AppId, getting the State, and more; a sketch of the launch() mode with redirection follows below.
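For reference, here is a minimal sketch of the launch() mode with output redirection; the Spark home, jar path, and log paths are placeholders, not values from the original post:

import org.apache.spark.launcher.SparkLauncher;
import java.io.File;

public class LaunchDemo {
    public static void main(String[] args) throws Exception {
        // launch() spawns a spark-submit child process and returns a plain java.lang.Process
        Process process = new SparkLauncher()
                .setSparkHome("/path/to/spark")            // placeholder Spark home
                .setAppResource("/path/to/your-app.jar")   // placeholder application jar
                .setMainClass("SparkRead4gp")
                .setMaster("local[2]")
                .redirectOutput(new File("/tmp/spark-stdout.log")) // redirect stdout to a file
                .redirectError(new File("/tmp/spark-stderr.log"))  // redirect stderr to a file
                .launch();
        // Block until the spark-submit process exits
        int exitCode = process.waitFor();
        System.out.println("spark-submit exited with code " + exitCode);
    }
}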
2. Code Examples
(0) pom dependency:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-launcher -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.12</artifactId>
    <version>2.4.4</version>
</dependency>
(1) First, write the Spark application:
import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.{SaveMode, SparkSession}
/**
 * Created by yansu on 2019/12/19 15:58
 */
object SparkRead4gp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName(this.getClass.getName)
    val spark = SparkSession.builder().config(conf).getOrCreate()
    // JDBC connection properties for the target PostgreSQL database
    val pro = new Properties()
    pro.put("driver", "org.postgresql.Driver")
    pro.put("user", "gpadmin")
    pro.put("password", "gpadmin")
    // Read the source table from Greenplum
    var df = Read4GP(spark)
    df.show()
    df.withColumn("time", df.col("time").cast(StringType)).show()
    val list = Array("time", "b", "c", "d", "e", "f", "g")
    for (elem <- list) {
      if (elem.equalsIgnoreCase("time")) {
        // Keep only the time-of-day portion of the timestamp string
        df = df.withColumn(elem, substring(df.col(elem).cast(StringType), 12, 19).as(elem))
      }
    }
    // Append the result to a PostgreSQL table over JDBC
    df.write.mode(SaveMode.Append).jdbc("jdbc:postgresql://localhost:5432/postgres", "di_test.test_1223", pro)
  }

  // Decode a Base64-encoded JSON string
  def decodeBase64Json(userConfig: String): String = {
    new String(java.util.Base64.getDecoder.decode(userConfig))
  }
  def Read4GP(spark: SparkSession) = {
    val gscReadOptionMap = Map(
      "url" -> "jdbc:postgresql://localhost:5432/postgres",
      // "url" -> "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=postgres",
      "user" -> "gpadmin",
      "driver" -> "org.postgresql.Driver",
      "password" -> "gpadmin",
      "dbtable" -> "gptest_1220",
      "partitionColumn" -> "id"
    )
    // Requires the greenplum-spark connector on the classpath
    spark.read.format("greenplum")
      .options(gscReadOptionMap)
      .load()
  }
}
(2) Write the Java launcher code:
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
import java.io.IOException;
/**
 * Created by yansu on 2019/12/30 15:26
 */
public class Launcher {
    public static void main(String[] args) throws IOException {
        SparkAppHandle handler = new SparkLauncher()
                .setAppName("Java spark submit")
                .setSparkHome("/SPARK_HOME")
                .setConf("spark.driver.memory", "2g")
                .setConf("spark.executor.memory", "1g")
                .setConf("spark.executor.cores", "3")
                .setAppResource("/home/***.jar")
                .setMainClass("SparkRead4gp")
                .setDeployMode("client")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle handle) {
                        System.out.println("********** state changed **********");
                    }

                    @Override
                    public void infoChanged(SparkAppHandle handle) {
                        System.out.println("********** info changed **********");
                    }
                });
        // Poll until the application reaches a final state (FINISHED, FAILED, KILLED, or LOST);
        // comparing only against "FINISHED" and "FAILED" would loop forever on a killed app
        while (!handler.getState().isFinal()) {
            System.out.println("id " + handler.getAppId());
            System.out.println("state " + handler.getState());
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
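For completeness, here is a small sketch of the handle's control operations mentioned in section 1; the class and method names below are illustrative, not part of the original example:

import org.apache.spark.launcher.SparkAppHandle;

public class HandleControl {
    // `handle` is assumed to come from startApplication(), as in the Launcher class above
    static void control(SparkAppHandle handle) {
        System.out.println("appId: " + handle.getAppId()); // may be null until the app is submitted
        System.out.println("state: " + handle.getState());
        handle.stop();          // ask the application to stop gracefully
        // handle.kill();       // or forcibly kill the spawned spark-submit process
        // handle.disconnect(); // or detach the handle; the application keeps running
    }
}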
(3) Package and run
Upload the application jar to the server where Spark is deployed. Because the Launcher class references SparkLauncher, the spark-launcher jar must be uploaded to the server as well.
(4) Start the program with the java command:
java -Djava.ext.dirs=/home/xinghailong/launcher -cp launcher_test.jar Launcher
With that, the program starts successfully. Note that -Djava.ext.dirs was removed in Java 9, so on newer JDKs put the launcher jars on the classpath instead (e.g., -cp "/home/xinghailong/launcher/*:launcher_test.jar").
Advantages of this approach:
(1) Submitting Spark from Java code is more flexible and adapts to more scenarios;
(2) It is compatible with Java frameworks, letting big data work integrate better with other platforms.