Deploying and Running a Spring Boot Project on a Spark YARN Cluster

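Requirement:

The project is developed with Spring Boot. It performs statistical analysis and algorithmic computation on historical data, and the results are produced by running it on Spark on YARN.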

Environment:

JDK: 1.8
Operating system: CentOS 7.6
Big data architecture: Hadoop with YARN HA; Spark and Hive in cluster mode

Expected behavior of the project:

./spark-submit --master yarn-client --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.4.5.jar 100

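Expected result: on the cluster, a script like the one above submits the jar packaged from the Spring Boot project, invokes its main entry class directly, and the job produces its results normally.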

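Problem:

After building a fat jar (with all dependencies) using the standard Spring Boot packaging, submitting it to the cluster with spark-submit always failed with this error: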

TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.0.107, executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2290)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2208)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2066)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1570)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2208)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2066)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1570)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2208)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2066)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1570)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2208)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2066)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1570)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:430)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

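This is a Java serialization problem. In Spark yarn-client mode the driver runs on the client that submits the job, but the executors may run on different NodeManager (NM) nodes. When Spark operators such as map and shuffle execute, objects are transferred between these nodes, which involves object serialization and deserialization (SerDe).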
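The trace above shows a java.lang.invoke.SerializedLambda being assigned to a field of type Function. As a minimal sketch in plain Java (no Spark involved), the example below shows that a serializable lambda is written to the stream as a SerializedLambda and is only turned back into a function object during deserialization, which requires the class that defined the lambda to be loadable on the receiving side. The SerFn interface and the class name LambdaSerdeDemo are hypothetical stand-ins, not Spark APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class LambdaSerdeDemo {
    /** Hypothetical stand-in for a serializable function interface such as Spark's Function. */
    interface SerFn<T, R> extends Function<T, R>, Serializable {}

    /** Serializes an object with standard Java serialization. */
    static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deserializes an object; the lambda's defining class must be visible to this JVM. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks that the serialized bytes name SerializedLambda rather than the lambda itself. */
    static boolean wireFormUsesSerializedLambda() {
        SerFn<Integer, Integer> plusOne = v -> v + 1;
        String raw = new String(serialize(plusOne), StandardCharsets.ISO_8859_1);
        return raw.contains("java.lang.invoke.SerializedLambda");
    }

    /** Sends a lambda through serialization and applies the deserialized copy. */
    @SuppressWarnings("unchecked")
    static int roundTrip(int input) {
        SerFn<Integer, Integer> plusOne = v -> v + 1;
        SerFn<Integer, Integer> copy = (SerFn<Integer, Integer>) deserialize(serialize(plusOne));
        return copy.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(wireFormUsesSerializedLambda()); // true
        System.out.println(roundTrip(41));                  // 42
    }
}
```

Within a single JVM the round trip succeeds because the defining class is always available. On a cluster, the same step happens on an executor, so it depends on how the application classes are packaged and loaded there.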

At first I assumed the cause was Spark's serializer configuration, so I added the following to spark.conf:

spark.serializer org.apache.spark.serializer.JavaSerializer

 

It had no effect (JavaSerializer is already Spark's default serializer), so I abandoned this approach.

The standard way to package a Spring Boot project is its own spring-boot-maven-plugin, configured as follows:

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <mainClass>package.mainclass</mainClass>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>repackage</goal>
            </goals>
        </execution>
    </executions>
</plugin>

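This bundles all Spark-related dependencies into a single fat jar. Aligning the dependency versions with the versions used on the cluster did not help; the same error remained.

My reasoning: the previous packaging pulled in too many jars, so it could not guarantee the following two points: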

1. Make sure Jars loaded on the driver are in the executor's classpath

2. Make sure Jars provided by Spark aren't included in your application
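One way to check the two points above is to print where a given class was actually loaded from, on the driver and inside a task running on an executor. The following is a small diagnostic sketch using only the JDK; the class name ClassOrigin is hypothetical, and the class names passed on the command line are whatever you want to inspect.

```java
import java.security.CodeSource;

class ClassOrigin {
    /**
     * Returns the jar or directory a class was loaded from.
     * JDK classes loaded by the bootstrap class loader have no code source.
     */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
                ? "(JDK / bootstrap class loader)"
                : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each argument is a fully qualified class name to look up.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

For example, calling locationOf on a Spark class such as org.apache.spark.api.java.JavaRDD should, under this approach, point at the cluster's Spark jars rather than at the application jar if Spark's own jars were correctly left out of the build.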

After collecting material online and several rounds of trial and error, I built the jar with maven-shade-plugin instead, and the test passed.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <keepDependenciesWithProvidedScope>false</keepDependenciesWithProvidedScope>
                <createDependencyReducedPom>false</createDependencyReducedPom>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.handlers</resource>
                    </transformer>
                    <transformer
                            implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                        <resource>META-INF/spring.factories</resource>
                    </transformer>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.schemas</resource>
                    </transformer>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>package.mainclass</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

For details, see: https://stackoverflow.com/questions/45189701/submitting-spring-boot-application-jar-to-spark-submit

We ran into the same problem, actually, on the same day you posted this. Our solution was to use the shade plugin for Maven to edit our build a bit. We found that when packaging with the spring-boot-maven-plugin it nested our classes in BOOT-INF/classes, which Spark didn't like. I'll paste the relevant section so you can try it out on your own application -- good luck!

I have found that simply skipping the class name from spark-submit works, i.e. --class com.dip.sparkapp.SparkappApplication


This works for me

<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <manifestEntries>
        <Main-Class>packagename.classname</Main-Class>
    </manifestEntries>
</transformer>
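The first quoted answer attributes the failure to application classes being nested under BOOT-INF/classes in the repackaged jar. As a rough sketch, assuming that layout is what needs to be detected, the following JDK-only check lists a jar's entries and reports whether any class files sit under that prefix. The class name JarLayoutCheck and its method names are hypothetical.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

class JarLayoutCheck {
    /** Directory prefix named in the quoted answer for repackaged Spring Boot jars. */
    static final String BOOT_CLASSES = "BOOT-INF/classes/";

    /** True if any class file is nested under BOOT-INF/classes/. */
    static boolean hasNestedBootClasses(Collection<String> entryNames) {
        return entryNames.stream()
                .anyMatch(name -> name.startsWith(BOOT_CLASSES) && name.endsWith(".class"));
    }

    /** Reads all entry names from the jar at the given path. */
    static List<String> entryNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().map(entry -> entry.getName()).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarLayoutCheck <path-to-jar>");
            return;
        }
        boolean nested = hasNestedBootClasses(entryNames(args[0]));
        System.out.println(nested
                ? "Classes are nested under " + BOOT_CLASSES
                : "No classes found under " + BOOT_CLASSES);
    }
}
```

Running it against the jar produced by each of the two packaging approaches is a quick way to compare their layouts before submitting anything to the cluster.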

The spark-submit script now in use:

${Spark_Home}/bin/spark-submit \
--name test-spark-demo \
--master yarn \
--deploy-mode client \
--class package.mainclass \
/xx/xxx/xxx.jar

Note: previously, the --class argument passed to spark-submit was the Spring Boot launcher entry class, org.springframework.boot.loader.JarLauncher.

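One more point worth noting:

When Spring Boot's run method executes, it calls the run method of CommandLineRunner, so the application's main class can implement that interface: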

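import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;

// DataSource auto-configuration is excluded, as in the original setup.
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class})
public class JobMainApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(JobMainApplication.class, args);
    }

    // Called by SpringApplication.run(...) after the application has started.
    @Override
    public void run(String... args) throws Exception {
        // Job logic goes here (left empty in the original).
    }
}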