Resolving conflicts between a Spark application's dependencies and the jars bundled with Spark

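Problem: I use Spark Structured Streaming to write data to Elasticsearch (ES). The httpclient jar loaded from the Spark cluster's jars directory (httpclient-4.5.4.jar) differs in version from the one my application depends on (httpclient-4.5.10.jar), and the job fails.

When I debug locally in IDEA with httpclient-4.5.10.jar, the application can access ES normally.

Error log on YARN: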

Caused by: java.lang.BootstrapMethodError: call site initialization exception
at java.lang.invoke.CallSite.makeSite(CallSite.java:341)
at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
at java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:312)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:296)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
at org.elasticsearch.client.RestHighLevelClient.bulk(RestHighLevelClient.java:537)
at com.gac.gx.sink.EnEventToESSink.close(EnEventToESSink.scala:99)
at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:60)
at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.invoke.LambdaConversionException: Invalid receiver type interface org.apache.http.Header; not a subtype of implementation type interface org.apache.http.NameValuePair
at java.lang.invoke.AbstractValidatingLambdaMetafactory.validateMetafactoryArgs(AbstractValidatingLambdaMetafactory.java:233)
at java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:303)
at java.lang.invoke.CallSite.makeSite(CallSite.java:302)
... 22 more

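After searching online, I concluded that the outdated httpclient version on the cluster is the cause. The LambdaConversionException points the same way: at runtime org.apache.http.Header is not a subtype of org.apache.http.NameValuePair, which suggests that the Header class actually loaded is not the one the ES client was compiled against.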
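
One way to confirm which jar actually supplies a class at runtime is to ask the JVM for the class's code source. The sketch below is only an illustration: the class name WhichJar and the helper locate are made up for this example, and it assumes the check is run with the same classpath as the failing code (for instance, by logging the result from inside the sink).

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    /** Returns the location a class was loaded from, or a short note when it is unknown. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK bootstrap loader have no CodeSource.
            if (src == null) {
                return "(bootstrap/JDK)";
            }
            URL location = src.getLocation();
            return location == null ? "(unknown location)" : location.toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The two classes involved in the error above; add others as needed.
        String[] names = {"org.apache.http.Header", "org.apache.http.client.HttpClient"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the printed location is a jar under the Spark installation rather than the application's own jar, the cluster's copy is the one being used.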

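By default, a Spark job resolves classes in this order:

        1. SystemClasspath -- the dependencies shipped with the Spark installation

        2. UserClassPath   -- jars submitted with spark-submit --jars, or the user's app.jar

So the next idea was to make Spark load my own jars first.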

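I first tried spark.driver.extraClassPath and spark.executor.extraClassPath. This had no effect; the job still failed with the error above. (These settings take an ordinary classpath string, whose entries are separated by ':' on Linux, and the paths must exist on the nodes where the driver and executors run, so the comma-separated value used below may be part of the reason it did nothing.)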

--jars lib/httpcore-4.4.12.jar,lib/httpclient-4.5.10.jar \
--conf spark.driver.extraClassPath=/home/hadoop/data/sy/a29-driver-behavior-bigdata-streaming/lib/httpcore-4.4.12.jar,/home/hadoop/data/sy/a29-driver-behavior-bigdata-streaming/lib/httpclient-4.5.10.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/data/sy/a29-driver-behavior-bigdata-streaming/lib/httpcore-4.4.12.jar,/home/hadoop/data/sy/a29-driver-behavior-bigdata-streaming/lib/httpclient-4.5.10.jar \

Next I tried --conf "spark.driver.userClassPathFirst=true" and --conf "spark.executor.userClassPathFirst=true". The job then failed with the following error:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:54)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:80)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
    at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
    at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
    ... 13 more
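
A ClassCastException between a class and its own supertype, as above, is a common symptom of the same class being loaded by two different class loaders, which is a likely side effect of putting user jars first. As a rough illustration of what "user classpath first" means, here is a minimal child-first class loader. It is a sketch under assumptions, not Spark's actual implementation; the class name ChildFirstClassLoader is chosen for this example only.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> cls = findLoadedClass(name);
            if (cls == null) {
                try {
                    // Look in the user jars first (the reverse of the default order).
                    cls = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user jars: fall back to the normal parent-first lookup.
                    cls = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(cls);
            }
            return cls;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no user jars, every class falls through to the parent loader.
        ChildFirstClassLoader loader =
                new ChildFirstClassLoader(new URL[0], ChildFirstClassLoader.class.getClassLoader());
        System.out.println(loader.loadClass("java.lang.String").getClassLoader());
    }
}
```

If the user jar also contains classes that the framework has already loaded through the parent (Hadoop/YARN classes, for example), the two copies are distinct types to the JVM, and casts between them fail.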

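Final solution:

The problem was solved with the relocation feature of the maven shade plugin. maven-shade-plugin can relocate classes: it moves the classes from their original package into a package name we specify and rewrites the references to them in the bundled bytecode, so the application's copy of httpclient no longer collides with the org.apache.http classes shipped with Spark.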

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <relocations>
                                <relocation>
                                    <pattern>org.apache.http</pattern>
                                    <shadedPattern>shard.org.apache.http</shadedPattern>
                                </relocation>
                            </relocations>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.gac.gx.GxAlarmApplication</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
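
After packaging, it can be worth checking that the relocation really happened. The following is a minimal sketch that counts jar entries under a given path prefix; the class name ShadeCheck and the helper countEntries are hypothetical, and the path of the shaded jar is passed as an argument because it depends on the project.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class ShadeCheck {
    /** Counts the entries in the jar whose path starts with the given prefix. */
    static long countEntries(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .filter(entry -> entry.getName().startsWith(prefix))
                      .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ShadeCheck <path-to-shaded-jar>");
            return;
        }
        // Prefixes match the <pattern> and <shadedPattern> used in the plugin configuration.
        System.out.println("relocated entries: " + countEntries(args[0], "shard/org/apache/http/"));
        System.out.println("original entries:  " + countEntries(args[0], "org/apache/http/"));
    }
}
```

With the configuration above, one would expect the relocated count to be greater than zero and the original count to be zero.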
