1. Background
While developing a feature that calls a third-party interface and saves the returned data into MySQL, I ran into the jar-conflict scenario so common in Java development.
The development background:
Among the business data that the big-data team publishes, one part concerns status determination for business A (e.g. how to decide that an order is a valid order), and downstream data analysis is built around business A.
This status-determination logic is very complex, involving branch decisions across more than thirty business flows. The big-data team implemented a version early on (near-real-time on Spark; offline on Hive), but because the branch coverage was incomplete it occasionally produced issues, and since each issue touched those thirty-odd branch decisions in one way or another, triage and fixes were a real challenge.
After coordination between departments it was agreed that this logic properly belongs to department A: the big-data team's focus is accurate data analysis and mining, not heavy business-rule judgment. So the status-determination feature for business A was handed to department A to develop and expose as an interface for other departments to call.
Once department A finished the interface, the big-data side concluded after some research that besides calling that interface, we still needed Spark's rich operators to read data from MySQL and Redis, run the follow-up join/intersection/union logic, and finally land the results in MySQL.
The feature could have been built on a plain Java service or in Python, but for the reasons above Spark was chosen, and that planted the seed for the jar conflict in the development environment.
Interface development and testing went smoothly and passed in the local IDE (IDEA). Since the job needs to call another department's interface, the httpclient library was used for the HTTP calls.
The wrapped utility class HttpClientUtils:
import java.io.IOException
import java.util

import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.serializer.SerializerFeature
import org.apache.http.ParseException
import org.apache.http.client.ClientProtocolException
import org.apache.http.client.methods.{HttpGet, HttpPost}
import org.apache.http.entity.StringEntity
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.util.EntityUtils
import org.slf4j.LoggerFactory

import scala.collection.JavaConversions._

object HttpClientUtils {

  val logger = LoggerFactory.getLogger(HttpClientUtils.getClass)

  def get(url: String): String = {
    val httpclient = HttpClientBuilder.create.build
    try {
      // Create the HttpGet request.
      val httpget = new HttpGet(url)
      // Execute the GET request (synchronous).
      val response = httpclient.execute(httpget)
      try {
        // Read the response entity as a UTF-8 string.
        val entity = response.getEntity()
        EntityUtils.toString(entity, "utf-8")
      } finally {
        response.close()
      }
    } catch {
      case ex: ClientProtocolException =>
        logger.error(ex.getMessage)
        null
      case ex: ParseException =>
        logger.error(ex.getMessage)
        null
      case ex: IOException =>
        logger.error(ex.getMessage)
        null
    } finally {
      // Close the connection and release resources.
      httpclient.close()
    }
  }

  def post(url: String, headerMap: util.HashMap[String, Any], bodyMap: util.HashMap[String, Any]): String = {
    val bodyJsonString = JSON.toJSONString(bodyMap, SerializerFeature.PrettyFormat)
    // Create the httpclient instance.
    val client = HttpClientBuilder.create.build
    try {
      // Create the POST request.
      val httpPost = new HttpPost(url)
      // Set request headers.
      if (headerMap != null) {
        for (entry <- headerMap.entrySet()) {
          httpPost.setHeader(entry.getKey, entry.getValue.toString)
        }
      }
      httpPost.setEntity(new StringEntity(bodyJsonString, "UTF-8"))
      // Execute the request and wait for the result (synchronous, blocking).
      val response = client.execute(httpPost)
      // Read the response entity.
      val entity = response.getEntity()
      var body = ""
      if (entity != null) {
        // Convert the entity to a String with the given charset.
        body = EntityUtils.toString(entity, "UTF-8")
      }
      // Release the connection.
      response.close()
      body
    } finally {
      client.close()
    }
  }
}
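For reference, a minimal usage sketch of the utility (the URL and the header/body keys below are made up for illustration):

object HttpClientUtilsDemo {
  def main(args: Array[String]): Unit = {
    // Simple GET against a hypothetical endpoint.
    println(HttpClientUtils.get("http://example.com/api/ping"))

    // POST with a JSON body; the header and body keys are illustrative only.
    val headers = new java.util.HashMap[String, Any]()
    headers.put("Content-Type", "application/json")
    val body = new java.util.HashMap[String, Any]()
    body.put("orderId", "20210924001")
    println(HttpClientUtils.post("http://example.com/api/bizStatus", headers, body))
  }
}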
CDH cluster environment:
spark2: /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars
hadoop: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/jars
The spark2 job-submission script is wrapped with the following parameters (CDH environment):
spark2-submit \
--master yarn \
--deploy-mode cluster \
--class com.david.datagen.MyXXBizStatusUpdater \
--num-executors 2 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--conf spark.debug.maxToStringFields=1000 \
--conf spark.port.maxRetries=100 \
--conf spark.driver.maxResultSize=1g \
--conf spark.default.parallelism=20 \
--conf spark.sql.shuffle.partitions=20 \
--conf spark.executor.memoryOverhead=1G \
--conf spark.driver.memoryOverhead=1G \
--conf spark.storage.blockManagerTimeoutIntervalMs=100000 \
--conf spark.shuffle.file.buffer=64k \
--conf spark.reducer.maxSizeInFlight=96M \
--conf spark.network.timeout=300s \
--conf spark.rpc.askTimeout=300s \
--conf spark.shuffle.io.serverThreads=8 \
--conf spark.shuffle.service.enabled=true \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dconfig.resource=application.conf" \
/data/program/invoke_3rd_xxfunc_interface-1.0-SNAPSHOT-jar-with-dependencies.jar
When executed on the test cluster, it threw the following exception:
Application Overview
User: hdfs
Name: com.david.datagen.MyXXBizStatusUpdater
Application Type: SPARK
Application Tags:
State: FINISHED
FinalStatus: FAILED
Started: Fri Sep 24 12:08:23 +0800 2021
Elapsed: 1mins, 41sec
Tracking URL: History
Diagnostics:
User class threw exception: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
at org.apache.http.impl.client.HttpClients.createDefault(HttpClients.java:56)
at com.david.utils.HttpClientUtils$.post(HttpClientUtils.scala:64)
at com.david.service.datagen.MyXXBizStatusService.getHaveclassRawData(MyXXBizStatusService.scala:134)
at com.david.service.datagen.MyXXBizStatusService.getBizStatusByInterface(MyXXBizStatusService.scala:50)
at com.david.datagen.MyXXBizStatusUpdater$$anonfun$do_business$1.apply$mcVI$sp(MyXXBizStatusUpdater.scala:88)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.david.datagen.MyXXBizStatusUpdater$.do_business(MyXXBizStatusUpdater.scala:80)
at com.david.datagen.MyXXBizStatusUpdater$$anonfun$main$1.apply$mcVI$sp(MyXXBizStatusUpdater.scala:54)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.david.datagen.MyXXBizStatusUpdater$.main(MyXXBizStatusUpdater.scala:47)
at com.david.datagen.MyXXBizStatusUpdater.main(MyXXBizStatusUpdater.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
The core error message is:
User class threw exception: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
at org.apache.http.impl.client.HttpClients.createDefault(HttpClients.java:56)
at com.david.utils.HttpClientUtils$.post(HttpClientUtils.scala:64)
2. Diagnosis
The stack trace tells us that when the code reached the third-party httpclient jar, the INSTANCE field referenced during the class initialization of SSLConnectionSocketFactory could not be found.
Yet neither at compile time nor when running the code in local IDEA did this problem appear, and opening the SSLConnectionSocketFactory source locally shows that the INSTANCE references it relies on do exist:
/**
* @since 4.3
*/
@ThreadSafe @SuppressWarnings("deprecation")
public class SSLConnectionSocketFactory implements LayeredConnectionSocketFactory {
public static final String TLS = "TLS";
public static final String SSL = "SSL";
public static final String SSLV2 = "SSLv2";
@Deprecated
public static final X509HostnameVerifier ALLOW_ALL_HOSTNAME_VERIFIER
= AllowAllHostnameVerifier.INSTANCE;
@Deprecated
public static final X509HostnameVerifier BROWSER_COMPATIBLE_HOSTNAME_VERIFIER
= BrowserCompatHostnameVerifier.INSTANCE;
@Deprecated
public static final X509HostnameVerifier STRICT_HOSTNAME_VERIFIER
= StrictHostnameVerifier.INSTANCE;
private final Log log = LogFactory.getLog(getClass());
Based on past development experience, this smelled like a duplicate-jar problem.
The project is built with Maven, so printing the Maven dependency tree shows whether the httpclient jars appear more than once.
In the project root, run:
mvn dependency:tree >d:/maven-deps-tree-2104.txt
Note:
To view the tree in the console, run: mvn dependency:tree
To write it to a file, open a shell where the pom.xml lives and run: mvn dependency:tree >d:/tree.txt
To inspect only the jars you care about, filter by coordinates (the pattern is [groupId]:[artifactId]:[type]:[version]):
mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId
For example:
mvn dependency:tree -Dverbose -Dincludes=org.springframework:spring-tx
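For this investigation, the same filter can point straight at the httpclient artifacts (a small sketch; the coordinates come from the dependency tree below):
mvn dependency:tree -Dverbose -Dincludes=org.apache.httpcomponents:httpclient
mvn dependency:tree -Dverbose -Dincludes=commons-httpclient:commons-httpclient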
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building invoke_3rd_xxfunc_interface1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for commons-codec:commons-codec:jar:1.15-SNAPSHOT is missing, no dependency information available
[INFO]
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ invoke_3rd_xxfunc_interface ---
[INFO] invoke_3rd_xxfunc_interface:invoke_3rd_xxfunc_interface:jar:1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-sql_2.11:jar:2.3.0:compile
[INFO] | +- com.univocity:univocity-parsers:jar:2.5.9:compile
[INFO] | +- org.apache.spark:spark-sketch_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.spark:spark-catalyst_2.11:jar:2.3.0:compile
[INFO] | | +- org.scala-lang:scala-reflect:jar:2.11.8:compile
[INFO] | | +- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.4:compile
[INFO] | | +- org.codehaus.janino:janino:jar:3.0.8:compile
[INFO] | | +- org.codehaus.janino:commons-compiler:jar:3.0.8:compile
[INFO] | | \- org.antlr:antlr4-runtime:jar:4.7:compile
[INFO] | +- org.apache.spark:spark-tags_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.orc:orc-core:jar:nohive:1.4.1:compile
[INFO] | | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | | +- commons-lang:commons-lang:jar:2.6:compile
[INFO] | | \- io.airlift:aircompressor:jar:0.8:compile
[INFO] | +- org.apache.orc:orc-mapreduce:jar:nohive:1.4.1:compile
[INFO] | | +- com.esotericsoftware:kryo-shaded:jar:3.0.3:compile
[INFO] | | | +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] | | | \- org.objenesis:objenesis:jar:2.1:compile
[INFO] | | \- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.6.4:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-common:jar:2.6.4:compile
[INFO] | | | +- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] | | | | \- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] | | | +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] | | | +- com.sun.jersey:jersey-client:jar:1.9:compile
[INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
[INFO] | | | +- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
[INFO] | | | +- com.google.inject:guice:jar:3.0:compile
[INFO] | | | | +- javax.inject:javax.inject:jar:1:compile
[INFO] | | | | \- aopalliance:aopalliance:jar:1.0:compile
[INFO] | | | +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] | | | | \- asm:asm:jar:3.1:compile
[INFO] | | | +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] | | | | +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] | | | | \- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] | | | \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
[INFO] | | \- com.google.inject.extensions:guice-servlet:jar:3.0:compile
[INFO] | +- org.apache.parquet:parquet-column:jar:1.8.2:compile
[INFO] | | +- org.apache.parquet:parquet-common:jar:1.8.2:compile
[INFO] | | \- org.apache.parquet:parquet-encoding:jar:1.8.2:compile
[INFO] | +- org.apache.parquet:parquet-hadoop:jar:1.8.2:compile
[INFO] | | +- org.apache.parquet:parquet-format:jar:2.3.1:compile
[INFO] | | +- org.apache.parquet:parquet-jackson:jar:1.8.2:compile
[INFO] | | \- org.codehaus.jackson:jackson-core-asl:jar:1.9.11:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile
[INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile
[INFO] | | \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile
[INFO] | +- org.apache.arrow:arrow-vector:jar:0.8.0:compile
[INFO] | | +- org.apache.arrow:arrow-format:jar:0.8.0:compile
[INFO] | | +- org.apache.arrow:arrow-memory:jar:0.8.0:compile
[INFO] | | +- com.carrotsearch:hppc:jar:0.7.2:compile
[INFO] | | \- com.vlkan:flatbuffers:jar:1.2.0-3f79e055:compile
[INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile
[INFO] | \- org.spark-project.spark:unused:jar:1.0.0:compile
[INFO] +- mysql:mysql-connector-java:jar:5.1.32:compile
[INFO] +- org.apache.spark:spark-core_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.avro:avro:jar:1.7.7:compile
[INFO] | | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] | | \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] | | \- org.tukaani:xz:jar:1.0:compile
[INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:compile
[INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:compile
[INFO] | | \- org.apache.avro:avro-ipc:jar:tests:1.7.7:compile
[INFO] | +- com.twitter:chill_2.11:jar:0.8.4:compile
[INFO] | +- com.twitter:chill-java:jar:0.8.4:compile
[INFO] | +- org.apache.hadoop:hadoop-client:jar:2.6.5:compile
[INFO] | | +- org.apache.hadoop:hadoop-common:jar:2.6.5:compile
[INFO] | | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] | | | | +- commons-digester:commons-digester:jar:1.8:compile
[INFO] | | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] | | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] | | | +- com.google.code.gson:gson:jar:2.2.4:compile
[INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:2.6.5:compile
[INFO] | | | | \- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:compile
[INFO] | | | | +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:compile
[INFO] | | | | +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:compile
[INFO] | | | | \- org.apache.directory.api:api-util:jar:1.0.0-M20:compile
[INFO] | | | +- org.apache.curator:curator-client:jar:2.6.0:compile
[INFO] | | | \- org.htrace:htrace-core:jar:3.0.4:compile
[INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.6.5:compile
[INFO] | | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] | | | \- xerces:xercesImpl:jar:2.9.1:compile
[INFO] | | | \- xml-apis:xml-apis:jar:1.3.04:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.6.5:compile
[INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.6.5:compile
[INFO] | | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.6.5:compile
[INFO] | | | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.6.5:compile
[INFO] | | | \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.6.5:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:2.6.5:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.6.5:compile
[INFO] | | \- org.apache.hadoop:hadoop-annotations:jar:2.6.5:compile
[INFO] | +- org.apache.spark:spark-launcher_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.spark:spark-kvstore_2.11:jar:2.3.0:compile
[INFO] | | \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile
[INFO] | +- org.apache.spark:spark-network-common_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.spark:spark-network-shuffle_2.11:jar:2.3.0:compile
[INFO] | +- org.apache.spark:spark-unsafe_2.11:jar:2.3.0:compile
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.4:compile
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.4.1:compile
[INFO] | | +- javax.activation:activation:jar:1.1.1:compile
[INFO] | | +- org.bouncycastle:bcprov-jdk15on:jar:1.52:compile
[INFO] | | \- com.jamesmurty.utils:java-xmlbuilder:jar:1.1:compile
[INFO] | | \- net.iharder:base64:jar:2.3.8:compile
[INFO] | +- org.apache.curator:curator-recipes:jar:2.6.0:compile
[INFO] | | +- org.apache.curator:curator-framework:jar:2.6.0:compile
[INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] | | \- com.google.guava:guava:jar:16.0.1:compile
[INFO] | +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
[INFO] | +- org.apache.commons:commons-lang3:jar:3.5:compile
[INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:compile
[INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
[INFO] | +- org.slf4j:slf4j-api:jar:1.7.16:compile
[INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.16:compile
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.16:compile
[INFO] | +- log4j:log4j:jar:1.2.17:compile
[INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
[INFO] | +- org.lz4:lz4-java:jar:1.4.0:compile
[INFO] | +- com.github.luben:zstd-jni:jar:1.3.2-2:compile
[INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:compile
[INFO] | +- commons-net:commons-net:jar:2.2:compile
[INFO] | +- org.scala-lang:scala-library:jar:2.11.8:compile
[INFO] | +- org.json4s:json4s-jackson_2.11:jar:3.2.11:compile
[INFO] | | \- org.json4s:json4s-core_2.11:jar:3.2.11:compile
[INFO] | | +- org.json4s:json4s-ast_2.11:jar:3.2.11:compile
[INFO] | | \- org.scala-lang:scalap:jar:2.11.0:compile
[INFO] | | \- org.scala-lang:scala-compiler:jar:2.11.0:compile
[INFO] | | \- org.scala-lang.modules:scala-xml_2.11:jar:1.0.1:compile
[INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile
[INFO] | | +- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
[INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile
[INFO] | | | +- org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile
[INFO] | | | \- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile
[INFO] | | +- org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile
[INFO] | | \- org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile
[INFO] | | \- org.javassist:javassist:jar:3.18.1-GA:compile
[INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile
[INFO] | | +- javax.annotation:javax.annotation-api:jar:1.2:compile
[INFO] | | +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2:compile
[INFO] | | \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile
[INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile
[INFO] | | +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2:compile
[INFO] | | \- javax.validation:validation-api:jar:1.1.0.Final:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile
[INFO] | +- io.netty:netty-all:jar:4.1.17.Final:compile
[INFO] | +- io.netty:netty:jar:3.9.9.Final:compile
[INFO] | +- com.clearspring.analytics:stream:jar:2.7.0:compile
[INFO] | +- io.dropwizard.metrics:metrics-core:jar:3.1.5:compile
[INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:3.1.5:compile
[INFO] | +- io.dropwizard.metrics:metrics-json:jar:3.1.5:compile
[INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:3.1.5:compile
[INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.6.7.1:compile
[INFO] | | \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.7.9:compile
[INFO] | +- org.apache.ivy:ivy:jar:2.4.0:compile
[INFO] | +- oro:oro:jar:2.0.8:compile
[INFO] | +- net.razorvine:pyrolite:jar:4.13:compile
[INFO] | +- net.sf.py4j:py4j:jar:0.10.6:compile
[INFO] | \- org.apache.commons:commons-crypto:jar:1.0.0:compile
[INFO] +- com.typesafe:config:jar:1.0.1:compile
[INFO] +- com.alibaba:fastjson:jar:1.2.36:compile
[INFO] +- redis.clients:jedis:jar:2.9.0:compile
[INFO] | \- org.apache.commons:commons-pool2:jar:2.4.2:compile
[INFO] +- org.apache.spark:spark-hive_2.11:jar:2.3.0:compile
[INFO] | +- com.twitter:parquet-hadoop-bundle:jar:1.6.0:compile
[INFO] | +- org.spark-project.hive:hive-exec:jar:1.2.1.spark2:compile
[INFO] | | +- commons-io:commons-io:jar:2.4:compile
[INFO] | | +- javolution:javolution:jar:5.5.1:compile
[INFO] | | +- log4j:apache-log4j-extras:jar:1.2.17:compile
[INFO] | | +- org.antlr:antlr-runtime:jar:3.4:compile
[INFO] | | | +- org.antlr:stringtemplate:jar:3.2.1:compile
[INFO] | | | \- antlr:antlr:jar:2.7.7:compile
[INFO] | | +- org.antlr:ST4:jar:4.0.4:compile
[INFO] | | +- com.googlecode.javaewah:JavaEWAH:jar:0.3.2:compile
[INFO] | | +- org.iq80.snappy:snappy:jar:0.2:compile
[INFO] | | +- stax:stax-api:jar:1.0.1:compile
[INFO] | | \- net.sf.opencsv:opencsv:jar:2.3:compile
[INFO] | +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile
[INFO] | | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] | | +- org.datanucleus:datanucleus-api-jdo:jar:3.2.6:compile
[INFO] | | +- org.datanucleus:datanucleus-rdbms:jar:3.2.9:compile
[INFO] | | +- commons-pool:commons-pool:jar:1.5.4:compile
[INFO] | | +- commons-dbcp:commons-dbcp:jar:1.4:compile
[INFO] | | \- javax.jdo:jdo-api:jar:3.0.1:compile
[INFO] | | \- javax.transaction:jta:jar:1.1:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | +- org.apache.calcite:calcite-avatica:jar:1.2.0-incubating:compile
[INFO] | +- org.apache.calcite:calcite-core:jar:1.2.0-incubating:compile
[INFO] | | +- org.apache.calcite:calcite-linq4j:jar:1.2.0-incubating:compile
[INFO] | | \- net.hydromatic:eigenbase-properties:jar:1.1.5:compile
[INFO] | +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
[INFO] | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] | +- commons-codec:commons-codec:jar:1.10:compile
[INFO] | +- joda-time:joda-time:jar:2.9.3:compile
[INFO] | +- org.jodd:jodd-core:jar:3.5.2:compile
[INFO] | +- org.datanucleus:datanucleus-core:jar:3.2.10:compile
[INFO] | +- org.apache.thrift:libthrift:jar:0.9.3:compile
[INFO] | +- org.apache.thrift:libfb303:jar:0.9.3:compile
[INFO] | \- org.apache.derby:derby:jar:10.12.1.1:compile
[INFO] +- net.sourceforge.jexcelapi:jxl:jar:2.6.10:compile
[INFO] +- org.apache.poi:poi:jar:3.17:compile
[INFO] | \- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] \- org.apache.poi:poi-ooxml-schemas:jar:3.17:compile
[INFO] \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.453 s
[INFO] Finished at: 2021-09-26T15:54:37+08:00
[INFO] Final Memory: 27M/434M
[INFO] ------------------------------------------------------------------------
The httpclient/httpcore-related jars and the parent jars that pull them in correspond as follows:
# ①. Inside the spark-core_2.11 jar
[INFO] +- org.apache.spark:spark-core_2.11:jar:2.3.0:compile
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.4:compile
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.4.1:compile
# ②. Inside the spark-hive_2.11 jar
[INFO] +- org.apache.spark:spark-hive_2.11:jar:2.3.0:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
Focus on the httpclient jars directly involved in this conflict: spark-hive_2.11 transitively depends on the following two jars:
- commons-httpclient-3.1.jar
- httpclient-4.5.4.jar
On inspection, commons-httpclient-3.1.jar does not contain the class from the error stack above, org.apache.http.conn.ssl.SSLConnectionSocketFactory, while httpclient-4.5.4.jar does.
Moreover, after adding an exclusion under spark-hive_2.11 in pom.xml to remove this transitive dependency, the places in the code that reference httpclient no longer compiled, which shows the project code was using that jar directly.
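A quick way to check whether a class is packaged inside a given jar is to list the jar's contents and grep for it (a sketch; the local-repository paths are illustrative):
jar tf ~/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar | grep SSLConnectionSocketFactory
jar tf ~/.m2/repository/org/apache/httpcomponents/httpclient/4.5.4/httpclient-4.5.4.jar | grep SSLConnectionSocketFactory
The first command prints nothing; the second prints the class entry.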
To make it unambiguous in pom.xml which httpclient version is actually in use, I excluded the transitive httpclient pulled in by both spark artifacts and declared httpclient explicitly instead. The key pom.xml changes:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.4</version>
</dependency>
<!-- Spark dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</exclusion>
</exclusions>
</dependency>
Run the Spark program locally once more and check where the httpclient jars are now referenced from:
The classes no longer come from the transitive dependencies inside spark-core or spark-hive, but directly from the jar in the local Maven repository.
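A one-line runtime check along the same lines (a minimal sketch; it prints the code source, i.e. the jar, that actually supplied the class):
import org.apache.http.conn.ssl.SSLConnectionSocketFactory
// Prints e.g. file:/D:/MAVEN_REPO/.../httpclient-4.5.4.jar
println(classOf[SSLConnectionSocketFactory].getProtectionDomain.getCodeSource.getLocation)
The fuller ClassLoaderPathFetcher utility near the end of this post does the same via ClassLoader.getResource.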
After this adjustment, I rebuilt the jar, uploaded it to the test CDH cluster, and got exactly the same error again:
User class threw exception: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
at org.apache.http.impl.client.HttpClients.createDefault(HttpClients.java:56)
at com.david.utils.HttpClientUtils$.post(HttpClientUtils.scala:64)
Now it got puzzling: the job runs fine in local IDEA but fails once packaged and run on the server. What is the cause?
Careful comparison revealed that the local Spark environment differs substantially from the test cluster's.
Local:
val sparkConf = new SparkConf().setAppName( "test" )
  .set( "spark.serializer", "org.apache.spark.serializer.KryoSerializer" )
  .set( "spark.debug.maxToStringFields", "200" )
SparkDebugTools.tryEnableLocalRun( sparkConf )

// Configure the SparkConf: fall back to Spark local mode when not running on Linux
// (`os` is presumably the lowercased OS name, e.g. from System.getProperty("os.name")).
def tryEnableLocalRun(sparkConf: SparkConf): Unit = {
  logger.info(">>>> Current OS is : " + os)
  if (os.indexOf("linux") == -1) {
    sparkConf.setMaster("local[1]")
  }
}
As shown above, Spark runs in local mode on the developer machine, but on the test server it runs in yarn-cluster or yarn-client mode, and that is where the environments diverge.
Since spark2 in the CDH cluster is installed via parcels, a spark2 job first loads the libraries under ${SPARK_CLASS_PATH}/jars into the JVM's ClassLoader. Because Spark also uses Hadoop libraries at runtime, the libraries under ${HADOOP_CLASS_PATH}/jars are likewise loaded, then the classes on the system classpath, and only last the user's application jar.
Because the httpclient jars shipped in the Spark/Hadoop libraries are loaded into the ClassLoader first, the httpclient referenced by the application is the last to load on the server. If the httpclient version in the cluster's Spark/Hadoop libraries happens to be older, the duplicate-jar problem appears; the root cause is the ClassLoader loading order.
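One way to inspect that order at runtime is to walk the classloader chain and print each search URL; a sketch, assuming Java 8 (where the application classloader is a URLClassLoader whose URLs can be listed directly):
import java.net.URLClassLoader

object ClassPathDumper {
  def main(args: Array[String]): Unit = {
    // Walk the classloader chain from the application loader up to bootstrap.
    var cl: ClassLoader = getClass.getClassLoader
    while (cl != null) {
      cl match {
        case ucl: URLClassLoader => ucl.getURLs.foreach(u => println(s"$cl -> $u"))
        case other               => println(s"$other (URLs not directly listable)")
      }
      cl = cl.getParent
    }
  }
}
Running this inside the driver (and logging it from an executor task) shows which httpclient jar sits earliest on each side.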
3. Solutions
3.1 Solution 1 (failed verification):
Set the following parameters on spark2-submit:
--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true
With these parameters set, the JVM ClassLoader load order becomes:
user-defined CLASS PATH -> SPARK CLASS PATH -> System CLASS PATH
With this change, the spark2-submit script becomes:
spark2-submit \
--master yarn \
--deploy-mode cluster \
--class com.david.datagen.MyXXBizStatusUpdater \
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
--num-executors 2 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--conf spark.debug.maxToStringFields=1000 \
--conf spark.port.maxRetries=100 \
--conf spark.driver.maxResultSize=1g \
--conf spark.default.parallelism=20 \
--conf spark.sql.shuffle.partitions=20 \
--conf spark.executor.memoryOverhead=1G \
--conf spark.driver.memoryOverhead=1G \
--conf spark.storage.blockManagerTimeoutIntervalMs=100000 \
--conf spark.shuffle.file.buffer=64k \
--conf spark.reducer.maxSizeInFlight=96M \
--conf spark.network.timeout=300s \
--conf spark.rpc.askTimeout=300s \
--conf spark.shuffle.io.serverThreads=8 \
--conf spark.shuffle.service.enabled=true \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dconfig.resource=application.conf" \
/data/program/invoke_3rd_xxfunc_interface-1.0-SNAPSHOT-jar-with-dependencies.jar
Running this script produced the following error:
User: hdfs
Name: com.david.datagen.MyXXBizStatusUpdater
Application Type: SPARK
Application Tags:
State: FINISHED
FinalStatus: FAILED
Started: Sat Sep 25 21:45:27 +0800 2021
Elapsed: 21sec
Tracking URL: History
Diagnostics:
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, dn5.testhdp.com, executor 1): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1609)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1597)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1596)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1596)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1830)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1779)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1768)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
at com.david.datagen.MyXXBizStatusService.getBizStatusByInterface(MyXXBizStatusService.scala:45)
After sorting through it, the crux turned out to be these two settings:
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
With these settings, the ClassLoader prefers classes from the user-supplied jar and falls back to the framework libraries only when a class is missing there. If the user's libraries differ from the cluster's (for example, the server runs a CDH build while the local pom.xml pulls in Apache community builds), the Spark/Hadoop classes and their supporting third-party libraries can diverge significantly. In this case, with the settings above, the user-supplied httpclient-4.5.4.jar does load first, but the user-supplied Spark/Hadoop classes also load first, which can trigger this exception:
java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
Regarding this exception, there is a classic discussion thread in the Spark project's issue tracker that digs into the root cause and the recommended fixes; interested readers can look it up via "Spark Job Fails with ResultTask ClassCastException" in the reference list at the end.
Sticking with this approach to resolve the duplicated httpclient jar means wading into Spark/Hadoop library conflicts instead. Those could in principle be tackled as follows:
①. Use --driver-class-path ****.jar; however, this parameter loads classes only from the specified jars, and the job errors out and exits if a class is missing from them.
②. Align the versions of all Spark/Hadoop-related jars in the application jar with the server's CDH cluster one by one, until a pom.xml dependencies set emerges that fits the cluster; the time that would cost is astronomical.
Taken together this looked like a dilemma, but the former path (aligning httpclient itself) is clearly the smaller job.
3.2 Solution 2 (verified):
So, back to the root cause of the original error:
User class threw exception: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
at org.apache.http.impl.client.HttpClients.createDefault(HttpClients.java:56)
at com.david.utils.HttpClientUtils$.post(HttpClientUtils.scala:64)
Since local IDEA runs fine and pom.xml already addresses the conflict, the failure on the cluster can only mean the server itself carries httpclient-related jars of a conflicting, and most likely lower, version in its Spark/Hadoop libraries.
With that suspicion, I opened the Spark and Hadoop lib directories on the cluster:
spark2: /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars
hadoop: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/jars
First, the httpclient-related jars in the spark2 libraries:
[root@dn6 jars]# pwd
/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/jars
[root@cdh1 jars]# ll *http*.jar
-rw-r--r-- 1 root root 305001 Jul 5 2018 commons-httpclient-3.1.jar
-rw-r--r-- 1 root root 324565 Jul 5 2018 httpcore-4.4.8.jar
Move them out of the lib directory into the parent folder:
[root@cdh1 jars]# mv httpclient*.jar ../
[root@cdh1 jars]# mv httpcore*.jar ../
Next, the httpclient-related jars in the Hadoop libraries:
[root@cdh1 jars]# pwd
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/jars
[root@cdh1 jars]# ll http*.jar
-rwxr-xr-x 1 root root 433368 Mar 3 2021 httpclient-4.2.5.jar
-rw-r--r-- 1 root root 585465 Mar 3 2021 httpclient-4.3.jar
-rwxr-xr-x 1 root root 227708 Mar 3 2021 httpcore-4.2.5.jar
-rw-r--r-- 1 root root 282160 Mar 3 2021 httpcore-4.3.jar
As shown, the Hadoop libraries include a lower httpclient version (httpclient-4.2.5.jar), which does not contain the class from the error at the top of this post, org.apache.http.conn.ssl.SSLConnectionSocketFactory (the class only appeared in 4.3, per the @since tag in the source above), so the INSTANCE member it references cannot be resolved.
Move these out into the parent folder as well:
[root@cdh1 jars]# mv http*.jar ../
With the httpclient-related jars removed from the Spark and Hadoop libraries, rerun the Spark job:
spark2-submit \
--master yarn \
--deploy-mode client \
--class com.david.datagen.MyXXBizStatusUpdater \
--num-executors 2 \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--conf spark.debug.maxToStringFields=1000 \
--conf spark.driver.userClassPathFirst=false \
--conf spark.executor.userClassPathFirst=false \
--conf spark.port.maxRetries=100 \
--conf spark.driver.maxResultSize=1g \
--conf spark.default.parallelism=20 \
--conf spark.sql.shuffle.partitions=20 \
--conf spark.executor.memoryOverhead=1G \
--conf spark.driver.memoryOverhead=1G \
--conf spark.storage.blockManagerTimeoutIntervalMs=100000 \
--conf spark.shuffle.file.buffer=64k \
--conf spark.reducer.maxSizeInFlight=96M \
--conf spark.network.timeout=300s \
--conf spark.rpc.askTimeout=300s \
--conf spark.shuffle.io.serverThreads=8 \
--conf spark.shuffle.service.enabled=true \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dconfig.resource=application.conf" \
/data/program/invoke_3rd_xxfunc_interface-1.0-SNAPSHOT-jar-with-dependencies.jar
It now runs normally.
Alternatively, the application could itself depend on a low httpclient version similar to the one in the Hadoop libraries; given the default ClassLoader order, this conflict should then not occur. That said, I have not verified this, so interested readers can try it, e.g. along the lines of the sketch below.
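A hedged sketch of that unverified alternative: pin the application's httpclient to the cluster's version (4.2.5 here, per the Hadoop lib listing above):
<!-- Unverified: match the cluster's httpclient instead of 4.5.4.
     Note that HttpClientBuilder only exists since 4.3, so HttpClientUtils
     would have to fall back to the older (deprecated) DefaultHttpClient API. -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.2.5</version>
</dependency>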
In this case, to see at a glance which jar a given class in the application is loaded from, a simple utility like the following works:
import org.apache.commons.lang3.StringUtils;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;

import java.net.URL;

/**
 * ClassLoader path fetcher.
 *
 * Finds the jar and package path that a given Class object was loaded from.
 */
public class ClassLoaderPathFetcher {

    public static void getClassLoaderPath(Class<?> clazz) {
        String className = clazz.getName();
        String classNamePath = className.replace(".", "/") + ".class";
        System.out.println("classNamePath = " + classNamePath);
        // Resolve the class file through the classloader that loaded the class.
        URL url = clazz.getClassLoader().getResource(classNamePath);
        String path = url.getFile();
        System.out.println("path = " + path);
        // Un-escape spaces and strip the leading slash for readability.
        path = StringUtils.replace(path, "%20", " ");
        System.out.println(StringUtils.removeStart(path, "/"));
    }

    public static void main(String[] args) {
        getClassLoaderPath(SSLConnectionSocketFactory.class);
    }
}
/**
The output:
classNamePath = org/apache/http/conn/ssl/SSLConnectionSocketFactory.class
path = file:/D:/MAVEN_REPO/org/apache/httpcomponents/httpclient/4.5.4/httpclient-4.5.4.jar!/org/apache/http/conn/ssl/SSLConnectionSocketFactory.class
file:/D:/MAVEN_REPO/org/apache/httpcomponents/httpclient/4.5.4/httpclient-4.5.4.jar!/org/apache/http/conn/ssl/SSLConnectionSocketFactory.class
*/
From a Scala class, the Java utility can be invoked like this:
val sparkSession = sqlContext.sparkSession
logger.debug( "---------------------------1 Before invoke ClassLoaderPathFetcher.getClassLoaderPath()-------------------------------" )
ClassLoaderPathFetcher.getClassLoaderPath( classOf[SSLConnectionSocketFactory] )
logger.debug( "---------------------------2 After invoke ClassLoaderPathFetcher.getClassLoaderPath()-------------------------------" )
/**
Note:
Java's Clazz.class corresponds to Scala's classOf[Clazz].
*/
Addendum (2021-09-30):
If the following httpclient-related jars are moved out of ${HADOOP_HOME}:
-rwxr-xr-x 1 root root 433368 Mar 3 2021 httpclient-4.2.5.jar
-rw-r--r-- 1 root root 585465 Mar 3 2021 httpclient-4.3.jar
-rwxr-xr-x 1 root root 227708 Mar 3 2021 httpcore-4.2.5.jar
-rw-r--r-- 1 root root 282160 Mar 3 2021 httpcore-4.3.jar
the cluster (CDH here) keeps working as long as it is not restarted. Once restarted, however, webhdfs becomes inaccessible, because the class org.apache.http.client.utils.URLEncodedUtils can no longer be found on the Hadoop classpath. On inspection, the newer httpclient-4.5.4.jar keeps both the package and the class name unchanged, so it can stand in for the removed jars.
Solution:
Reintroduce the newer httpclient-related jars used by Spark into ${HADOOP_HOME}/lib and distribute them to every node of the Hadoop cluster, for example as sketched below.
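A rough sketch of that distribution step (paths, versions and the host list are illustrative; adapt them to your cluster layout):
# Put the newer httpclient jars where Hadoop can see them ...
cp httpclient-4.5.4.jar httpcore-4.4.8.jar ${HADOOP_HOME}/lib/
# ... and copy them to every Hadoop node.
for host in dn1 dn2 dn3; do
  scp ${HADOOP_HOME}/lib/httpclient-4.5.4.jar ${HADOOP_HOME}/lib/httpcore-4.4.8.jar ${host}:${HADOOP_HOME}/lib/
done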
References (with thanks):
- Summary of common Spark errors
- Java Class objects explained: ClassName.class, Class.forName() and getClass()
- How Spark bypasses Java's parent-delegation model to load classes from the user jar first
- Spark Job Fails with ResultTask ClassCastException
- spark-submit parameters explained
- Resolving Spark dependency conflicts
- Replacement strategies for APIs deprecated after HttpClient 4.5.x
- Resolving Spark jar conflicts
- Batch processing with addBatch in Java JDBC
- Differences between MySQL INSERT, REPLACE and UPDATE