Most of these problems are jar version conflicts, which surface as assorted classes that cannot be found.
The root cause is that Kylin 4.0 (beta) depends on Spark 2.4.6 while running on CDH 6.3.2:
Spark 2.4.6 depends on Hadoop 2.7 and Hive 1.x
CDH 6.3.2 ships Hadoop 3.0 and Hive 2.x
Alternatively, EMR 5.31 ships Spark 2.4.6 out of the box, so the conflicts there may be milder.
NoSuchFieldError: INSTANCE
- java.lang.NoSuchFieldError: INSTANCE
- java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
An http library conflict: the INSTANCE field does not exist in the older httpcore.
Both CDH 6.3.2 and Spark 2.4.6 use httpcore 4.4.x or newer, but the kylin.war inside the Tomcat that serves the Kylin 4.0 web UI bundles an httpcore-4.2.2.jar.
Fix: unpack the war, swap the httpcore jar for the newer version, repack the war, and put it back into Tomcat.
Spark-related ClassNotFoundException
- java.lang.ClassNotFoundException: parquet.DefaultSource
- java.lang.ClassNotFoundException: Failed to find data source: parquet.
- the same classes cannot be found on the YARN side either
The Spark jars under the Kylin directory are not loaded onto the classpath.
Fix: edit the kylin.sh startup script and add the Spark dependencies to the classpath by hand.
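A minimal sketch of that edit, assuming Spark lives under $KYLIN_HOME/spark. The exact variable that kylin.sh feeds to `java -cp` differs between builds, so treat the variable name in the comment as a placeholder; the demo directory and jar files exist only so the snippet runs standalone.

```shell
set -e
# Demo stand-in for the real $KYLIN_HOME; inside kylin.sh this is already set.
KYLIN_HOME=$(mktemp -d)
mkdir -p "$KYLIN_HOME/spark/jars"
touch "$KYLIN_HOME/spark/jars/spark-sql_2.11-2.4.6.jar"     # demo jar
touch "$KYLIN_HOME/spark/jars/spark-core_2.11-2.4.6.jar"    # demo jar

# Join every jar under spark/jars into one colon-separated classpath fragment...
spark_classpath=$(echo "$KYLIN_HOME"/spark/jars/*.jar | tr ' ' ':')
# ...then splice it into whatever classpath variable kylin.sh passes to java,
# e.g.:  KYLIN_CLASSPATH="$spark_classpath:$KYLIN_CLASSPATH"   (placeholder name)
echo "$spark_classpath"
```

The `tr ' ' ':'` join assumes no spaces in the install path, which holds for typical /opt or /usr/local layouts.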
Connection refused on port 7337
Kylin enables Spark's external shuffle service by default.
Fix: either check whether YARN's shuffle service is actually listening on port 7337, or simply turn the shuffle service off in the Kylin configuration.
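Turning it off goes through Kylin 4's `kylin.engine.spark-conf.*` passthrough in kylin.properties; double-check the property names against your kylin.properties defaults. A sketch, writing to a temp directory so it runs standalone (the real file is $KYLIN_HOME/conf/kylin.properties); dynamic allocation is disabled alongside it since it requires the shuffle service:

```shell
set -e
CONF_DIR=$(mktemp -d)    # stand-in for $KYLIN_HOME/conf
cat >> "$CONF_DIR/kylin.properties" <<'EOF'
# Stop the build engine from expecting an external shuffle service on 7337.
kylin.engine.spark-conf.spark.shuffle.service.enabled=false
# Dynamic allocation depends on the shuffle service, so turn it off as well.
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=false
EOF
grep 'spark-conf' "$CONF_DIR/kylin.properties"
```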
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
A Hive version mismatch: the field exists in Hive 1.x but was removed in Hive 2.x.
Spark 2.4.6 depends on Hive 1.x, while CDH 6.3.2 depends on Hive 2.x.
Various fixes circulate online; none of them worked for me. What finally did:
Fix: download the Spark 2.4.6 source, delete the references to the HIVE_STATS_JDBC_TIMEOUT and HIVE_STATS_RETRIES_WAIT fields, rebuild Spark, and replace the spark-hive jar under spark/jars with the freshly built one.
Rationale: https://issues.apache.org/jira/browse/SPARK-18112
Build command:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.5 -Phive -Phive-thriftserver -DskipTests clean package
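The field deletion itself can be scripted before the build. A hedged sketch: the two ConfVars references sit in Spark's `sql/hive` module (HiveUtils.scala in the 2.4.x line, if I recall correctly, so verify with grep first); the stand-in file below merely demonstrates the sed, run it against the real source file.

```shell
set -e
# Stand-in for the real source file; the content mimics the (ConfVars -> TimeUnit)
# pairs that reference the fields removed in Hive 2.x.
SRC=$(mktemp)
cat > "$SRC" <<'EOF'
    ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
    ConfVars.HIVE_STATS_RETRIES_WAIT -> TimeUnit.MILLISECONDS,
    ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY -> TimeUnit.SECONDS,
EOF

# Drop every line touching either removed field, keep everything else intact.
sed -i '/HIVE_STATS_JDBC_TIMEOUT\|HIVE_STATS_RETRIES_WAIT/d' "$SRC"
cat "$SRC"    # only the METASTORE_CLIENT_CONNECT_RETRY_DELAY line remains
```

After the `./build/mvn` run above finishes, copy the rebuilt spark-hive jar from the source tree over the one in spark/jars.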
Class org.apache.hive.hcatalog.data.JsonSerDe not found
Errors raised when creating a JSON-format external table:
- Class org.apache.hive.hcatalog.data.JsonSerDe not found
- java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.Deserializer
- java.lang.ClassCastException: org.apache.hive.hcatalog.data.JsonSerDe cannot be cast to org.apache.hadoop.hive.serde2.Deserializer
- cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
Fix: at first the hive-hcatalog-core jar was missing, so I added it to CDH's Hive directory; problems persisted, so I switched the JSON table's SerDe to org.apache.hadoop.hive.serde2.JsonSerDe instead.
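For reference, a sketch of what the replacement DDL looks like. The table name, columns, and HDFS location are invented for illustration; only the ROW FORMAT SERDE line is the point.

```shell
set -e
DDL=$(mktemp)
cat > "$DDL" <<'EOF'
-- Hypothetical JSON external table; only the SERDE class matters here.
CREATE EXTERNAL TABLE IF NOT EXISTS events_json (
  id      BIGINT,
  name    STRING,
  payload STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/events_json';
EOF
cat "$DDL"    # feed this to `hive -f` or beeline against the real cluster
```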
Unresolved
- Error occurred when check resource. Ignore it and try to submit this job.
- java.lang.UnsupportedOperationException: empty.max
I no longer remember the context for these:
- java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apache.hadoop.hive.ql.metadata.Table)
- java.lang.NoSuchMethodError: org.apache.parquet.bytes.BytesInput.toInputStream()Lorg/apache/parquet/bytes/ByteBufferInputStream;
- java.lang.ClassCastException: org.apache.hadoop.hive.ql.metadata.Partition cannot be cast to org.apache.hadoop.hive.ql.metadata.Partition (thrown at ...ResourceDetectUtils$$anonfun$getPaths$1.apply(ResourceDetectUtils.scala:43))