ClassNotFoundException: org.apache.hive.spark.client.Job

Hive here is version 3.1.3 and Spark is 3.3.3, built with Hadoop 3.x support.

After switching Hive's execution engine from mr to spark, every INSERT or DELETE run through beeline kept failing: the SparkTask died with the RPC connection being closed,

and the YARN logs showed ClassNotFoundException: org.apache.hive.spark.client.Job.
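For context, the engine switch itself is only a couple of properties; a minimal sketch as issued from beeline (session level; the same keys can also live in hive-site.xml), with the master value being an assumption for this cluster:

set hive.execution.engine=spark;
set spark.master=yarn;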

At first this looked like a version-compatibility problem: the Hive version that the Spark 3.3.3 release is built against is 2.3.9.

Switching to Hive 2.3.9 still gave the same "org.apache.hive.spark.client.Job not found" error.

Spark's jars directory contains hive-exec-core-2.3.9.jar. Comparing it repeatedly against Hive's lib directory showed that Hive ships hive-exec.jar without the core classifier. After deleting hive-exec-core-2.3.9.jar from Spark 3.3.3, replacing it with the hive-exec jar from Hive's lib directory, and restarting Spark and Hive, this error went away.
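Roughly, the swap amounts to the commands below (a sketch only; the /cluster paths are assumptions based on this cluster's layout, and the exact hive-exec jar name depends on the Hive version in use):

# remove the "core" hive-exec jar shipped with the Spark distribution
rm /cluster/spark/jars/hive-exec-core-2.3.9.jar
# copy the full hive-exec jar from Hive's lib directory into Spark's jars
cp /cluster/hive/lib/hive-exec-*.jar /cluster/spark/jars/
# then restart Spark and Hive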

Re-running the statement then failed on other missing classes. Presumably the Hive-related jars that Spark publishes differ from the ones in the Hive release; why exactly, I don't know, so the first attempt was simply swapping jars. That still didn't quite work and triggered other problems, so the next step was to build everything from source myself.

End of LogType:launch_container.sh. This log file belongs to a running container (container_1701103942975_0002_01_000001) and so may not be complete.
 

Environment: querying data through Hive; the error above appears when viewing the logs in YARN.

yarn application -list

yarn logs -applicationId application_1701103942975_0002

Alternatively, read the run logs directly from the Hadoop directory on the NodeManager host.
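A sketch of where the container logs usually end up, assuming the default yarn.nodemanager.log-dirs under the Hadoop log directory (paths follow this cluster's /cluster/hadoop3 layout):

# list the per-container log directories of the application
ls /cluster/hadoop3/logs/userlogs/application_1701103942975_0002/
# read the stderr of the ApplicationMaster container
cat /cluster/hadoop3/logs/userlogs/application_1701103942975_0002/container_1701103942975_0002_01_000001/stderr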


 

Patched part of the Hive code and went back to recompiling.

Patching and building the Hive 3.1.3 source:

https://github.com/turboic/hive-3.1.3

mvn clean package -DskipTests -Pdist

mvn install:install-file -Dfile=mysql-connector-j-8.0.33.jar -DgroupId=com.mysql -DartifactId=mysql-connector-j -Dversion=8.0.33 -Dpackaging=jar

Patching and building the Hadoop 3.3.6 source:

https://github.com/turboic/hadoop-3.3.6

mvn clean install -DskipTests -Pdist

Patching and building the Spark 3.4.1 source:

https://github.com/turboic/spark-3.4.1

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes

In addition, guava-27.0-jre.jar needs to be adjusted.
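This most likely refers to the well-known guava mismatch between Hive 3.1.3 (which ships guava 19) and Hadoop 3.3.x (which ships guava 27.0-jre); a sketch of the usual fix, with paths assumed for this cluster:

# replace Hive's old guava with the one shipped by Hadoop 3.3.x
rm /cluster/hive/lib/guava-19.0.jar
cp /cluster/hadoop3/share/hadoop/common/lib/guava-27.0-jre.jar /cluster/hive/lib/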

mvn clean install -DskipTests -Pdist

======================================================================================================================================================  
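# Full reset: stop Hadoop/Spark/Hive, wipe HDFS data and logs, reformat the NameNode,
# unpack the freshly built Spark distribution, restore configs from backup and restart everything.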
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
/cluster/hadoop3/bin/hdfs namenode -format
/cluster/hadoop3/sbin/start-all.sh
tar -zxvf /cluster/spark-3.4.1-bin-custom-spark.tgz
rm -rf /cluster/spark
mv /cluster/spark-3.4.1-bin-custom-spark /cluster/spark
rm -rf /cluster/spark/conf/*
cp -r /cluster/bak-spark/conf/* /cluster/spark/conf
/cluster/spark/sbin/start-all.sh
tar -zxvf /cluster/apache-hive-3.1.3-bin.tar.gz
rm -rf /cluster/hive
mv /cluster/apache-hive-3.1.3-bin /cluster/hive
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hive/conf/*
rm -rf /cluster/hive/bin/*
rm -rf /cluster/hive/logs/*
cp -r /cluster/bak-hive/conf/* /cluster/hive/conf
cp -r /cluster/bak-hive/bin/* /cluster/hive/bin
rm -rf /cluster/hive/lib/hbase*.jar
chown -R root:root /cluster/hive/
chmod 777 /cluster/hive
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
======================================================================================================================================================  
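# Same reset without redeploying binaries: kill the daemons, wipe data and logs,
# reformat the NameNode and restart Hadoop, Spark, the Hive metastore and HiveServer2.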
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
/cluster/hadoop3/bin/hdfs namenode -format
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/spark/logs/*
rm -rf /cluster/hive/logs/*
/cluster/hadoop3/sbin/start-all.sh
/cluster/spark/sbin/start-all.sh
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
======================================================================================================================================================  
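# Redeploy only the freshly built Hive package while keeping HDFS data (the data wipe and
# namenode -format steps are commented out), then restart the metastore and HiveServer2.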
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
#rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
#/cluster/hadoop3/bin/hdfs namenode -format
/cluster/hadoop3/sbin/start-all.sh
rm -rf /cluster/spark/logs/*
/cluster/spark/sbin/start-all.sh
cd /cluster
rm -rf apache-hive-3.1.3-bin.tar.gz
mv /cluster/hive-3.1.3/packaging/target/apache-hive-3.1.3-bin.tar.gz .
tar -zxvf apache-hive-3.1.3-bin.tar.gz
mv apache-hive-3.1.3-bin hive
rm -rf /cluster/hive/bin/*
rm -rf /cluster/hive/conf/*
cp -r /cluster/conf/* /cluster/hive/conf/
cp -r /cluster/bin/* /cluster/hive/bin/
rm -rf /cluster/hive/logs/*
chown -R root:root /cluster/hive/
chmod 777 -R /cluster/hive/*
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &

======================================================================================================================================================
Using hive/bin/hive (the Hive CLI):
LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE dog; 

add jar HiveSparkHadoopUDF.jar;

CREATE TEMPORARY FUNCTION animal AS 'com.example.cloud.HiveSparkHadoopUDF';

CREATE FUNCTION animal AS 'com.example.cloud.HiveSparkHadoopUDF';

DESCRIBE TEMPORARY FUNCTION animal;

DESCRIBE FUNCTION animal;

SELECT NAME,BIRTHDAY,default.animal(BIRTHDAY) from dog;

Dropping the temporary and permanent functions:

DROP TEMPORARY FUNCTION animal;

DROP FUNCTION animal;
======================================================================================================================================================
hdfs dfs -put /cluster/HiveSparkHadoopUDF.jar /hive/udf/

hdfs dfs -ls /hive/udf/

create function animal2 as 'com.example.cloud.HiveSparkHadoopUDF' using jar 'hdfs://10.10.10.99:9000/hive/udf/HiveSparkHadoopUDF.jar';

desc function extended default.animal2;

select default.animal2('2024-01-20');

SELECT NAME,default.animal2(BIRTHDAY) from dog;

https://codeleading.com/article/76611116901/


/cluster/hive/bin/beeline

!connect jdbc:hive2://10.10.10.99:10000


When Hive uses the Spark engine, the Spark build must NOT include -Phive and -Phive-thriftserver (this is important).

insert into dwd_order_info values('20241222901057189205',100.00,'下单成功','李','支付宝','上太湖冰瀑位于北京市房山区韩村河镇的圣水峪村。上太湖清代成村,但“太湖”之名始于辽代,因山间有天然湖泊而得名,而上太湖冰瀑则是京城难得一见的天然冰瀑群!我们骑行到这里,只打卡最近的一处冰瀑。一路爬坡,车少路况好,沿途打卡拍照景点多,是一条不错的骑行线路','202412229010571899999','2024-12-26 00:02:10','2024-12-30 22:02:10','2024-01-26 00:02:10','290212127910232','8b5ab807-cee3-420f-a1aa-189239923f',22,66,257,33,112);
insert into dwd_order_info values('20240221163911090803',258.00,'hive-spark-yarn-hdfs-commit','点外卖','微信支付','','20240221163911090803','2024-02-21 10:02:10','2024-02-21 22:02:10','2024-02-21 16:02:10','3209473820232','903ee3-420f63-a1aa-1856397512f',11,22,33,44,55);

tail -f /cluster/hive/logs/hive.log

======================================================================================================================================================  
CREATE TABLE dwd_order_info (
 `id` STRING COMMENT '订单号',
 `final_amount` DECIMAL(16,2) COMMENT '订单最终金额',
 `order_status` STRING COMMENT '订单状态',
 `user_id` STRING COMMENT '用户 id',
 `payment_way` STRING COMMENT '支付方式',
 `delivery_address` STRING COMMENT '送货地址',
 `out_trade_no` STRING COMMENT '支付流水号',
 `create_time` STRING COMMENT '创建时间',
 `operate_time` STRING COMMENT '操作时间',
 `expire_time` STRING COMMENT '过期时间',
 `tracking_no` STRING COMMENT '物流单编号',
 `province_id` STRING COMMENT '省份 ID',
 `activity_reduce_amount` DECIMAL(16,2) COMMENT '活动减免金额',
 `coupon_reduce_amount` DECIMAL(16,2) COMMENT '优惠券减免金额',
 `original_amount` DECIMAL(16,2) COMMENT '订单原价金额',
 `feight_fee` DECIMAL(16,2) COMMENT '运费',
 `feight_fee_reduce` DECIMAL(16,2) COMMENT '运费减免'
) COMMENT '订单表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


insert into dwd_order_info values('20231222901057189201',100.00,'下单成功','hello world','微信支付','北京市朝阳区朝外大街101号101大厦','202312229010571899999','2023-12-26 00:02:10','2023-12-30 22:02:10','2024-01-26 00:02:10','290212127910232','8b5ab807-cee3-420f-a1aa-1196a0cab8cf',10,22,99,9.9,1);


======================================================================================================================================================


CREATE TABLE dim_base_province (
 `id` STRING COMMENT '编号',
 `name` STRING COMMENT '省份名称',
 `region_id` STRING COMMENT '地区 ID',
 `area_code` STRING COMMENT '地区编码',
 `iso_code` STRING COMMENT 'ISO-3166 编码,供可视化使用',
 `iso_3166_2` STRING COMMENT 'IOS-3166-2 编码,供可视化使用'
) COMMENT '省份表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

insert into dim_base_province values('8b5ab807-cee3-420f-a1aa-1196a0cab8cf','黑龙江省牡丹江市东安区六峰湖小区西侧','0100101010122','5c43dd76-eee1-4aa4-8914-5970deee774d','9238283823821917','268348539271829034');


======================================================================================================================================================


CREATE TABLE `ads_order_by_province` (
 `dt` STRING COMMENT '统计日期',
 `province_id` STRING COMMENT '省份 id',
 `province_name` STRING COMMENT '省份名称',
 `area_code` STRING COMMENT '地区编码',
 `iso_code` STRING COMMENT '国际标准地区编码',
 `iso_code_3166_2` STRING COMMENT '国际标准地区编码',
 `order_count` BIGINT COMMENT '订单数',
 `order_amount` DECIMAL(16,2) COMMENT '订单金额'
) COMMENT '各省份订单统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


insert into table ads_order_by_province
select
 '2023-12-26' dt,
 bp.id,
 bp.name,
 bp.area_code,
 bp.iso_code,
 bp.iso_3166_2,
 count(*) order_count,
 sum(oi.final_amount) order_amount
from dwd_order_info oi
left join dim_base_province bp
on oi.province_id=bp.id
group by bp.id,bp.name,bp.area_code,bp.iso_code,bp.iso_3166_2;


======================================================================================================================================================  
Building Spark 3.4.1 and 3.4.2 (without modifying anything):


./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes


./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes


3.4.2, for testing and learning:

/home/spark-3.4.2/dev/make-distribution.sh --name spark-3.4.2 --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.3.6 -Dhadoop.version=3.3.6 -Pkubernetes clean -DskipTests


/home/spark-3.4.2/dev/make-distribution.sh --name spark-3.4.2 --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes clean -DskipTests


install.packages("vignette",repos = NULL)

======================================================================================================================================================
firewall-cmd --permanent --remove-port=135/udp


firewall-cmd --list-ports


firewall-cmd --reload


======================================================================================================================================================
======================================================================================================================================================
dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Dhadoop.version=3.3.6 -Pkubernetes clean -DskipTests


dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Pmesos -Pyarn -Phadoop-3.3.6 -Dhadoop.version=3.3.6 -Pkubernetes -DskipTests

./make-distribution.sh --name custom-spark --pip --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes


mvn clean install -DskipTests -Pdist


======================================================================================================================================================
Fix for UPDATE or DELETE statements that fail

Error: Error while compiling statement: FAILED: SemanticException [Error 10297]:
 Attempt to do update or delete on table default.dwd_order_info that is not transactional (state=42000,code=10297)

Recreate the table as a transactional (ACID) table.
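A minimal sketch of what the transactional table could look like (an assumed fix: Hive full-ACID tables must be stored as ORC and flagged transactional, and the session needs the DbTxnManager enabled; the column list is shortened here):

-- enable ACID support for the session (usually configured in hive-site.xml instead)
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- transactional variant of the table; remaining columns are the same as dwd_order_info above
CREATE TABLE dwd_order_info_acid (
 `id` STRING COMMENT '订单号',
 `final_amount` DECIMAL(16,2) COMMENT '订单最终金额',
 `order_status` STRING COMMENT '订单状态'
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');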
 

======================================================================================================================================================

=====iceberg==============iceberg================iceberg===================iceberg==================iceberg==============================================================================
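# Restart the stack before the Iceberg tests: kill the daemons, clean the logs
# (HDFS data is kept this time) and bring Hadoop, Spark, the metastore and HiveServer2 back up.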

sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/spark/logs/*
rm -rf /cluster/hive/logs/*
/cluster/hadoop3/sbin/start-all.sh
/cluster/spark/sbin/start-all.sh
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &

======================================================================================================================================================
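The statements below assume a Spark catalog named yujia has already been configured; a minimal sketch of how such a catalog could be wired up when launching spark-sql (the runtime jar version, catalog type and warehouse path are assumptions):

/cluster/spark/bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.yujia=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.yujia.type=hadoop \
  --conf spark.sql.catalog.yujia.warehouse=hdfs://10.10.10.99:9000/iceberg/warehouse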
CREATE TABLE yujia.db.table (id bigint, data string) USING iceberg;


INSERT INTO yujia.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');
#INSERT INTO yujia.db.table SELECT id, data FROM source WHERE length(data) = 1;


MERGE INTO yujia.db.target t USING (SELECT * FROM updates) u ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.count = t.count + u.count
WHEN NOT MATCHED THEN INSERT *;

CREATE TABLE yujia.nyc.taxis
(
  vendor_id bigint,
  trip_id bigint,
  trip_distance float,
  fare_amount double,
  store_and_fwd_flag string
)
PARTITIONED BY (vendor_id);


INSERT INTO yujia.nyc.taxis
VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');


select * from yujia.nyc.taxis;
 
