Hive is version 3.1.3; Spark is 3.3.3, the build that supports Hadoop 3.x.
After switching Hive's execution engine from MR to Spark, INSERT and DELETE statements run through beeline kept failing: the SparkTask's RPC channel closed,
and the YARN logs showed ClassNotFoundException: org.apache.hive.spark.client.Job.
At first I assumed a version mismatch, since the Spark 3.3.3 release is built against Hive 2.3.9.
But after switching to Hive 2.3.9, org.apache.hive.spark.client.Job was still not found.
Spark's jars directory contains hive-exec-core-2.3.9.jar. After repeatedly comparing it against Hive's lib directory, the difference turned out to be that Hive ships hive-exec.jar without the -core classifier. I deleted hive-exec-core-2.3.9.jar under Spark 3.3.3, replaced it with the jar from Hive's lib, restarted Spark and Hive, and this error went away.
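As a minimal sketch, the swap looks like this (the /cluster/spark and /cluster/hive paths are assumptions matching the layout used later in these notes):
rm /cluster/spark/jars/hive-exec-core-2.3.9.jar
cp /cluster/hive/lib/hive-exec-2.3.9.jar /cluster/spark/jars/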
Re-running the statement then failed on other missing classes. Presumably the Hive-related jars that Spark publishes differ from the ones in the Hive release; I don't know the exact reason, so the first attempt was to keep replacing jars. That didn't fully work either and triggered other problems, so the remaining option is to compile everything myself.
End of LogType: launch_container.sh. This log file belongs to a running container (container_1701103942975_0002_01_000001) and so may not be complete.
Environment: when querying data through Hive, the error above shows up in the YARN logs.
yarn application -list
yarn logs -applicationId application_1701103942975_0002
Then check the runtime logs under the Hadoop directory.
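If the aggregated logs are incomplete, the NodeManager's local container logs can be read directly; a sketch assuming the default yarn.nodemanager.log-dirs under this Hadoop install:
ls /cluster/hadoop3/logs/userlogs/application_1701103942975_0002/
cat /cluster/hadoop3/logs/userlogs/application_1701103942975_0002/container_1701103942975_0002_01_000001/stderr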
Modify parts of the Hive code, then recompile.
Building the modified Hive 3.1.3 source
turboic/hive-3.1.3 (github.com)
mvn clean package -DskipTests -Pdist
mvn install:install-file -Dfile=mysql-connector-j-8.0.33.jar -DgroupId=com.mysql -DartifactId=mysql-connector-j -Dversion=8.0.33 -Dpackaging=jar
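The install-file call registers the MySQL driver in the local Maven repository so the Hive build can resolve it; a quick check that the artifact landed (assuming the default ~/.m2 location):
ls ~/.m2/repository/com/mysql/mysql-connector-j/8.0.33/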
Building the modified Hadoop 3.3.6 source
turboic/hadoop-3.3.6 (github.com)
mvn clean install -DskipTests -Pdist
Building the modified Spark 3.4.1 source
turboic/spark-3.4.1 (github.com)
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes
Also, guava-27.0-jre.jar needs to be swapped in; a sketch follows the build command below.
mvn clean install -DskipTests -Pdist
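One reading of the guava note above, as a sketch (the exact jar names and paths are assumptions; match them to what your Spark distribution and Hadoop 3.3.6 actually ship):
rm /cluster/spark/jars/guava-*.jar
cp /cluster/hadoop3/share/hadoop/common/lib/guava-27.0-jre.jar /cluster/spark/jars/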
======================================================================================================================================================
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
/cluster/hadoop3/bin/hdfs namenode -format
/cluster/hadoop3/sbin/start-all.sh
tar -zxvf /cluster/spark-3.4.1-bin-custom-spark.tgz
rm -rf /cluster/spark
mv /cluster/spark-3.4.1-bin-custom-spark /cluster/spark
rm -rf /cluster/spark/conf/*
cp -r /cluster/bak-spark/conf/* /cluster/spark/conf
/cluster/spark/sbin/start-all.sh
tar -zxvf /cluster/apache-hive-3.1.3-bin.tar.gz
rm -rf /cluster/hive
mv /cluster/apache-hive-3.1.3-bin /cluster/hive
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hive/conf/*
rm -rf /cluster/hive/bin/*
rm -rf /cluster/hive/logs/*
cp -r /cluster/bak-hive/conf/* /cluster/hive/conf
cp -r /cluster/bak-hive/bin/* /cluster/hive/bin
rm -rf /cluster/hive/lib/hbase*.jar
chown -R root:root /cluster/hive/
chmod 777 /cluster/hive
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
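A quick sanity check that both services came up (assuming the default ports, 9083 for the metastore and 10000 for HiveServer2):
ss -tlnp | grep -E '9083|10000'
tail -f /cluster/hive/logs/hive.log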
======================================================================================================================================================
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
/cluster/hadoop3/bin/hdfs namenode -format
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/spark/logs/*
rm -rf /cluster/hive/logs/*
/cluster/hadoop3/sbin/start-all.sh
/cluster/spark/sbin/start-all.sh
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
======================================================================================================================================================
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
#rm -rf /cluster/hadoop3/data/*
rm -rf /tmp/*
#/cluster/hadoop3/bin/hdfs namenode -format
/cluster/hadoop3/sbin/start-all.sh
rm -rf /cluster/spark/logs/*
/cluster/spark/sbin/start-all.sh
cd /cluster
rm -rf apache-hive-3.1.3-bin.tar.gz
mv /cluster/hive-3.1.3/packaging/target/apache-hive-3.1.3-bin.tar.gz .
tar -zxvf apache-hive-3.1.3-bin.tar.gz
mv apache-hive-3.1.3-bin hive
rm -rf /cluster/hive/bin/*
rm -rf /cluster/hive/conf/*
cp -r /cluster/conf/* /cluster/hive/conf/
cp -r /cluster/bin/* /cluster/hive/bin/
rm -rf /cluster/hive/logs/*
chown -R root:root /cluster/hive/
chmod 777 -R /cluster/hive/*
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
======================================================================================================================================================
Using hive/bin/hive:
LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE dog;
ADD JAR HiveSparkHadoopUDF.jar;
CREATE TEMPORARY FUNCTION animal AS 'com.example.cloud.HiveSparkHadoopUDF';
CREATE FUNCTION animal AS 'com.example.cloud.HiveSparkHadoopUDF';
DESCRIBE TEMPORARY FUNCTION animal;
DESCRIBE FUNCTION animal;
SELECT NAME,BIRTHDAY,default.animal(BIRTHDAY) from dog;
Dropping the temporary and permanent functions:
DROP TEMPORARY FUNCTION animal;
DROP FUNCTION animal;
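If CREATE FUNCTION fails with a ClassNotFoundException, confirm the class really is in the jar before digging further; a sketch:
jar tf HiveSparkHadoopUDF.jar | grep HiveSparkHadoopUDF
# should list com/example/cloud/HiveSparkHadoopUDF.class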
======================================================================================================================================================
hdfs dfs -put /cluster/HiveSparkHadoopUDF.jar /hive/udf/
hdfs dfs -ls /hive/udf/
create function animal2 as 'com.example.cloud.HiveSparkHadoopUDF' using jar 'hdfs://10.10.10.99:9000/hive/udf/HiveSparkHadoopUDF.jar';
desc function extended default.animal2;
select default.animal2('2024-01-20');
SELECT NAME,default.animal2(BIRTHDAY) from dog;
https://codeleading.com/article/76611116901/
/cluster/hive/bin/beeline
!connect jdbc:hive2://10.10.10.99:10000
When Hive uses the Spark engine, the Spark build must NOT include -Phive and -Phive-thriftserver (this is important).
insert into dwd_order_info values('20241222901057189205',100.00,'下单成功','李','支付宝','上太湖冰瀑位于北京市房山区韩村河镇的圣水峪村。上太湖清代成村,但“太湖”之名始于辽代,因山间有天然湖泊而得名,而上太湖冰瀑则是京城难得一见的天然冰瀑群!我们骑行到这里,只打卡最近的一处冰瀑。一路爬坡,车少路况好,沿途打卡拍照景点多,是一条不错的骑行线路','202412229010571899999','2024-12-26 00:02:10','2024-12-30 22:02:10','2024-01-26 00:02:10','290212127910232','8b5ab807-cee3-420f-a1aa-189239923f',22,66,257,33,112);
insert into dwd_order_info values('20240221163911090803',258.00,'hive-spark-yarn-hdfs-commit','点外卖','微信支付','','20240221163911090803','2024-02-21 10:02:10','2024-02-21 22:02:10','2024-02-21 16:02:10','3209473820232','903ee3-420f63-a1aa-1856397512f',11,22,33,44,55);
tail -f /cluster/hive/logs/hive.log
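For reference, the engine can also be switched per beeline session before running such an INSERT; spark.master=yarn is an assumption matching this YARN deployment:
set hive.execution.engine=spark;
set spark.master=yarn;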
======================================================================================================================================================
CREATE TABLE dwd_order_info (
`id` STRING COMMENT 'order number',
`final_amount` DECIMAL(16,2) COMMENT 'final order amount',
`order_status` STRING COMMENT 'order status',
`user_id` STRING COMMENT 'user id',
`payment_way` STRING COMMENT 'payment method',
`delivery_address` STRING COMMENT 'delivery address',
`out_trade_no` STRING COMMENT 'payment transaction number',
`create_time` STRING COMMENT 'creation time',
`operate_time` STRING COMMENT 'operation time',
`expire_time` STRING COMMENT 'expiration time',
`tracking_no` STRING COMMENT 'shipment tracking number',
`province_id` STRING COMMENT 'province ID',
`activity_reduce_amount` DECIMAL(16,2) COMMENT 'promotion discount amount',
`coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon discount amount',
`original_amount` DECIMAL(16,2) COMMENT 'original order amount',
`feight_fee` DECIMAL(16,2) COMMENT 'freight fee',
`feight_fee_reduce` DECIMAL(16,2) COMMENT 'freight fee discount'
) COMMENT 'order table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
insert into dwd_order_info values('20231222901057189201',100.00,'下单成功','hello world','微信支付','北京市朝阳区朝外大街101号101大厦','202312229010571899999','2023-12-26 00:02:10','2023-12-30 22:02:10','2024-01-26 00:02:10','290212127910232','8b5ab807-cee3-420f-a1aa-1196a0cab8cf',10,22,99,9.9,1);
======================================================================================================================================================
CREATE TABLE dim_base_province (
`id` STRING COMMENT 'id',
`name` STRING COMMENT 'province name',
`region_id` STRING COMMENT 'region ID',
`area_code` STRING COMMENT 'area code',
`iso_code` STRING COMMENT 'ISO-3166 code, for visualization',
`iso_3166_2` STRING COMMENT 'ISO-3166-2 code, for visualization'
) COMMENT 'province table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
insert into dim_base_province values('8b5ab807-cee3-420f-a1aa-1196a0cab8cf','黑龙江省牡丹江市东安区六峰湖小区西侧','0100101010122','5c43dd76-eee1-4aa4-8914-5970deee774d','9238283823821917','268348539271829034');
======================================================================================================================================================
CREATE TABLE `ads_order_by_province` (
`dt` STRING COMMENT 'statistics date',
`province_id` STRING COMMENT 'province id',
`province_name` STRING COMMENT 'province name',
`area_code` STRING COMMENT 'area code',
`iso_code` STRING COMMENT 'international standard region code',
`iso_code_3166_2` STRING COMMENT 'international standard region code (ISO-3166-2)',
`order_count` BIGINT COMMENT 'order count',
`order_amount` DECIMAL(16,2) COMMENT 'order amount'
) COMMENT 'order statistics by province'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
insert into table ads_order_by_province
select
'2023-12-26' dt,
bp.id,
bp.name,
bp.area_code,
bp.iso_code,
bp.iso_3166_2,
count(*) order_count,
sum(oi.final_amount) order_amount
from dwd_order_info oi
left join dim_base_province bp
on oi.province_id=bp.id
group by bp.id,bp.name,bp.area_code,bp.iso_code,bp.iso_3166_2;
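A quick check of the aggregation result after the insert-select completes:
select * from ads_order_by_province;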
======================================================================================================================================================
Building Spark 3.4.1 and 3.4.2 (with no source modifications):
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
For 3.4.2, testing and learning:
/home/spark-3.4.2/dev/make-distribution.sh --name spark-3.4.2 --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.3.6 -Dhadoop.version=3.3.6 -Pkubernetes clean -DskipTests
/home/spark-3.4.2/dev/make-distribution.sh --name spark-3.4.2 --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes clean -DskipTests
install.packages("vignette",repos = NULL)
======================================================================================================================================================
firewall-cmd --permanent --remove-port=135/udp
firewall-cmd --list-ports
firewall-cmd --reload
======================================================================================================================================================
dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Dhadoop.version=3.3.6 -Pkubernetes clean -DskipTests
dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Pmesos -Pyarn -Phadoop-3.3.6 -Dhadoop.version=3.3.6 -Pkubernetes -DskipTests
./make-distribution.sh --name custom-spark --pip --tgz -Psparkr -Pmesos -Pyarn -Pkubernetes
mvn clean install -DskipTests -Pdist
======================================================================================================================================================
How to fix failing DELETE or UPDATE statements:
Error: Error while compiling statement: FAILED: SemanticException [Error 10297]:
Attempt to do update or delete on table default.dwd_order_info that is not transactional (state=42000,code=10297)
The fix is to recreate the table as a transactional table.
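A minimal sketch of such a table (Hive ACID requires ORC storage plus transactional=true, and hive.support.concurrency / the DbTxnManager enabled in hive-site.xml; the table name here is hypothetical and the columns are trimmed for brevity):
CREATE TABLE dwd_order_info_acid (
`id` STRING COMMENT 'order number',
`final_amount` DECIMAL(16,2) COMMENT 'final order amount',
`order_status` STRING COMMENT 'order status'
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');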
======================================================================================================================================================
=====iceberg==============iceberg================iceberg===================iceberg==================iceberg==============================================================================
sudo kill -9 $(ps -ef|grep hadoop|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep spark|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
sudo kill -9 $(ps -ef|grep hive|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
rm -rf /cluster/hadoop3/logs/*
rm -rf /cluster/spark/logs/*
rm -rf /cluster/hive/logs/*
/cluster/hadoop3/sbin/start-all.sh
/cluster/spark/sbin/start-all.sh
nohup /cluster/hive/bin/hivemetastore2 > 2a.log &
nohup /cluster/hive/bin/hiveserver2 > 2b.log &
======================================================================================================================================================
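The yujia catalog used in the SQL below has to be defined on the Spark side first; a sketch assuming a Hadoop-type Iceberg catalog on this cluster's HDFS (the warehouse path and runtime jar version are assumptions):
/cluster/spark/bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.yujia=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.yujia.type=hadoop \
  --conf spark.sql.catalog.yujia.warehouse=hdfs://10.10.10.99:9000/iceberg/warehouse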
CREATE TABLE yujia.db.table (id bigint, data string) USING iceberg;
INSERT INTO yujia.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');
#INSERT INTO yujia.db.table SELECT id, data FROM source WHERE length(data) = 1;
MERGE INTO yujia.db.target t USING (SELECT * FROM updates) u ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.count = t.count + u.count
WHEN NOT MATCHED THEN INSERT *;
CREATE TABLE yujia.nyc.taxis
(
vendor_id bigint,
trip_id bigint,
trip_distance float,
fare_amount double,
store_and_fwd_flag string
)
PARTITIONED BY (vendor_id);
INSERT INTO yujia.nyc.taxis
VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
select * from yujia.nyc.taxis;