Continuing after a Hive SQL error: running Hive SQL with PySpark

Latest recommended article published on 2023-04-02 16:08:56

不妧

Article tags: hive sql, continue execution after an error

Article link: https://blog.csdn.net/weixin_42635064/article/details/112889581

Environment setup

Environment configuration is not covered in detail here; this post only compares execution efficiency.

spark

hadoop

Execution modes

Suppose the SQL that counts the distinct mobile numbers per host looks like this:

select host, count(distinct c.mobile) as mobile_num
from xml.my_goods d
right join (
    select b.xmsec as mobile
    from (
        select mobile_id from xll.xf_shenzhen where dt = '2018-08-31'
    ) a
    left join zww.nami b on a.mobile_id = b.mobile_id
    where b.money is not null
) c on upper(d.mobile) = upper(c.mobile)
where dt >= '20180827' and c.mobile is not null
group by host

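Hive mode

Simply paste the SQL above into the Hive editor in Hue and run it.

PySpark mode

There are three files in total:

run.sh: the launcher; it contains a single command that runs the Python script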

spark2-submit --master local[*] spark_test.py

spark_test.py: the PySpark script; it executes the SQL and saves the result to Hive

import datetime

import sql_
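A script that does what is described for spark_test.py (execute the SQL, save the result to Hive) could be sketched roughly as below. This is a minimal sketch under assumptions, not the original script: the attribute name `sql_.QUERY`, the target table `tmp.host_mobile_num`, and the helper names `run_and_save` and `main` are all hypothetical. The elapsed time is returned so that it can be compared with the time the same query takes in Hive.

```python
import datetime


def run_and_save(spark, query, target_table):
    """Run `query` through `spark.sql` and overwrite `target_table` with the result.

    Returns the elapsed wall-clock time in seconds.
    """
    started = datetime.datetime.now()
    result = spark.sql(query)
    # saveAsTable is an action, so the query is actually executed here.
    result.write.mode("overwrite").saveAsTable(target_table)
    return (datetime.datetime.now() - started).total_seconds()


def main():
    # Imported lazily so that run_and_save can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: the sql_ module exposes the query text as sql_.QUERY.
    import sql_

    spark = (
        SparkSession.builder
        .appName("spark_test")
        .enableHiveSupport()  # required to read from and write to Hive tables
        .getOrCreate()
    )
    try:
        # Assumption: tmp.host_mobile_num is a placeholder table name.
        seconds = run_and_save(spark, sql_.QUERY, "tmp.host_mobile_num")
        print("elapsed seconds:", seconds)
    finally:
        spark.stop()
```

When the file is launched through run.sh, calling `main()` at the bottom of the script would start the job. Keeping the Spark-specific setup inside `main` leaves `run_and_save` easy to check with a stand-in session object.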
