HiveSQL Practice: Querying Products Whose Cumulative Sales Exceed Their Category Average

Practice platform: vue-sqleditor

Problem statement:

From the order detail table (order_detail), find the products whose cumulative sales quantity is higher than the average cumulative sales of their category.

Expected result:

sku_id<string>    name<string>    sum_num<bigint>    cate_avg_num<bigint>
2                 手机壳          6044               1546
5                 破壁机          242                194
7                 热水壶          252                194
8                 微波炉          253                194
12                遮阳伞          20682              5373

Tables used:

Product info table: sku_info

sku_id (product ID)    name (product name)    category_id (category ID)    from_date (listing date)    price (price)
1                      xiaomi 10              1                            2020-01-01                  2000
6                      洗碗机                 2                            2020-02-01                  2000
9                      自行车                 3                            2020-01-01                  1000

Order detail table: order_detail

order_detail_id (detail ID)    order_id (order ID)    sku_id (product ID)    create_date (order date)    price (unit price)    sku_num (quantity)
1                              1                      1                      2021-09-30                  2000.00               2
2                              1                      3                      2021-09-30                  5000.00               5
22                             10                     4                      2020-10-02                  6000.00               1
23                             10                     5                      2020-10-02                  500.00                24
24                             10                     6                      2020-10-02                  2000.00               5

Table DDL:

CREATE TABLE `sku_info`(
  `sku_id` bigint,
  `name` string,
  `category_id` bigint,
  `from_date` string,
  `price` bigint)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

CREATE TABLE `order_detail`(
  `order_detail_id` bigint,
  `order_id` bigint,
  `sku_id` bigint,
  `create_date` string,
  `price` double,
  `sku_num` bigint)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

Insert sample data:

INSERT INTO sku_info
SELECT 1, 'xiaomi 10', 1, '2020-01-01', 2000
UNION ALL
SELECT 6, '洗碗机', 2, '2020-02-01', 2000
UNION ALL
SELECT 9, '自行车', 3, '2020-01-01', 1000;

INSERT INTO category_info
SELECT 1, '数码'
UNION ALL
SELECT 2, '厨卫'
UNION ALL
SELECT 3, '户外';
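
Note that the insert above targets category_info, whose CREATE TABLE statement is not included in the DDL section above. A minimal sketch consistent with the sample rows (the column names here are inferred, not confirmed by the platform):

-- Assumed DDL for category_info; column names are inferred from the
-- two columns of the sample rows and are not given in the original post.
CREATE TABLE `category_info`(
  `category_id` bigint,
  `name` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

In any case, category_info is not referenced by the final query; it is only part of the sample schema.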

INSERT INTO order_detail
SELECT 1, 1, 1, '2021-09-30', 2000.00, 2
UNION ALL
SELECT 2, 1, 3, '2021-09-30', 5000.00, 5
UNION ALL
SELECT 22, 10, 4, '2020-10-02', 6000.00, 1
UNION ALL
SELECT 23, 10, 5, '2020-10-02', 500.00, 24
UNION ALL
SELECT 24, 10, 6, '2020-10-02', 2000.00, 5;

Analysis:

select
  category_id,
  s1.sku_id,
  name,
  sum(sku_num) as sum_num
from order_detail s1
join sku_info s2 on s1.sku_id = s2.sku_id
group by category_id, s1.sku_id, name

Step 1: join order_detail to sku_info and aggregate, producing each product's category ID, name, and cumulative sales quantity.

select
  category_id,
  sku_id,
  name,
  sum_num,
  avg(sum_num) over(partition by category_id) as cate_avg_num
from
  (
    select
      category_id,
      s1.sku_id,
      name,
      sum(sku_num) as sum_num
    from order_detail s1
    join sku_info s2 on s1.sku_id = s2.sku_id
    group by category_id, s1.sku_id, name
  ) s3
group by category_id, sku_id, name, sum_num

Step 2: open a window partitioned by category to attach each category's average sales quantity to every product row. (The trailing GROUP BY is redundant here: the rows coming out of s3 are already unique per product, so the window function alone would suffice.)
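
Since the window average is the crux of the solution, here is a minimal, self-contained sketch (with made-up inline data, not the problem's tables) showing that avg() over(partition by ...) attaches the group average to every row instead of collapsing rows the way GROUP BY would:

-- three input rows in, three rows out, each carrying its category's average
select
  category_id,
  sku_id,
  sum_num,
  avg(sum_num) over(partition by category_id) as cate_avg_num
from
  (
    select 1 as category_id, 1 as sku_id, 10 as sum_num
    union all
    select 1, 2, 20
    union all
    select 2, 3, 5
  ) t;
-- cate_avg_num is 15.0 for both category-1 rows and 5.0 for the category-2 row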

Step 3: keep only the products whose cumulative sales exceed their category average, and cast the average from double (the return type of avg) down to bigint. Wrapping the step-2 query in this filter gives the final SQL below.
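
One detail worth pinning down: in Hive, casting a fractional value to bigint truncates the fraction toward zero rather than rounding. A one-line check:

select cast(193.75 as bigint);  -- returns 193: the fraction is dropped, not rounded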

Final SQL:

select
  sku_id,
  name,
  sum_num,
  cast(cate_avg_num as bigint) as cate_avg_num  -- avg() returns double; truncate to bigint
from
  (
    select
      category_id,
      sku_id,
      name,
      sum_num,
      -- attach each category's average sales to every product row
      avg(sum_num) over(partition by category_id) as cate_avg_num
    from
      (
        -- cumulative sales per product, with its category
        select
          category_id,
          s1.sku_id,
          name,
          sum(sku_num) as sum_num
        from
          order_detail s1
          join sku_info s2 on s1.sku_id = s2.sku_id
        group by
          category_id,
          s1.sku_id,
          name
      ) s3
    -- redundant (rows are already unique); kept as in the accepted answer
    group by
      category_id,
      sku_id,
      name,
      sum_num
  ) rs
where
  sum_num > cate_avg_num;
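
For comparison, an equivalent join-based formulation (a sketch, not the accepted answer above) computes the category average with a second GROUP BY and a join instead of a window function:

-- per-product totals joined against per-category averages of those totals
select
  t.sku_id,
  t.name,
  t.sum_num,
  cast(c.cate_avg_num as bigint) as cate_avg_num
from
  (
    select
      category_id,
      s1.sku_id,
      name,
      sum(sku_num) as sum_num
    from order_detail s1
    join sku_info s2 on s1.sku_id = s2.sku_id
    group by category_id, s1.sku_id, name
  ) t
join
  (
    select
      category_id,
      avg(sum_num) as cate_avg_num
    from
      (
        select
          category_id,
          s1.sku_id,
          sum(sku_num) as sum_num
        from order_detail s1
        join sku_info s2 on s1.sku_id = s2.sku_id
        group by category_id, s1.sku_id
      ) x
    group by category_id
  ) c on t.category_id = c.category_id
where
  t.sum_num > c.cate_avg_num;

This variant aggregates order_detail twice, whereas the window version makes a single pass over the aggregated rows; both return the same result on this problem.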

AC time:

2022-12-20 22:29:58
