HQL经典十题

最新推荐文章于 2023-05-25 09:11:39 发布

午饭有鱼有虾9

最新推荐文章于 2023-05-25 09:11:39 发布

阅读量1.2k

点赞数

分类专栏： SQL 文章标签： hql

本文链接：https://blog.csdn.net/weixin_56765170/article/details/117330541

版权

SQL 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

8.3 手写HQL

8.3.1 手写HQL 第1题

表结构：uid,subject_id,score

求：找出所有科目成绩都大于某一学科平均成绩的学生

数据集如下

1001 01 90

1001 02 90

1001 03 90

1002 01 85

1002 02 85

1002 03 70

1003 01 70

1003 02 70

1003 03 85

1）建表语句

create table score(

uid string,

subject_id string,

score int)

row format delimited fields terminated by '\t';

2）求出每个学科平均成绩

select

uid,

score,

avg(score) over(partition by subject_id) avg_score

from

score;t1

3）根据是否大于平均成绩记录flag，大于则记为0否则记为1

select

uid,

if(score>avg_score,0,1) flag

from

t1;t2

4）根据学生id进行分组统计flag的和，和为0则是所有学科都大于平均成绩

select

uid

from

group by

uid

having

sum(flag)=0;

5）最终SQL

select

uid

from

(select

uid,

if(score>avg_score,0,1) flag

from

(select

uid,

score,

avg(score) over(partition by subject_id) avg_score

from

score)t1)t2

group by

uid

having

sum(flag)=0;

8.3.2 手写HQL 第2题

我们有如下的用户访问数据

userId	visitDate	visitCount
u01	2017/1/21	5
u02	2017/1/23	6
u03	2017/1/22	8
u04	2017/1/20	3
u01	2017/1/23	6
u01	2017/2/21	8
U02	2017/1/23	6
U01	2017/2/22	4

要求使用SQL统计出每个用户的累积访问次数，如下表所示：

用户id	月份	小计	累积
u01	2017-01	11	11
u01	2017-02	12	23
u02	2017-01	12	12
u03	2017-01	8	8
u04	2017-01	3	3

数据集

u01 2017/1/21 5

u02 2017/1/23 6

u03 2017/1/22 8

u04 2017/1/20 3

u01 2017/1/23 6

u01 2017/2/21 8

u02 2017/1/23 6

u01 2017/2/22 4

1）创建表

create table action

(userId string,

visitDate string,

visitCount int)

row format delimited fields terminated by "\t";

2）修改数据格式

select

userId,

date_format(regexp_replace(visitDate,'/','-'),'yyyy-MM') mn,

visitCount

from

action;t1

3）计算每人单月访问量

select

userId,

mn,

sum(visitCount) mn_count

from

group by

userId,mn;t2

4）按月累计访问量

select

userId,

mn,

mn_count,

sum(mn_count) over(partition by userId order by mn)

from t2;

5）最终SQL

select

userId,

mn,

mn_count,

sum(mn_count) over(partition by userId order by mn)

from

( select

userId,

mn,

sum(visitCount) mn_count

from

(select

userId,

date_format(regexp_replace(visitDate,'/','-'),'yyyy-MM') mn,

visitCount

from

action)t1

group by userId,mn)t2;

8.3.3 手写HQL 第3题

有50W个京东店铺，每个顾客访客访问任何一个店铺的任何一个商品时都会产生一条访问日志，访问日志存储的表名为Visit，访客的用户id为user_id，被访问的店铺名称为shop，请统计：

1）每个店铺的UV（访客数）

2）每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数

数据集

u1 a

u2 b

u1 b

u1 a

u3 c

u4 b

u1 a

u2 c

u5 b

u4 b

u6 c

u2 c

u1 b

u2 a

u3 a

u5 a

1）建表

create table visit(user_id string,shop string) row format delimited fields terminated by '\t';

2）每个店铺的UV（访客数）

select shop,count(distinct user_id) from visit group by shop;

3）每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数

（1）查询每个店铺被每个用户访问次数

select shop,user_id,count(*) ct

from visit

group by shop,user_id;t1

（2）计算每个店铺被用户访问次数排名

select shop,user_id,ct,rank() over(partition by shop order by ct) rk

from t1;t2

（3）取每个店铺排名前3的

select shop,user_id,ct

from t2

where rk<=3;

（4）最终SQL

select

shop,

user_id,

from

(select

shop,

user_id,

ct,

rank() over(partition by shop order by ct) rk

from

(select

shop,

user_id,

count(*) ct

from visit

group by

shop,

user_id)t1

)t2

where rk<=3;

8.3.4 手写HQL 第4题

已知一个表STG.ORDER，有如下字段:Date，Order_id，User_id，amount。请给出sql进行统计:数据样例:2017-01-01,10029028,1000003251,33.57。

1）给出 2017年每个月的订单数、用户数、总成交金额。

2）给出2017年11月的新客数(指在11月才有第一笔订单)

建表

create table order_tab(dt string,order_id string,user_id string,amount decimal(10,2)) row format delimited fields terminated by '\t';

1）给出 2017年每个月的订单数、用户数、总成交金额。

select

date_format(dt,'yyyy-MM'),

count(order_id),

count(distinct user_id),

sum(amount)

from

order_tab

where

date_format(dt,'yyyy')='2017'

group by

date_format(dt,'yyyy-MM');

2）给出2017年11月的新客数(指在11月才有第一笔订单)

select

count(user_id)

from

order_tab

group by

user_id

having

date_format(min(dt),'yyyy-MM')='2017-11';

8.3.5 手写HQL 第5题

有日志如下，请写出代码求得所有用户和活跃用户的总数及平均年龄。（活跃用户指连续两天都有访问记录的用户）日期用户年龄

数据集

2019-02-11,test_1,23

2019-02-11,test_2,19

2019-02-11,test_3,39

2019-02-11,test_1,23

2019-02-11,test_3,39

2019-02-11,test_1,23

2019-02-12,test_2,19

2019-02-13,test_1,23

2019-02-15,test_2,19

2019-02-16,test_2,19

1）建表

create table user_age(dt string,user_id string,age int)row format delimited fields terminated by ',';

2）按照日期以及用户分组，按照日期排序并给出排名

select

dt,

user_id,

min(age) age,

rank() over(partition by user_id order by dt) rk

from

user_age

group by

dt,user_id;t1

3）计算日期及排名的差值

select

user_id,

age,

date_sub(dt,rk) flag

from

t1;t2

4）过滤出差值大于等于2的，即为连续两天活跃的用户

select

user_id,

min(age) age

from

group by

user_id,flag

having

count(*)>=2;t3

5）对数据进行去重处理（一个用户可以在两个不同的时间点连续登录），例如：a用户在1月10号1月11号以及1月20号和1月21号4天登录。

select

user_id,

min(age) age

from

group by

user_id;t4

6）计算活跃用户（两天连续有访问）的人数以及平均年龄

select

count(*) ct,

cast(sum(age)/count(*) as decimal(10,2))

from t4;

7）对全量数据集进行按照用户去重

select

user_id,

min(age) age

from

user_age

group by

user_id;t5

8）计算所有用户的数量以及平均年龄

select

count(*) user_count,

cast((sum(age)/count(*)) as decimal(10,1))

from

t5;

9）将第5步以及第7步两个数据集进行union all操作

select

0 user_total_count,

0 user_total_avg_age,

count(*) twice_count,

cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age

from

(

select

user_id,

min(age) age

from

(select

user_id,

min(age) age

from

(

select

user_id,

age,

date_sub(dt,rk) flag

from

(

select

dt,

user_id,

min(age) age,

rank() over(partition by user_id order by dt) rk

from

user_age

group by

dt,user_id

)t1

)t2

group by

user_id,flag

having

count(*)>=2)t3

group by

user_id

)t4

union all

select

count(*) user_total_count,

cast((sum(age)/count(*)) as decimal(10,1)),

0 twice_count,

0 twice_count_avg_age

from

(

select

user_id,

min(age) age

from

user_age

group by

user_id

)t5;t6

10）求和并拼接为最终SQL

select

sum(user_total_count),

sum(user_total_avg_age),

sum(twice_count),

sum(twice_count_avg_age)

from

(select

0 user_total_count,

0 user_total_avg_age,

count(*) twice_count,

cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age

from

(

select

user_id,

min(age) age

from

(select

user_id,

min(age) age

from

(

select

user_id,

age,

date_sub(dt,rk) flag

from

(

select

dt,

user_id,

min(age) age,

rank() over(partition by user_id order by dt) rk

from

user_age

group by

dt,user_id

)t1

)t2

group by

user_id,flag

having

count(*)>=2)t3

group by

user_id

)t4

union all

select

count(*) user_total_count,

cast((sum(age)/count(*)) as decimal(10,1)),

0 twice_count,

0 twice_count_avg_age

from

(

select

user_id,

min(age) age

from

user_age

group by

user_id

)t5)t6;

8.3.6 手写HQL 第6题

请用sql写出所有用户中在今年10月份第一次购买商品的金额，表ordertable字段（购买用户：userid，金额：money，购买时间：paymenttime(格式：2017-10-01)，订单id：orderid）

1）建表

create table ordertable(

userid string,

money int,

paymenttime string,

orderid string)

row format delimited fields terminated by '\t';

2）查询出

select

userid,

min(paymenttime) paymenttime

from

ordertable

where

date_format(paymenttime,'yyyy-MM')='2017-10'

group by

userid;t1

select

t1.userid,

t1.paymenttime,

od.money

from

join

ordertable od

t1.userid=od.userid

and

t1.paymenttime=od.paymenttime;

select

t1.userid,

t1.paymenttime,

od.money

from

(select

userid,

min(paymenttime) paymenttime

from

ordertable

where

date_format(paymenttime,'yyyy-MM')='2017-10'

group by

userid)t1

join

ordertable od

t1.userid=od.userid

and

t1.paymenttime=od.paymenttime;

8.3.7 手写HQL 第7题

有一个线上服务器访问日志格式如下（用sql答题）

时间接口 ip地址

2016-11-09 11：22：05 /api/user/login 110.23.5.33

2016-11-09 11：23：10 /api/user/detail 57.3.2.16

.....

2016-11-09 23：59：40 /api/user/login 200.6.5.166

求11月9号下午14点（14-15点），访问api/user/login接口的top10的ip地址

数据集

2016-11-09 14:22:05 /api/user/login 110.23.5.33

2016-11-09 11:23:10 /api/user/detail 57.3.2.16

2016-11-09 14:59:40 /api/user/login 200.6.5.166

2016-11-09 14:22:05 /api/user/login 110.23.5.34

2016-11-09 11:23:10 /api/user/detail 57.3.2.16

2016-11-09 23:59:40 /api/user/login 200.6.5.166

2016-11-09 14:22:05 /api/user/login 110.23.5.34

2016-11-09 11:23:10 /api/user/detail 57.3.2.16

2016-11-09 23:59:40 /api/user/login 200.6.5.166

2016-11-09 14:22:05 /api/user/login 110.23.5.35

2016-11-09 14:23:10 /api/user/detail 57.3.2.16

2016-11-09 23:59:40 /api/user/login 200.6.5.166

2016-11-09 14:59:40 /api/user/login 200.6.5.166

1）建表

create table ip(

time string,

interface string,

ip string)

row format delimited fields terminated by '\t';

2）最终SQL

select

ip,

interface,

count(*) ct

from

where

date_format(time,'yyyy-MM-dd HH')>='2016-11-09 14'

and

date_format(time,'yyyy-MM-dd HH')<='2016-11-09 15'

and

interface='/api/user/login'

group by

ip,interface

order by

ct desc

limit 2;t1

8.3.8 手写SQL 第8题

有一个账号表如下，请写出SQL语句，查询各自区组的money排名前十的账号（分组取前10）

1）建表（MySQL）

CREATE TABLE `account`

( `dist_id` int（11）DEFAULT NULL COMMENT '区组id',

`account` varchar（100）DEFAULT NULL COMMENT '账号',

`gold` int（11）DEFAULT 0 COMMENT '金币'）;

2）最终SQL

select

from

account as a

where

(select

count(distinct(a1.gold))

from

account as a1

where

a1.dist_id=a.dist_id

and

a1.gold>a.gold)<3;

8.3.9 手写HQL 第9题

1）有三张表分别为会员表（member）销售表（sale）退货表（regoods）

（1）会员表有字段memberid（会员id，主键）credits（积分）；

（2）销售表有字段memberid（会员id，外键）购买金额（MNAccount）；

（3）退货表中有字段memberid（会员id，外键）退货金额（RMNAccount）。

2）业务说明

（1）销售表中的销售记录可以是会员购买，也可以是非会员购买。（即销售表中的memberid可以为空）；

（2）销售表中的一个会员可以有多条购买记录；

（3）退货表中的退货记录可以是会员，也可是非会员；

（4）一个会员可以有一条或多条退货记录。

查询需求：分组查出销售表中所有会员购买金额，同时分组查出退货表中所有会员的退货金额，把会员id相同的购买金额-退款金额得到的结果更新到表会员表中对应会员的积分字段（credits）

数据集

sale

1001 50.3

1002 56.5

1003 235

1001 23.6

1005 56.2

25.6

33.5

regoods

1001 20.1

1002 23.6

1001 10.1

23.5

10.2

1005 0.8

1）建表

create table member(memberid string,credits double) row format delimited fields terminated by '\t';

create table sale(memberid string,MNAccount double) row format delimited fields terminated by '\t';

create table regoods(memberid string,RMNAccount double) row format delimited fields terminated by '\t';

2）最终SQL

insert into table member

select

t1.memberid,

MNAccount-RMNAccount

from

(select

memberid,

sum(MNAccount) MNAccount

from

sale

where

memberid!=''

group by

memberid

)t1

join

(select

memberid,

sum(RMNAccount) RMNAccount

from

regoods

where

memberid!=''

group by

memberid

)t2

t1.memberid=t2.memberid;

8.3.10 手写HQL 第10题

1.用一条SQL语句查询出每门课都大于80分的学生姓名

name kecheng fenshu

张三语文 81

张三数学 75

李四语文 76

李四数学 90

王五语文 81

王五数学 100

王五英语 90

A: select distinct name from table where name not in (select distinct name from table where fenshu<=80)

B：select name from table group by name having min(fenshu)>80

2. 学生表如下:
自动编号   学号  姓名课程编号课程名称分数
1     2005001 张三   0001   数学   69
2     2005002 李四 0001   数学   89
3     2005001 张三   0001   数学   69
删除除了自动编号不同, 其他都相同的学生冗余信息

A: delete tablename where 自动编号 not in(select min(自动编号) from tablename group by学号, 姓名, 课程编号, 课程名称, 分数)

3.一个叫team的表，里面只有一个字段name,一共有4条纪录，分别是a,b,c,d,对应四个球队，现在四个球队进行比赛，用一条sql语句显示所有可能的比赛组合.

答：select a.name, b.name
from team a, team b
where a.name < b.name

4.面试题：怎么把这样一个
year   month amount
1991   1     1.1
1991   2     1.2
1991   3     1.3
1991   4     1.4
1992   1     2.1
1992   2     2.2
1992   3     2.3
1992   4     2.4
查成这样一个结果
year m1 m2 m3 m4
1991 1.1 1.2 1.3 1.4
1992 2.1 2.2 2.3 2.4

答案
select year,
(select amount from aaa m where month=1 and m.year=aaa.year) as m1,
(select amount from aaa m where month=2 and m.year=aaa.year) as m2,
(select amount from aaa m where month=3 and m.year=aaa.year) as m3,
(select amount from aaa m where month=4 and m.year=aaa.year) as m4
from aaa group by year

*********************************************************************
5.说明：复制表(只复制结构,源表名：a新表名：b)

SQL: select * into b from a where 1<>1 (where1=1，拷贝表结构和数据内容)
ORACLE:create table b

Select * from a where 1=2

[<>（不等于）(SQL Server Compact)

比较两个表达式。当使用此运算符比较非空表达式时，如果左操作数不等于右操作数，则结果为 TRUE。否则，结果为 FALSE。]

原表:
courseid coursename score
-------------------------------------
1 java 70
2 oracle 90
3 xml 40
4 jsp 30
5 servlet 80
-------------------------------------
为了便于阅读,查询此表后的结果显式如下(及格分数为60):
courseid coursename score mark
---------------------------------------------------
1 java 70 pass
2 oracle 90 pass
3 xml 40 fail
4 jsp 30 fail
5 servlet 80 pass
---------------------------------------------------
写出此查询语句
select courseid, coursename ,score ,if(score>=60, "pass","fail") as mark from course

7.表名：购物信息

购物人商品名称数量

A 甲 2

B 乙 4

C 丙 1

A 丁 2

B 丙 5

……

给出所有购入商品为两种或两种以上的购物人记录

答：select * from 购物信息 where 购物人 in (select 购物人 from 购物信息 group by 购物人 having count(*) >= 2);

info 表

date result

2005-05-09 win

2005-05-09 lose

2005-05-10 win

2005-05-10 lose

如果要生成下列结果, 该如何写sql语句?

　　 win lose

2005-05-09 2 2

2005-05-10 1 2

答案：

(1) select date, sum(case when result = "win" then 1 else 0 end) as "win", sum(case when result = "lose" then 1 else 0 end) as "lose" from info group by date;

(2) select a.date, a.result as win, b.result as lose

　　from

　　(select date, count(result) as result from info where result = "win" group by date) as a

　　join

　　(select date, count(result) as result from info where result = "lose" group by date) as b

on a.date = b.date;

8.3.11 手写HQL 第11题

有一个订单表order。已知字段有：order_id(订单ID), user_id(用户ID),amount(金额), pay_datetime(付费时间),channel_id(渠道ID),dt(分区字段)。

1. 在Hive中创建这个表。

2. 查询dt=‘2018-09-01‘里每个渠道的订单数，下单人数（去重），总金额。

3. 查询dt=‘2018-09-01‘里每个渠道的金额最大3笔订单。

4. 有一天发现订单数据重复，请分析原因