Hive场景题

最新推荐文章于 2023-04-05 19:27:06 发布

置顶如隔三秋_

最新推荐文章于 2023-04-05 19:27:06 发布

阅读量2.8k

点赞数 9

分类专栏： hive 文章标签： hive

本文链接：https://blog.csdn.net/PIPJIN961111/article/details/102598747

版权

hive 专栏收录该内容

1 篇文章 1 订阅

订阅专栏

title: Hive场景题练习
date: 2019-10-15 23:21:44

更新中ing…

Hive场景题

1.1 hive窗口函数

1.1.4 编写连续7天登录的总人数

数据: t1表 
Uid dt login_status(1登录成功,0异常) 
1 2019-07-11 1 
1 2019-07-12 1 
1 2019-07-13 1 
1 2019-07-14 1 
1 2019-07-15 1 
1 2019-07-16 1 
1 2019-07-17 1 
1 2019-07-18 1 
2 2019-07-11 1 
2 2019-07-12 1 
2 2019-07-13 0 
2 2019-07-14 1 
2 2019-07-15 1 
2 2019-07-16 0 
2 2019-07-17 1 
2 2019-07-18 0 
3 2019-07-11 1 
3 2019-07-12 1 
3 2019-07-13 1 
3 2019-07-14 1 
3 2019-07-15 1 
3 2019-07-16 1 
3 2019-07-17 1 
3 2019-07-18 1

思路：应用了dateiff()函数两个日期相减

select dateiff(dt,lag(dt,6) over(distribute by uid sort by dt)) from t1

求用户有没有七天以上

然后 select dateiff(dt,lag(dt,6) over(distribute by uid sort by dt)) from t1 where login_status=1

添加条件用户在七天里的状态都必须为1

最好count一下注意要distinct uid 条件是登录的天数为6 也就是 dateiff（）函数 =6

-- 创表
create table if not exists t1(
uid int,
dt string,
login_status int
)
row format delimited fields terminated by ' '
;
--加载数据
load data local inpath "./t1.txt" into table t1;
--语句实现
select count(distinct a.uid)
from
(select uid,login_status,
datediff(dt,lag(dt,6) over(distribute by uid sort by dt asc)) b
from t1
where login_status=1) a
where a.b=6
;
-- 结果为 2

1.1.6 编写 sql 句实现每班前三名，分数一样并列，同时求出前三名

按名次排序的一次的分差：

数据： stu表

Stu_no class score 

1 1901 90 
2 1901 90 
3 1901 83 
4 1901 60 
5 1902 66 
6 1902 23 
7 1902 99 
8 1902 67 
9 1902 87

编写sql实现，结果如下：

结果数据：

班级 stu_no score rn rn1 rn_diff 

1901 1 90 1 1 90 
1901 2 90 2 1 0 
1901 3 83 3 1 -7 
1902 7 99 1 1 99 
1902 9 87 2 2 -12 
1902 8 67 3 3 -20

思路：

-- 创表
create table if not exists stu(
Stu_no int,
class int,
score int
)
row format delimited fields terminated by ' '
;
-- 加载数据
load data local inpath "./stu.txt" into table stu;
-- 语句实现
select a.class as `班级`,a.stu_no,a.score,a.rn,
score-nvl(lag(score) over (distribute by class sort by rn),0) as rn_diff
from
(select class,stu_no,score,
row_number() over(distribute by class sort by score desc) as rn
from stu) a
where a.rn<4
;

1.1.7 每个店铺的当月销售额和累计到当月的总销售额

数据：

店铺,月份,金额 
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250

编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额?

-- 创表
create table if not exists store(
store_name string,
month string,
money int
)
row format delimited fields terminated by ','
;
-- 加载数据
load data local inpath "./store.txt" into table store;
-- 语句实现
select b.store_name,b.month,b.a as `当月销售额`,
sum(b.a) over(distribute by b.store_name sort by b.month) as `累计到当月的总销售额`
from
(select store_name,month,sum(money) as a
from store
group by store_name,month) b
;

1.1.9 订单及订单类型行列互换

t1表: 
order_id order_type order_time 
111 N 10:00 
111 A 10:05 
111 B 10:10 

是用hql获取结果如下： 
order_id order_type_1 order_type_2 order_time_1 order_time_2 
111 N A 10:00 10:05 
111 A B 10:05 10:10

-- 创表
create table if not exists order1(
order_id int,
order_type string,
order_time string
)
row format delimited fields terminated by ' '
;
-- 加载数据
load data local inpath "./hivedata/order.txt" into table order1;
-- 语句实现
select * 
from
(select order_id,
order_type order_type_1,
lead(order_type) over(distribute by order_id sort by order_time asc) order_type_2,
order_time order_time_1,
lead(order_time) over(distribute by order_id sort by order_time asc) order_time_2
from order1) a
where a.order_type_2 is not null
;

1.1.10 某APP每天访问数据存放在表access_log里面

包含日期字段ds,用户类型字段user_type，用户账号user_id,用户访问时间log_time,请使用hive的hql语句实现如下需求：

(1)、每天整体的访问UV、PV?
(2)、每天每个类型的访问UV、PV? 
(3)、每天每个类型中最早访问时间和最晚访问时间? 
(4)、每天每个类型中访问次数最高的10个用户?

-- PV（Page View）访问量, 即页面浏览量或点击量，衡量网站用户访问的网页数量
-- UV（Unique Visitor）独立访客，统计1天内访问某站点的用户数(以cookie为依据)
-- IP（Internet Protocol）独立IP数，是指1天内多少个独立的IP浏览了页面，即统计不同的IP浏览用户数量
-- 创表
create table if not exists access_log(
user_type string,
user_id int,
log_time string,
ds timestamp
)
row format delimited fields terminated by ','
;
(1)
select user_type,count(1) pv,count(distinct user_id) uv
from access_log
group by ds
;
(2)
select count(1) pv,count(distinct user_id) uv
from access_long
group by ds,user_type
;
(3)
select 
first_value(log_time) over(distribute by user_type sort by ds)
last_value(log_time) over(distribute by user_type sort by ds)
from access_log
;
(4)
select row_number(distribute by user_type,ds sort by)

1.1.11 每个用户连续登陆的最大天数？

数据: 
login表 
uid,date 
1,2019-08-01 
1,2019-08-02 
1,2019-08-03 
2,2019-08-01 
2,2019-08-02 
3,2019-08-01 
3,2019-08-03 
4,2019-07-28 
4,2019-07-29 
4,2019-08-01 
4,2019-08-02 
4,2019-08-03
结果如下： 
uid cnt_days 
1 3 
2 2 
3 1 
4 3

思路从时间下手使用data_sub(dt,2) 第一个参数是时间第二个参数是数字时间-数字求出差

要判断是否连续登录再使用一个row_number 按照时间排序放在第二个参数上

例如 2019-08-01 -1 =2019-07-31 2019-08-02 -2 =2019-07-31 结果相同相同的count一下就证明连续几天了

再例如 2019-08-01 -1 =2019-07-31 2019-08-03-2=2019-08-01 结果不同

最后count要取最大的因为同一个用户可能会连续登录3天中间断一天再连续登录4天取最大的四天

-- 创表
create table if not exists login(
uid int,
dt string
)
row format delimited fields terminated by ','
;
-- 加载数据
load data local inpath "./hivedata/login.txt" into table login;
-- 语句实现
select uid,max(cn) from 
(select uid,count(*) cn from 
(select uid,date_sub(dt,row_number() over(distribute by uid sort by dt)) dts 
from login) a
group by uid,dts) b
group by uid
;

1.1.12 使用hive的hql实现男女各自第一名及其它

id sex chinese_s math_s 
0 0 70 50
1 0 90 70
2 1 80 90
1、男女各自语文第一名（0:男，1:女） 
2、男生成绩语文大于80，女生数学成绩大于70

-- 创表
create table if not exists score1(
id int,
sex int,
chinese_s int,
math_s int
)
row format delimited fields terminated by ' '
;
-- 加载数据
load data local inpath "./hivedata/score.txt" into table score1;
--语句实现
1.
select sex,max(chinese_s) from score1 group by sex;
2.
select id,chinese_s from score1 where sex=0 and chinses_s>80
union
select id,math_s from score1 where sex=1 and math_s>70
;

1.1.13 使用hive的hql实现最大连续访问

log_time uid 
2018-10-01 18:00:00,123
2018-10-02 18:00:00,123
2018-10-02 19:00:00,456
2018-10-04 18:00:00,123
2018-10-04 18:00:00,456
2018-10-05 18:00:00,123
2018-10-06 18:00:00,123
求每个用户本月最大连续登录天数

--建表
create table if not exists login1(
log_time string,
uid int
)
row format delimited fields terminated by ','
;
--加载数据
load data local inpath './hivedata/login1.txt' into table login1;

--语句实现
select uid,max(cn) from
(select uid,count(*) cn from
(select uid,date_sub(log_time,row_number() over(distribute by uid sort by log_time)) dts
from login1) a
group by uid,dts) b
group by uid
;

1.2 行列互换

行转列： 
1、使用case when 查询出多列即可，即可增加列。 
列转行: 
1、lateral view explode()，使用炸裂函数可以将1列转成多行，被转换列适用于array、map等类型。 lateral view posexplode(数组)，如有排序需求，则需要索引。将数组炸开成两行(索引 , 值),需要 as 两个别名。 
2、case when 结合concat_ws与collect_set/collect_list实现。内层用case when，外层用 collect_set/list收集，对搜集完后用concat_ws分割连接形成列。

1.2.3 编写sql实现如下：

数据： 
t1表 
uid tags 
1 1,2,3
2 2,3
3 1,2
编写sql实现如下结果： 
uid tag 
1 1
1 2
1 3
2 2
2 3
3 1
3 2

-- 创表
create table if not exists t1(
uid int,
tag array<int>
)
row format delimited fields terminated by ' '
collection items terminated by ','
;
-- 加载数据
load data local inpath "./hivedata/t1.txt" into table t1;
-- 语句实现
select uid,tag from t1 lateral view explode(tags) s as tag;

1.2.4 用户标签连接查询

数据： 
T1表: 
Tags 
1,2,3
1,2
2,3 
T2表: 
Id lab 
1 A
2 B
3 C
根据T1和T2表的数据，编写sql实现如下结果： 
ids tags
1,2,3 A,B,C
1,2 A,B
2,3 B,C

-- 创表
create table if not exists 1t24(
tags string
)
row format delimited fields terminated by ''
;
create table if not exists 2t24(
id string,
lab string
)
row format delimited fields terminated by ' '
;

-- 加载数据
load data local inpath "./hivedata/1t24.txt" into table 1t24
load data local inpath "./hivedata/2t24.txt" into table 2t24;
-- 语句实现
select tags,concat_ws(",",collect_list(lab)) from
(select t1.tags tags,2t24.lab lab from 
(select id,tags from 1t24 lateral view explode(split(tags,",")) tmp as id) t1  
join 2t24
on t1.id=2t24.id) tmp
group by tags;

select tags, concat_ws(",",collect_list(lab)) from (select t1.tags tags, t3.lab lab from (select tags, id from t2 lateral view explode(split(tags,",")) tmp as id) t1 join t3 on t1.id=t3.id) tmp group by tags ;

1.2.5 用户标签组合

数据： 
t1表： 
id tag flag 
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
编写sql实现如下结果：
id tag flag
a b 1|2|3
c d 6|8

-- 创表
create table if not exists t25(
id string,
tag string,
flag string
)
row format delimited fields terminated by ' '
;
-- 加载数据
load data local inpath "./hivedata/t2.5.txt" into table t25;
-- 语句实现
select id,tag,concat_ws("|",collect_set(flag)) flag
from t25
group by id,tag
;

1.2.6 用户标签行列互换

数据： 
t1表 
uid name tags
1 goudan chihuo,huaci
2 mazi sleep
3 laotie paly
编写sql实现如下结果
uid name tag
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly

-- 创表
create table if not exists t26(
uid int,
name string,
tags array<string>
)
row format delimited fields terminated by ' '
collection items terminated by ','
;
-- 加载数据
load data local inpath "./hivedata/t26.txt" into table t26;
-- 语句实现
select uid,name,tag from t26 lateral view explode(tags) t as tag;

select uid,name,tag from t26 lateral view explode(split(tags,",")) t as tag;

1.2.7 hive实现词频统计

数据： 
t1表：
uid contents
1 i|love|china
2 china|is|good|i|i|like
统计结果如下,如果出现次数一样，则按照content名称排序：
content cnt
i 3
china 2
good 1
like 1
love 1
is 1

-- 创表
create table if not exists t27(
uid int,
contents array<string>
)
row format delimited fields terminated by ' '
collection items terminated by '|'
;
-- 加载数据
load data local inpath "./hivedata/t27.txt" into table t27;
-- 语句实现
select content,count(1) cnt from
(select content from t27 lateral view explode(contents) tmp as content) tmp
group by content
order by cnt desc,content
;

1.2.8 课程行转列

数据： 
t1表
id course 
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
根据编写sql，得到结果如下(表中的1表示选修，表中的0表示未选修)： 
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0

--建表
create table if not exists t28(
id int,
course string
)
row format delimited fields terminated by ","
--加载数据
load data local inpath "./hivedata/t28.txt" into table t28
--语句实现
select id,
sum(case when course="a" then 1 else 0 end) a,
sum(case when course="b" then 1 else 0 end) b,
sum(case when course="c" then 1 else 0 end) c,
sum(case when course="e" then 1 else 0 end) d,
sum(case when course="d" then 1 else 0 end) e,
sum(case when course="e" then 1 else 0 end) f
from t28
group by id
;

1.2.9 兴趣行转列

t1表 
name sex hobby
janson 男 打乒乓球、游泳、看电影
tom 男 打乒乓球、看电影
hobby最多3个值，使用hql实现结果如下:
name sex hobby1 hobby2 hobby3
janson 男 打乒乓球 游泳 看电影
tom 男 打乒乓球 看电影

--建表
create table if not exists t29(
name string,
sex string,
hobby string
)
row format delimited fields terminated by " "
--加载数据
load data local inpath "./hivedata/t29.txt" into table t29
--语句实现
select name,sex,
collect_set(hobbies)[0] hobby1,
collect_set(hobbies)[1] hobby2,
collect_set(hobbies)[2] hobby3
from
(select name,sex,hobbies from t29 lateral view explode(split(hobby,"、")) tmp as hobbies) tmp
group by name,sex
;

1.2.10 用户商品行列互换

t1表： 
用户 商品
A P1
B P1
A P2
B P3
请你使用hql变成如下结果:
用户 P1 P2 P3
A 1 1 0
B 1 0 1

select user
sum(case when goods="P1" then 1 else 0 end) P1,
sum(case when goods="P2" then 1 else 0 end) P2,
sum(case when goods="P3" then 1 else 0 end) P3
from t210
group by user
;

1.2.11 成绩课程行列互换

t1表： 
name course score
aa English 75
bb math 85
aa math 90
使用hql输出以下结果
name English math
aa 75 90 
bb 0 85

select name,
sum(case when course="English" then score else 0 end) Engilsh,
sum(case when course="math" then score else 0 end) math
from t211
group by name
;

1.2.12 求top3英雄及其pick率

id names
1 亚索,挖掘机,艾瑞利亚,卡莎,洛
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞利亚,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石
求出场最多的top3英雄 及 pick率 pick=出场次数/游戏场数

--建表
create table if not exists t212(
id int,
names string
)
row format delimited fields terminated by " "
--加载数据
load data local inpath "./hivedata/t212.txt" into table t212
--语句实现
select name,cn,pick
from
(select name,row_number() over(sort by cn desc) rn,pick,cn
from
(select name,count(1) cn,round(count(1)/4,2) pick
from
(select id,name from t212 lateral view explode(split(names,",")) tmp as name) tmp
group by name
order by cn desc) a) b
where rn<=3
;
select count(1)/count(distinct id)
from
(select id,name from t212 lateral view explode(split(names,",")) tmp as name) tmp
group name

1.3时间函数

1.3.1常见的时间函数

from_unixtime(bigint unixtime,[string format]): 时间戳转日期函数，
unix_timestamp([string date]): 转换成时间戳，然后转换格式为“yyyy-MM-dd HH:mm:ss“的日期 到UNIX时间戳。如果转化失败，则返回0，返回bigint类型
to_date(string timestamp): 将时间戳转换成日期，默认格式为2011-12-08 10:03:01 
year() : 将时间戳转换成年，默认格式为2011-12-08 10:03:01 
month() : 将时间戳转换成月，默认格式为2011-12-08 10:03:01 
hour() : 将时间戳转换成小时，默认格式为2011-12-08 10:03:01
day(string date) : 将时间戳转换成天，默认格式为2011-12-08 10:03:01
date_diff(string enddate, string startdate) : 日期比较函数，反回结束日期减去开始日期的天数date_sub(string startdate, int days) : 日期减少函数，返回开始日期减少days天后的日期字符串date_add(string startdate, int days) : 日期增加函数，返回开始日期增加days天后的日期字符串last_day(string date) : 返回该月的最后一天的日期，可忽略时分秒部分(HH:mm:ss)。
last_day(string date)返回string类型的值。
next_day(string date,string x) : 返回下一个星期x的日期(x为前两英文星期前两位或者全写 MONDAY)，返回字符串。 
current_date() : 获取当天的日期，返回字符串，没有任何的参数。
current_timestamp() : 获取当前的时间戳

1.3.2 时间戳函数：unix_timestamp，from_unixtime

--获取当前时间戳:
select current_timestamp();
--获取"2019-07-31 11:57:25"对应的时间戳:
select unix_timestamp("2019-07-31 11:57:25","yyyy-MM-dd HH:mm:ss");
--获取"2019-07-31 11:57"对应的时间戳:
select unix_timestamp("2019-07-31 11:57:25","yyyy-MM-dd HH:mm");
--获取时间戳:1564545445所对应的日期和时分秒:
select from_unixtime(1564545445,"yyyy-MM-dd HH:mm:ss");
--获取时间戳:1564545446所对应的日期和小时(yyyy/MM/dd HH):
select from_unixtime(1564545446,"yyyy/MM/dd HH");

1.3.3 时间格式转换：yyyyMMdd -> yyyy-MM-dd

数据: 
t1表 
20190730 20190731
编写sql实现如下的结果：
2019-07-30 2019-07-31

create table if not exists date1(
dt string
)
row format delimited fields terminated by " "
;
load data local inpath "./hivedata/date1.txt" into table date1;
select from_unixtime(unix_timestamp(dt,"yyyyMMdd"),"yyyy-MM-dd") from date1;

1.4 交差并集

1.4.1 使用hive求出两个数据集的差集

数据t1表：
id name
1 zs
2 ls
t2表：
id name
1 zs
3 ww
结果如下：
id name
2 ls
3 ww

--建表
create table if not exists 1t41(
id int,
name string
)
row format delimited fields terminated by " "
;
create table if not exists 2t41(
id int,
name string
)
row format delimited fields terminated by " "
;

-- 加载数据
load data local inpath "./hivedata/1t41.txt" into table 1t41;
load data local inpath "./hivedata/2t41.txt" into table 2t41;
-- 语句实现
select t1.id,t1.name
from 1t41 t1 left join 2t41 t2
on t1.id=t2.id
where t2.id is null
union
select t2.id,t2.name
from  2t41 t2 left join 1t41 t1
on t2.id=t1.id
where t1.id is null

1.4.2 两个表A 和B

均有key 和value两个字段，写一个SQL语句，将B表中的value值置成A表中相同key值对应的value值

A:
key vlaue 
k1 123
k2 234
k3 235
B:
key value
k1 111
k2 222
k5 246
使用hive的hql实现，结果是B表数据如下: 
k1 123
k2 234
k5 246

create table if not exists A(
key string,
vlaue int
)
row format delimited fields terminated by ' '
;
create table if not exists B(
key string,
vlaue int
)
row format delimited fields terminated by ' '
;

load data local inpath './hivedata/a.txt' into table A;
load data local inpath './hivedata/b.txt' into table B;

select t2.keys,t2.vlaues 
from
(select t1.key keys,if(vlaueA is null,vlaueB,vlaueA) vlaues
from
(select B.key,B.vlaue vlaueB,A.vlaue vlaueA from A full join B 
on A.key=B.key) t1) t2
where t2.keys is not null
;

update B set B.value=(select A.value from A where A.key=B.key) where B.key in(select B.key from B,A where B.key=A.key);

1.4.3 有用户表user(uid,name)以及黑名单表Banuser(uid)

1、用left join方式写sql查出所有不在黑名单的用户信息
2、用not exists方式写sql查出所有不在黑名单的用户信息

1.5 函数

1.5.1

hive中coalesce()、nvl()、concat_ws()、collect_list()、

collect_set()、regexp_replace().这几个函数的意义

coalesce(T v1, T v2, ...) : 返回列表中的第一个非空元素，如果列表元素都为空则返回NULL。
例:select coalesce(NULL,null,123,"ABC"); 返回123

nvl(T v1,T v2) : 空值判断，如果v1非空则返回v1，如果v1为空，则返回v2，v1和v2需要同类型。
例：select nvl(null,1); 返回1

concat_ws(separator, str1, str2,...) :指定分隔符(第一位)连接字符串函数。参数需要字符 串。
例：select concat_ws("|","1","2","3"); 返回1|2|3 

collect_list(T col) : 将某列的值连接在一起，返回字符串数组，有相同的列值不会去重。通常可以 使用gruop by搭配使用，但是也可以不用group by。
例：select collect_list(id) from t1; 返回将id连接在一起的字符串。
如id值为1，2，2，则返回["1","2","2"] 

collect_set(T col) : 将某列的值连接在一起，返回字符串数组，有相同的列值会去重。通常可以使用 gruop by搭配使用，但是也可以不用group by。
例：select collect_list(id) from t1; 返回将id连接在一起的字符串。如id值为1，2，2，则返回["1","2"]

regexp_replace(source_string, pattern[, replace_string [, position[,occurrence, [match_parameter]]]]):用一个指定的 replace_string 来替换匹配的模式，从而允许复杂的"搜 索并替换"操作。 
例：select regexp_replace(img,".jpg","*.png") from t2; 将img列中有*.png的换成.jpg. 如img有两个数据为1.png 和 2.jsp，则返回1.jpg 和 2.jsp

1.6 聚合

1.6.1 你们公司使用什么来做的cube

使用with cube 、 with rollup 或者grouping sets来实现cube。
详细解释如下： 
0、hive一般分为基本聚合和高级聚合
基本聚合就是常见的group by，高级聚合就是grouping set、cube、rollup等。
一般group by与hive内置的聚合函数max、min、count、sum、avg等搭配使用。

1、grouping sets可以实现对同一个数据集的多重group by操作。
事实上grouping sets是多个group by进行union all操作的结合，它仅使用一个stage完成这些操作。
grouping sets的子句中如果包换() 数据集，则表示整体聚合。多用于指定的组合查询。

2、cube俗称是数据立方，它可以时限hive任意维度的组合查询。
即使用with cube语句时，可对group by后的维度做任意组合查询
如：group a,b,c with cube ,则它首先group a,b,c 然后依次group by a,c 、 group by b,c、group by a,b 、group a 、group b、group by c、group by () 等这8种组合查询，所以一般cube个数=2^3个。2是定值，3是维度的个数。多用于无级联关系的任意组合查询。

3、rollup是卷起的意思，俗称层级聚合，相对于grouping sets能指定多少种聚合，而with rollup则表示从左往右的逐级递减聚合，如:group by a,b,c with rollup 等价于 group by a, b, c grouping sets( (a, b, c), (a, b), (a), ( )).直到逐级递减为()为止,多适用于有级联关系的组合查询，如国家、省、市级联组合查询。 

4、Grouping__ID在hive2.3.0版本被修复过，修复后的发型版本和之前的不一样。对于每一列，如果这列 被聚合过则返回0，否则返回1。应用场景暂时很难想到用于哪儿。

5、grouping sets/cube/rollup三者的区别： 注: 
grouping sets是指定具体的组合来查询。
with cube 是group by后列的所有的维度的任意组合查询。
with rollup 是group by后列的从左往右逐级递减的层级组合查询。
cube/rollup 后不能加()来选择列，hive是要求这样。

1.9 普通查询

1.9.1 使用hive的hql查询用户所在部门

dpt表 
dpt_id dpt_name
1 产品
2 技术
user_dpt表
user_id dpt_id
1 1
2 1
3 2
4 2
5 3
result表
user_id dpt_id dpt_name
1 1 产品
2 1 产品
3 2 技术
4 2 技术
5 3 其他部门

select t2.user_id,t2.dpt_id,case when dpt_name is null then `其它部门` else t1.dpt_name
from dpt t1 left join user_dpt t2
on t1.dpt_id=t2.dpt_id
;
select t2.user_id,t2.dpt_id,nvl(t1.dpt_name,`其他部门`)
from dpt t1 left join user_dpt t2
on t1.dpt_id=t2.dpt_id
;

1.9.2 查出每个学期每门课程最高分记录

course_score表:

id,userid,course,score,term
1,zhangsan,数学,80,2015
2,lisi,语文,90,2016
3,lisi,数学,70,2016
4,wangwu,化学,80,2017
5,zhangsan,语文,85,2015
6,zhangsan,化学,80,2015

编写sql完成如下查询，一次查询实现最好，也可以写多次查询实现:
1、查出每个学期每门课程最高分记录(包含全部5个字段)
2、查出单个学期中语文课在90分以上的学生的数学成绩记录(包含全部5个字段)

create table course_score(
id string, 
name string,
course string,
score int,
year string )
row format delimited fields terminated by ','
;
load data local inpath './hivedata/course_score.txt' overwrite into table course_score;

--语句实现
select id,name,course,score,year from
(select id,name,course,score,year,row_number() over(distribute by course,year sort by score desc) rn 
from course_score) a
where rn=1
;

select id,name,course,score,t1.year 
from course_score t1 join 
(select year from course_score where course="语文" and score>=90 group by year) t2
on t1.year=t2.year
where course="数学"
;

如隔三秋_

关注

9
点赞
踩
27

收藏

觉得还不错? 一键收藏
0
评论
Hive场景题

title: Hive场景题练习date: 2019-10-15 23:21:44更新中ing…Hive场景题1.1 hive窗口函数1.1.4 编写连续7天登录的总人数数据: t1表 Uid dt login_status(1登录成功,0异常) 1 2019-07-11 1 1 2019-07-12 1 1 2019-07-13 1 1 2019-07-14 1 1 2...
复制链接

扫一扫