数据仓库工具 hive的入门(六) sql练习题

题目1:

t1.dat文件内容

2;zhangsan;book,TV,code;beijing:chaoyang,shagnhai:pudong
3;lishi;book,code;nanjing:jiangning,taiwan:taibei
4;wangwu;music,book;heilongjiang:haerbin

要求:创建表t1和加载数据;

题目2:连续7天登录的用户

ulogin.dat数据:

-- uid dt status(1 正常登录,0 异常)
1 2019-07-11 1
1 2019-07-12 1
1 2019-07-13 1
1 2019-07-14 1
1 2019-07-15 1
1 2019-07-16 1
1 2019-07-17 1
1 2019-07-18 1
2 2019-07-11 1
2 2019-07-12 1
2 2019-07-13 0
2 2019-07-14 1
2 2019-07-15 1
2 2019-07-16 0
2 2019-07-17 1
2 2019-07-18 0
3 2019-07-11 1
3 2019-07-12 1
3 2019-07-13 1
3 2019-07-14 0
3 2019-07-15 1
3 2019-07-16 1
3 2019-07-17 1
3 2019-07-18 1
题目3:编写sql语句实现每班前三名,分数一样并列,同时求出前三名按名次排序的分差
-- 数据。sid class score
1 1901 90
2 1901 90
3 1901 83
4 1901 60
5 1902 66
6 1902 23
7 1902 99
8 1902 67
9 1902 87
-- 待求结果数据如下:
class  score  rank  lagscore
1901   90    1    0
1901   90    1    0
1901   83    2    -7
1901   60    3    -23
1902   99    1    0
1902   87    2    -12
1902   67    3    -20

提供表加载数据语句:

-create table stu(
 sno int,
 class string,
 score int
)row format delimited fields terminated by ' ';
-- 加载数据
load data local inpath '/home/hadoop/data/stu.dat' into table
stu;

题目4:行 <=> 列

-- 数据:id course
1 java
1 hadoop
1 hive
1 hbase
2 java
2 hive
2 spark
2 flink
3 java
3 hadoop
3 hive
3 kafka

提供表加载数据语句:

create table rowline1(
 id string,
 course string
 )row format delimited fields terminated by ' ';
load data local inpath '/root/data/data1.dat' into table
rowline1;
-- 编写sql,得到结果如下(1表示选修,0表示未选修)
id   java  hadoop hive  hbase  spark  flink  kafka
1    1    1    1    1    0    0    0
2    1    0    1    0    1    1    0
3    1    1  

答案:

题目1:
create table t1(id int,name string,hobby array<string>,addr map<string, string>)row format delimited fields terminated by ";" collection items terminated by "," map keys terminated by ":";

load data local inpath "/home/hive/data/t1.tat" into table t1;
题目2:
-- 1、使用 row_number 在组内给数据编号(rownum)
-- 2、某个值 - rownum = gid,得到结果可以作为后面分组计算的依据
-- 3、根据求得的gid,作为分组条件,求最终结果
select uid, dt,
   date_sub(dt, row_number() over (partition by uid order
by dt)) gid
 from ulogin
where status=1;

select uid, count(*) logincount
 from (select uid, dt,
       date_sub(dt, row_number() over (partition by
uid order by dt)) gid
     from ulogin
    where status=1) t1
group by uid, gid
having logincount>=7;
题目3:
-- 求解思路:
-- 1、上排名函数,分数一样并列,所以用dense_rank
-- 2、将上一行数据下移,相减即得到分数差
-- 3、处理 NULL
with tmp as (
select sno, class, score,
   dense_rank() over (partition by class order by score
desc) as rank
 from stu
);

select class, score, rank,
   nvl(score - lag(score) over (partition by class order
by score desc), 0) lagscore
 from tmp
where rank<=3;
题目4:
-- 使用case when;group by + sum
select id,
sum(case when course="java" then 1 else 0 end) as java,
sum(case when course="hadoop" then 1 else 0 end) as hadoop,
sum(case when course="hive" then 1 else 0 end) as hive,
sum(case when course="hbase" then 1 else 0 end) as hbase,
sum(case when course="spark" then 1 else 0 end) as spark,
sum(case when course="flink" then 1 else 0 end) as flink,
sum(case when course="kafka" then 1 else 0 end) as kafka
 from rowline1
group by id;
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值