Hive练习一

本文展示了在Hive中创建subject表并加载数据的过程,以及一系列查询操作,包括查询学生分数、总分、年级总分排名前三、文科一班总分排名前10以及各班级的平均分和最高总分。这些查询涉及到Hive的JOIN、GROUP BY、聚合函数等高级用法,是大数据分析中的常见操作。
摘要由CSDN通过智能技术生成

模仿建表语句,创建subject表,并使用hdfs dfs -put 命令加载数据

hive> create table subject(
    > subject_id bigint comment '科目id'
    > ,subject_name string comment '科目名称'
    > ,subject_score bigint comment '科目总分'
    > ) row format delimited fields terminated by ',';
OK
Time taken: 0.063 seconds
hive> load data local inpath '/usr/local/soft/data/subject.txt' into table subject;
Loading data to table test5.subject
Table test5.subject stats: [numFiles=1, totalSize=180]
OK
Time taken: 0.544 seconds
hive> select * from subject;
OK
1000001	语文	150
1000002	数学	150
1000003	英语	150
1000004	政治	100
1000005	历史	100
1000006	物理	100
1000007	化学	100
1000008	地理	100
1000009	生物	100
Time taken: 0.035 seconds, Fetched: 9 row(s)

 查询学生分数(输出:学号,姓名,班级,科目id,科目名称,成绩

hive> select  t1.id
    >         ,t1.name
    >         ,t1.clazz
    >         ,t2.score_id
    >         ,t3.subject_name
    >         ,t2.score
    > from students t1
    > left join score t2
    > on t1.id = t2.id
    > left join subject t3
    > on t2.score_id = t3.subject_id;

查询学生总分(输出:学号,姓名,班级,总分)
-- 先关联再聚合

-- 先聚合再关联

hive> select  tt1.id
    >         ,tt1.name
    >         ,tt1.clazz
    >         ,sum(tt1.score) as sum_score
    > from (
    >     select  t1.id
    >             ,t1.name
    >             ,t1.clazz
    >             ,t2.score
    >     from students t1
    >     left join score t2
    >     on t1.id = t2.id
    > ) tt1 group by tt1.id,tt1.name,tt1.clazz;
hive> select  t1.id
    >         ,t1.name
    >         ,t1.clazz
    >         ,t2.sum_score
    > from students t1
    > left join(
    >     select  id
    >             ,sum(score) as sum_score
    >     from score
    >     group by id
    > ) t2 on t1.id = t2.id;

 查询全年级总分排名前三(不分文理科)的学生(输出:学号,姓名,班级,总分)

hive> select  t1.id
    >         ,t1.name
    >         ,t1.clazz
    >         ,t2.sum_score
    > from students t1
    > left join(
    >     select  id
    >             ,sum(score) as sum_score
    >     from score
    >     group by id
    > ) t2 on t1.id = t2.id
    > order by t2.sum_score desc
    > limit 3;

查询文科一班学生总分排名前10的学生(输出:学号,姓名,班级,总分)

hive> select  t1.id
    >         ,t1.name
    >         ,t1.clazz
    >         ,t2.sum_score
    > from (
    >     select  id
    >             ,name
    >             ,clazz
    >     from students
    >     where clazz = '文科一班'
    > ) t1 left join(
    >     select  id
    >             ,sum(score) as sum_score
    >     from score
    >     group by id
    > ) t2 on t1.id = t2.id
    > order by t2.sum_score desc
    > limit 10;

 查询每个班级学生总分的平均成绩(输出:班级,平均分)

select  tt1.clazz
        ,round(avg(tt1.sum_score),3) as avg_score
from (
    select  t1.id
            ,t1.name
            ,t1.clazz
            ,t2.sum_score
    from students t1
    left join(
        select  id
                ,sum(score) as sum_score
        from score
        group by id
    ) t2 on t1.id = t2.id
) tt1 group by tt1.clazz;

查询每个班级的最高总分(输出:班级,总分)

hive> select  tt1.clazz
    >         ,max(tt1.sum_score) as max_score
    > from (
    >     select  t1.id
    >             ,t1.name
    >             ,t1.clazz
    >             ,t2.sum_score
    >     from students t1
    >     left join(
    >         select  id
    >                 ,sum(score) as sum_score
    >         from score
    >         group by id
    >     ) t2 on t1.id = t2.id
    > ) tt1 group by tt1.clazz;

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值