HiveSQL经典面试题(建议点赞收藏)

本文主要介绍了在IT技术面试中常见的问题,如筛选所有科目成绩大于80分的学生、处理连续登录和留存问题、数据表的行列转换、计算中位数以及生成连续数值等,涉及SQL查询和HSQL进阶技巧。
摘要由CSDN通过智能技术生成

目录

经典面试系列

每科成绩都大于80分的学生信息

连续登录问题

行列转换问题

留存问题:看当天登录后第N天是否登录

TopN问题

累计计算问题

HSQL进阶版

 直播间在线最大人数

SQL循环

计算中位数

产生连续数值


经典面试系列

  1. 每科成绩都大于80分的学生信息

    1. 建表+初始化
      --创建表scdn_student_score_test 并且初始化三个学生成绩
      create table hdw_tmp_dev.scdn_student_score_test as 
      select '张三' as name,'数学' as subject ,'80' as score 
      union all 
      select '张三' as name,'语文' as subject ,'90' as score
      union all 
      select '张三' as name,'英语' as subject ,'90' as score 
      union all 
      select '李四' as name,'数学' as subject ,'90' as score
      union all 
      select '李四' as name,'语文' as subject ,'90' as score
      union all 
      select '李四' as name,'英语' as subject ,'70' as score 
      union all 
      select '王五' as name,'数学' as subject ,'90' as score
      union all 
      select '王五' as name,'语文' as subject ,'90' as score
      union all 
      select '王五' as name,'英语' as subject ,'50' as score 
      --查询结果显示
      | scdn_student_score_test.name  | scdn_student_score_test.subject  | scdn_student_score_test.score  |
      +-------------------------------+----------------------------------+--------------------------------+
      | 张三                            | 数学                               | 80                             |
      | 张三                            | 语文                               | 90                             |
      | 张三                            | 英语                               | 90                             |
      | 李四                            | 数学                               | 90                             |
      | 李四                            | 语文                               | 90                             |
      | 李四                            | 英语                               | 70                             |
      | 王五                            | 数学                               | 90                             |
      | 王五                            | 语文                               | 90                             |
      | 王五                            | 英语                               | 50                             |
      +-------------------------------+----------------------------------+---------------------
      
      --求所有学科成绩都大于等于80分的学生姓名
    2. 思路一(思维转换):所有问题找最小(最大)
      select 
          t1.name
          ,t1.min_score 
      from 
          (
              select 
                  name 
                  ,min(score)  as min_score
              from hdw_tmp_dev.scdn_student_score_test
              group by name 
          ) as t1 --求出最小的成绩
      where t1.min_score >=80 --最小成绩大于等于80,则这个学生的所有成绩都会大于等于80
    3. 执行结果
    4. 思路二(巧用左关联进行筛选):
      select 
          t1.name 
      from 
      (
          select 
              name 
          from hdw_tmp_dev.scdn_student_score_test
          group by name 
      ) as t1 
      left join 
      (
          select 
              name 
          from hdw_tmp_dev.scdn_student_score_test
          where score <80
          group by name 
      ) as t2 on t1.name = t2.name
      where t2.name is null 
  2. 连续登录问题

    1. 建表+初始化
      create table hdw_tmp_dev.csdn_user_login_test as 
      select 'xiaoming' as user_name,'2024-01-01' as login_date 
      union all 
      select 'xiaoming' as user_name,'2024-01-02' as login_date 
      union all 
      select 'xiaoming' as user_name,'2024-01-03' as login_date
      union all 
      select 'xiaoming' as user_name,'2024-01-04' as login_date
      union all 
      select 'xiaoming' as user_name,'2024-01-05' as login_date
      union all 
      select 'dahuang' as user_name,'2024-01-02' as login_date 
      union all 
      select 'dahuang' as user_name,'2024-01-03' as login_date
      union all 
      select 'dahuang' as user_name,'2024-01-04' as login_date
      union all 
      select 'dahuang' as user_name,'2024-01-05' as login_date
      union all 
      select 'lucky_dog' as user_name,'2024-01-01' as login_date 
      union all 
      select 'lucky_dog' as user_name,'2024-01-03' as login_date
      union all 
      select 'lucky_dog' as user_name,'2024-01-04' as login_date
      union all 
      select 'lucky_dog' as user_name,'2024-01-05' as login_date
    2. Sql参考
      select 
          t2.user_name 
          ,t2.date_begin_flag
          ,count(1) as max_login_days
      from 
      (
          select 
              t1.user_name 
              ,t1.login_date
              ,date_sub(date(login_date),t1.rn) as date_begin_flag
          from 
          (
              select 
                  user_name 
                  ,login_date 
                  ,row_number()over(partition by user_name order by login_date) as rn 
              from hdw_tmp_dev.csdn_user_login_test
          ) as t1
      ) as t2  
      group by t2.user_name 
          ,t2.date_begin_flag
    3. 执行过程+结果
  3. 行列转换问题

    1. 建表+初始化:参考问题一的表
      | t1.name              | t1.subject             | t1.score  |
      +-------------------------------+----------------------------------+-----------------+
      | 张三                 | 数学                    | 80                             |
      | 张三                 | 语文                    | 90                             |
      | 张三                 | 英语                    | 90                             |
      | 李四                 | 数学                    | 90                             |
      | 李四                 | 语文                    | 90                             |
      | 李四                 | 英语                    | 70                             |
      | 王五                 | 数学                    | 90                             |
      | 王五                 | 语文                    | 90                             |
      | 王五                 | 英语                    | 50                             |
      +-------------------------------+----------------------------------+------------------
    2. 行专列Sql参考
      select 
          name 
          ,max(case when subject = '数学' then score end) as math_score
          ,max(case when subject = '语文' then score end) as china_score
          ,max(case when subject = '英语' then score end) as english_score
      from hdw_tmp_dev.scdn_student_score_test
      group by name 
    3. 行专列执行结果
    4. 列转行:可以参考建表初始化语句(union all)
      select 
          name
          ,collect_set(subject)  as subject_set
      from hdw_tmp_dev.scdn_student_score_test
      group by name 
    5. 每个学生选课结果(多行变一行):

    6.  将上面的结果展开(一行变多行)数据准备:

      create table hdw_tmp_dev.scdn_student_score_test_collect as 
      select 'zhangsan' as name ,'"数学","语文","英语"' as subject_list
      union all 
      select 'lisi' as name ,'"美术","生物","物理"' as subject_list
      union all 
      select 'wangwu' as name ,'"计算机","日语","韩语"' as subject_list
    7. 炸开代码
      select 
          name
          ,subject_list
          ,subject_name
      from hdw_tmp_dev.scdn_student_score_test_collect
      lateral view explode(split(subject_list,',')) extend_sub as subject_name
    8. 结果

  4. 留存问题:看当天登录后第N天是否登录

    1. 建表+初始化:参照连续登录表
    2. Sql参考
      --方案一:利用lead(日期,N)是否等于 当天登录实践+N天
      select 
          t1.user_name
          ,t1.logon_date
          ,case when lead1_logon_date = date_add(logon_date,1) then '1天留存' end as 1day_remain
          ,case when lead3_logon_date = date_add(logon_date,3) then '3天留存' end as 3day_remain
      from 
      (
          select 
              user_name 
              ,logon_date 
              ,lead(user_name,1)over(partition by user_name order by logon_date) as lead1_user_name
              ,lead(logon_date,1)over(partition by user_name order by logon_date) as lead1_logon_date
              ,lead(user_name,3)over(partition by user_name order by logon_date) as lead3_user_name
              ,lead(logon_date,3)over(partition by user_name order by logon_date) as lead3_logon_date
          from hdw_tmp_dev.csdn_user_logon_test
      ) as t1 
      --方案二:
      select 
          t2.first_log_date       as first_log_date
          ,count(t2.user_id)      as new_user_cnt --新用户数 
          ,count(t3.user_id)      as next_user_id --次日回访用户数
          ,count(t4.user_id)      as 30_user_id   --30天回访用户数
          ,count(t3.user_id)/count(t2.user_id)    as next_back_rate --次日回访率
          ,count(t4.user_id)/count(t2.user_id)    as 30_back_rate   --30天回访率
      from 
      (
          select 
              first_log_date
              ,user_id 
              ,date_add(first_log_date,1)  as next_log_date
              ,date_add(first_log_date,29) as 30_log_date
          from 
          (
              select 
                  user_id 
                  ,log_time 
                  ,first_value(date(log_time))over(partition by user_id) as first_log_date 
              from user_log 
          ) as t1 
          group by first_log_date
              ,user_id 
      ) as t2 
      left join 
      (
          select 
              user_id 
              ,date(log_date) as log_date  
          from user_log 
          group by user_id 
              ,date(log_date) as log_date 
      ) as t3 on t2.user_id = t2.user_id and t2.next_log_date = t3.log_date
      left join 
      (
          select 
              user_id 
              ,date(log_date) as log_date  
          from user_log 
          group by user_id 
              ,date(log_date) as log_date 
      ) as t4 on t2.user_id = t4.user_id and t2.30_log_date = t4.log_date
      group by t2.first_log_date 
    3. 方案一执行结果
  5. TopN问题

    1. 建表+初始化
      create table hdw_tmp_dev.scdn_student_score_test1 as 
      select '张三' as name,'数学' as subject ,'80' as score 
      union all 
      select '张三' as name,'语文' as subject ,'90' as score
      union all 
      select '张三' as name,'英语' as subject ,'90' as score 
      union all 
      select '李四' as name,'数学' as subject ,'90' as score
      union all 
      select '李四' as name,'语文' as subject ,'90' as score
      union all 
      select '李四' as name,'英语' as subject ,'70' as score 
      union all 
      select '王五' as name,'数学' as subject ,'90' as score
      union all 
      select '王五' as name,'语文' as subject ,'90' as score
      union all 
      select '王五' as name,'英语' as subject ,'50' as score 
      union all 
      select '小明' as name,'数学' as subject ,'88' as score
      union all 
      select '小明' as name,'语文' as subject ,'99' as score
      union all 
      select '小明' as name,'英语' as subject ,'77' as score
      union all 
      select '小文' as name,'数学' as subject ,'66' as score
      union all 
      select '小文' as name,'语文' as subject ,'89' as score
      union all 
      select '小文' as name,'英语' as subject ,'90' as score
    2. Sql参考:求每科前三名对应的人员的成绩单
      select 
          *
      from 
      (
          select 
              name
              ,subject
              ,score
              ,row_number()over(partition by subject order by score desc) as rn 
          from hdw_tmp_dev.scdn_student_score_test1
      ) as t1 
      where rn<=3
    3. 执行结果
  6. 累计计算问题

    1. 建表+初始化
      create table hdw_tmp_dev.user_sale_date as 
      select '001' as user_id,'2024-02-01' as sale_date, 100 as amount 
      union all 
      select '001' as user_id,'2024-02-02' as sale_date, 200 as amount 
      union all 
      select '001' as user_id,'2024-02-03' as sale_date, 300 as amount 
      union all 
      select '001' as user_id,'2024-02-04' as sale_date, 400 as amount 
      union all 
      select '001' as user_id,'2024-02-05' as sale_date, 500 as amount 
      union all 
      select '001' as user_id,'2024-02-06' as sale_date, 600 as amount 
    2. SQL逻辑
      select 
          user_id
          ,sale_date
          ,amount
          ,sum(amount)over(partition by user_id order by sale_date) as accuma_amount --按日期逐渐累加
          ,sum(amount)over(partition by user_id) as total_amount --按人汇总
          ,avg(amount)over(partition by user_id) as avg_amount --安人平均每天
          ,max(amount)over(partition by user_id) as max_amount --单日最大销售
          ,min(amount)over(partition by user_id) as min_amount --单日最小销售
      from hdw_tmp_dev.user_sale_date
    3. 结果展示

HSQL进阶版

  1.  直播间在线最大人数

    1. 建表+初始化
      create table hdw_tmp_dev.csdn_user_login_time_detail as 
      select '001' as user_id,'2024-02-01 10:00:00' as begin_time,'2024-02-01 12:00:00' as end_time 
      union all 
      select '002' as user_id,'2024-02-01 11:00:00' as begin_time,'2024-02-01 13:00:00' as end_time 
      union all 
      select '003' as user_id,'2024-02-01 11:00:00' as begin_time,'2024-02-01 14:00:00' as end_time 
      union all 
      select '004' as user_id,'2024-02-01 15:00:00' as begin_time,'2024-02-01 16:00:00' as end_time 
    2. SQL逻辑
      select 
          t1.user_id 
          ,t1.time1
          ,t1.flag
          ,sum(flag)over(order by t1.time1) as user_cnt 
      from 
      (
          select 
              user_id
              ,begin_time as time1 
              ,1    as flag 
          from hdw_tmp_dev.csdn_user_login_time_detail
          union all 
          select 
              user_id
              ,end_time
              ,-1    as flag 
          from hdw_tmp_dev.csdn_user_login_time_detail
      ) as t1 
    3. 结果展示
  2. SQL循环

    1. 建表+初始化
      create table hdw_tmp_dev.cycle_1 as 
      select '1011' as a 
      union all 
      select '0101' as a
    2. SQL逻辑
      select  
              a,
              concat_ws(",",collect_list(cast(index  as  string)))  as  res
      from  (
              select  
                      a,
                      index+1  as  index,
                      chr
              from  (
                      select  
                              a,
                              concat_ws(",",substr(a,1,1),substr(a,2,1),substr(a,3,1),substr(a,-1))  str
                      from  hdw_tmp_dev.cycle_1 as t8
              )  tmp1
              lateral  view  posexplode(split(str,","))  t  as  index,chr
              where  chr  =  "1"
      )  tmp2
      group  by  a;
    3. 结果展示
  3. 计算中位数

    1. 建表+初始化:
      create table hdw_tmp_dev.user_sale_date as 
      select '001' as user_id,'2024-02-01' as sale_date, 100 as amount 
      union all 
      select '001' as user_id,'2024-02-02' as sale_date, 200 as amount 
      union all 
      select '001' as user_id,'2024-02-03' as sale_date, 300 as amount 
      union all 
      select '001' as user_id,'2024-02-04' as sale_date, 400 as amount 
      union all 
      select '001' as user_id,'2024-02-05' as sale_date, 500 as amount 
      union all 
      select '001' as user_id,'2024-02-06' as sale_date, 600 as amount 
    2. SQL逻辑
      select 
          t1.user_id
          ,t1.sale_date
          ,t1.amount
          ,avg(t1.amount)over(partition by t1.user_id) as zhongwenshu
      from 
      (
          select 
              user_id
              ,sale_date
              ,amount
              ,row_number()over(partition by user_id order by amount) as rn 
          from hdw_tmp_dev.user_sale_date
      ) as t1 
      left join 
      (
          select 
              user_id 
              ,count(1) as cnt 
          from hdw_tmp_dev.user_sale_date
          group by user_id 
      ) as t2 on t1.user_id = t2.user_id 
      where t1.rn in (cnt/2,(cnt+1)/2,cnt/2+1)
      --总个数为奇数命中(cnt+1)/2;总个数为偶数命中:cnt/2,cnt/2+1,两数相加求平均值
    3. 结果展示
  4. 产生连续数值

    1. SQL逻辑
      --产生1到10的连续数据
      select
          start_id + pos  as  id
          ,pos
          ,val
      from(
          select
              1   as  start_id,
              10  as  end_id
      )  m  lateral  view  posexplode(split(space(end_id - start_id),''))  t  as  pos,  val
      --方案二
      select
          row_number()  over()  as  id
      from    
          (select  split(space(99), '') as  x)  t
      lateral  view
      explode(x)  ex;
    2. 结果展示
  • 32
    点赞
  • 29
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 5
    评论
Hive SQL优化面试题通常涉及以下几个方面: 1. 查询优化:了解如何通过索引、分区等技术来优化Hive SQL查询性能。可以提到使用合适的索引、分区和分桶来减少数据扫描量,提高查询效率。 2. 性能调优:理解常见的性能瓶颈和优化策略,例如避免全表扫描、减少数据倾斜、合理设置并行度等。可以提到使用合适的数据类型、避免不必要的数据转换、使用合适的连接方式等来提高性能。 3. EXPLAIN关键字的使用:掌握使用EXPLAIN关键字来分析查询执行计划,了解查询的执行顺序和涉及的操作,从而找到潜在的性能问题并进行优化。 4. 解答优化相关的题目:在面试中可能会遇到一些关于查询优化和性能调优的具体问题,例如如何优化某个特定的查询语句,如何处理大数据量的查询等。在回答时可以结合自己的实际经验和知识来给出合理的解决方案。 综上所述,Hive SQL优化面试题主要涉及查询优化、性能调优、使用EXPLAIN关键字分析查询计划以及解答具体的优化问题。掌握这些知识和技巧可以帮助提升在Hive SQL领域的技能和竞争力。 #### 引用[.reference_title] - *1* *3* [Hive SQL大厂面试题必考大全](https://blog.csdn.net/m0_47256162/article/details/131687792)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insertT0,239^v3^insert_chatgpt"}} ] [.reference_item] - *2* [Hive SQL面试题(附答案)](https://blog.csdn.net/a934079371/article/details/122227602)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insertT0,239^v3^insert_chatgpt"}} ] [.reference_item] [ .reference_list ]
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

大数据松松

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值