Hive_HQL_行转列与列转行

参考文章:

1.HIVE行转列和列转行

https://www.cnblogs.com/blogyuhan/p/9274784.html

2.HIVE行转列和列转行

https://blog.csdn.net/jiantianming2/article/details/79189672

行转列 :将多行数据转换为一行数据中的多列

列转行 :将一行中多列数据转换为多行

行转列

方式一 : 通过 GROUP BY  + CASE WHEN + 聚合函数

需求如下

1.有用户学习情况的 uid,  科目,分数  多行数据。已知每个人所学分为 英语,数学,语文3个科目。

uid, subject,score 转换为 uid,math_score, chinese_score, english_score

SQL

INSERT OVERWRITE TABLE user_subject_score2
SELECT 
	uid
	,MAX( 
		CASE
			WHEN subject = 'math' THEN score
			ELSE -1
		END
	) AS chinese_score
	,MAX( 
		CASE
			WHEN subject = 'chinese' THEN score
			ELSE -1
		END
	) AS chinese_score
	,MAX( 
		CASE
			WHEN subject = 'english' THEN score
			ELSE -1
		END
	) AS chinese_score
FROM user_subject_score1
GROUP BY uid
;

+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| user_subject_score2.uid  | user_subject_score2.math_score  | user_subject_score2.chinese_score  | user_subject_score2.english_score  |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| 1                        | 80.0                            | 76.0                               | 80.0                               |
| 2                        | 88.0                            | 88.0                               | -1.0                               |
| 3                        | 66.0                            | 30.0                               | -1.0                               |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+

方式二: 通过 GROUP BY  + CONCAT_WS + COLLECT_LIST

 有 用户id ,订单id 求 用户的所有订单列表

uid, order_id  ->  uid,order_ids

我们看下这两个函数的使用方法

CONCAT_WS 

+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. |
+----------------------------------------------------+
 

COLLECT_LIST

+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| collect_list(x) - Returns a list of objects with duplicates |
+----------------------------------------------------+
 

建表语句与数据录入

use data_warehouse_test;

CREATE TABLE IF NOT EXISTS user_order (
	uid BIGINT
	,order_id BIGINT
);

CREATE TABLE IF NOT EXISTS user_orders (
	uid BIGINT
	,order_ids STRING
);

INSERT OVERWRITE TABLE user_order VALUES 
(1, 112)
,(1, 123)
,(2, 234)
,(2, 21)
,(3, 821)
;

分析SQL

use data_warehouse_test;

INSERT OVERWRITE TABLE user_orders
SELECT 
	uid
	,CONCAT_WS(',', COLLECT_LIST(order_str)) AS order_list
FROM
(
	SELECT uid , CAST(order_id AS STRING) AS order_str
	FROM user_order
) tmp
GROUP BY uid
;

输出

+------------------+------------------------+
| user_orders.uid  | user_orders.order_ids  |
+------------------+------------------------+
| 1                | 112,123                |
| 2                | 234,21                 |
| 3                | 821                    |
+------------------+------------------------+
 

列转行

构建测试数据

USE data_warehouse_test;
 
CREATE TABLE IF NOT EXISTS explode_laterview_org(
	day1_num BIGINT
	,day2_num BIGINT
	,day3_num BIGINT
	,day4_num BIGINT
	,day5_num BIGINT
	,day6_num BIGINT
	,day7_num BIGINT
	,campaign_name STRING
	,campaign_id BIGINT
);
 
 
INSERT OVERWRITE TABLE explode_laterview_org VALUES 
(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )
,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)
;

需要将表中数据转换为以下格式

+--------------+----------------+-----------+------+
| campaign_id  | campaign_name  |   type    | num  |
+--------------+----------------+-----------+------+
| 2            | zoo            | day1_num  | 40   |
| 2            | zoo            | day2_num  | 20   |
| 2            | zoo            | day3_num  | 10   |
| 2            | zoo            | day4_num  | 4    |
| 2            | zoo            | day5_num  | 4    |
| 2            | zoo            | day6_num  | 2    |
| 2            | zoo            | day7_num  | 1    |
| 3            | moji           | day1_num  | 100  |
| 3            | moji           | day2_num  | 80   |
| 3            | moji           | day3_num  | 53   |
| 3            | moji           | day4_num  | 40   |
| 3            | moji           | day5_num  | 7    |
| 3            | moji           | day6_num  | 6    |
| 3            | moji           | day7_num  | 5    |
+--------------+----------------+-----------+------+

方式一 :采用 UNION ALL 的方式

SELECT campaign_id, campaign_name, 'day1_num', day1_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day2_num', day2_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day3_num', day3_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day4_num', day4_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day5_num', day5_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day6_num', day6_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day7_num', day7_num
FROM explode_laterview_org
;

输出

+------------------+--------------------+-----------+---------------+
| _u1.campaign_id  | _u1.campaign_name  |  _u1._c2  | _u1.day1_num  |
+------------------+--------------------+-----------+---------------+
| 2                | zoo                | day1_num  | 40            |
| 2                | zoo                | day2_num  | 20            |
| 2                | zoo                | day3_num  | 10            |
| 2                | zoo                | day4_num  | 4             |
| 2                | zoo                | day5_num  | 4             |
| 2                | zoo                | day6_num  | 2             |
| 2                | zoo                | day7_num  | 1             |
| 3                | moji               | day1_num  | 100           |
| 3                | moji               | day2_num  | 80            |
| 3                | moji               | day3_num  | 53            |
| 3                | moji               | day4_num  | 40            |
| 3                | moji               | day5_num  | 7             |
| 3                | moji               | day6_num  | 6             |
| 3                | moji               | day7_num  | 5             |
+------------------+--------------------+-----------+---------------+
 

tips:方式一在指标少时方便使用,当行转列的指标比较多时代码量会比较大,维护困难

方式二:使用lateral view和str_to_map

SELECT 
    campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
    EXPLODE(
            STR_TO_MAP(
                    CONCAT(
                        'day1_num=',CAST (day1_num AS STRING),
                        '&day2_num=',CAST (day2_num AS STRING),
                        '&day3_num=',CAST (day3_num AS STRING),
                        '&day4_num=',CAST (day4_num AS STRING),
                        '&day5_num=',CAST (day5_num AS STRING),
                        '&day6_num=',CAST (day6_num AS STRING),
                        '&day7_num=',CAST (day7_num AS STRING)
                    )
                ,'&', '=')
        ) lateral_table AS type, num
;

+--------------+----------------+-----------+------+
| campaign_id  | campaign_name  |   type    | num  |
+--------------+----------------+-----------+------+
| 2            | zoo            | day1_num  | 40   |
| 2            | zoo            | day2_num  | 20   |
| 2            | zoo            | day3_num  | 10   |
| 2            | zoo            | day4_num  | 4    |
| 2            | zoo            | day5_num  | 4    |
| 2            | zoo            | day6_num  | 2    |
| 2            | zoo            | day7_num  | 1    |
| 3            | moji           | day1_num  | 100  |
| 3            | moji           | day2_num  | 80   |
| 3            | moji           | day3_num  | 53   |
| 3            | moji           | day4_num  | 40   |
| 3            | moji           | day5_num  | 7    |
| 3            | moji           | day6_num  | 6    |
| 3            | moji           | day7_num  | 5    |
+--------------+----------------+-----------+------+
 

tips

      方式二极大的减少重复的代码量,需要熟悉lateral view和str_to_map,concat等函数使用,concat主要拼接key-value形式存储的字符串,如在一个项目开发中使用一个sql处理出十多个指标,后期需要列存储各指标值时,使用方式二就可以使代码减少数10倍,极大提升代码可读性和可维护性,同时key值可以自定义。

如果对这些函数不太熟悉,请参考我的文章

https://blog.csdn.net/u010003835/article/details/106632597

  • 4
    点赞
  • 28
    收藏
    觉得还不错? 一键收藏
  • 3
    评论
Hive中的转行可以通过使用函数EXPLODE和LATERAL VIEW来实现。函数EXPLODE(col)用于将Hive的一中复杂的数组或者Map结构拆分成多。LATERAL VIEW是用于和split、explode等函数一起使用的,可以将一展开成多或将多合并成一。 举个例子来说明,假设我们有一个表person_info,其中包含name、constellation和blood_type三。要将表中的name中的值拆分成多,可以使用LATERAL VIEW和EXPLODE函数,具体的语句如下: SELECT t1.base, concat_ws('|', collect_set(t1.name)) name FROM (SELECT name, concat(constellation, ",", blood_type) base FROM person_info) t1 GROUP BY t1.base; 在上述查询中,通过LATERAL VIEW和EXPLODE函数,将name拆分成多,然后使用concat_ws函数将拼接后的结果作为新的name。最后,通过GROUP BY对base分组。这样就实现了Hive中的操作。<span class="em">1</span><span class="em">2</span><span class="em">3</span> #### 引用[.reference_title] - *1* [hive转行案例](https://download.csdn.net/download/weixin_38581777/14037437)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] - *2* *3* [hive操作(转行)](https://blog.csdn.net/aiduo3346/article/details/102085019)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] [ .reference_list ]

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值