日常工作中,为了让数据的可读性更强,经常会对数据格式进行转化操作。总结一下日常工作中遇到的关于行列操作问题。
单行拆分成多行
--创建测试数据
CREATE TABLE fwj.customer
(
id STRING,
name STRING,
mobiles STRING);
INSERT INTO fwj.customer
SELECT '1','jim','139,177,158' FROM system.dual;
-- 解法一,不建议选择
SELECT a.id,a.name,substr(a.mobiles,1,3) mobiles FROM fwj.customer a
UNION ALL
SELECT a.id,a.name,substr(a.mobiles,5,3) mobiles FROM fwj.customer a
UNION ALL
SELECT a.id,a.name,substr(a.mobiles,9,3) mobiles FROM fwj.customer a
-- 解法二
SELECT b.id,a.mobiles,b.name FROM (
SELECT explode(split(t.mobiles,',')) mobiles FROM fwj.customer t )a , fwj.customer b;
-- 优化,但不行。explode 这类UDTF函数不支持和其他字段一块被select。
SELECT t.id,t.name,explode(split(t.mobiles,',')) mobiles FROM fwj.customer t
-- 解法三,可以理解成在一次查询中
-- 先生成了一个视图 mob 包含了行转列后的数据,之后从mob 中取出转换后的数据,
-- 其他字段仍旧从原表中取。
SELECT a.id,a.name,mob.mobile
FROM fwj.customer a lateral view explode(split(a.mobiles,',')) mob AS mobile;
split(str,sep):
该函数的作用是拆分指定分隔符分割的字符串,返回一个列表。
SELECT split(a.mobiles,',') FROM fwj.customer a;
result:
["139","177","158"]
explode(arr):
该函数是一个表生成函数。输入一个列表参数,将列表中的每个值都转换为一行。
SELECT explode(plit(a.mobiles,',')) mobiles FROM fwj.customer a;
result:
mobiles
139
177
158
行列互换
测试集
/*创建数据库测试表*/
CREATE TABLE [Scores]
(
[ID] INT IDENTITY(1, 1) PRIMARY KEY ,
[Student] VARCHAR(20) ,
[Subject] VARCHAR(30) ,
[Score] FLOAT
)
/*插入数据库测试数据信息*/
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'test001', '语文', '90' )
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'test001', '英语', '85' )
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'text002', '语文', '90' )
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'text002', '英语', '80' )
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'test003', '语文', '95' )
INSERT INTO Scores
( Student, Subject, Score )
VALUES ( 'test003', '英语', '85' )
1. case when …then else …end 用法,行列转换
SELECT Student AS '姓名' ,
MAX(CASE Subject
WHEN '语文' THEN Score
ELSE 0
END) AS '语文' ,--如果这个行是“语文”,就选此行作为列
MAX(CASE Subject
WHEN '英语' THEN Score
ELSE 0
END) AS '英语'
FROM Scores
GROUP BY Student
ORDER BY Student
2. pivot(聚合函数(要转成列值的列名) for 要转换的列 in(目标列名)
SELECT Student AS '姓名' ,
AVG(语文) AS '语文' ,
AVG(英语) AS '英语'
FROM Scores PIVOT( AVG(Score) FOR Subject IN ( 语文, 英语 ) )as NewScores
GROUP BY Student
ORDER BY Student ASC
拓展题
将上表转化为下表的格式
-- 解法一,实用性不如解法二,但可能图二无法满足,推荐使用解法二
select t1.item,t1.[time]as startdate,min(t2.[time])as endtime
from t1,t1 as t2
where t1.type='start'
and t1.type<>t2.type
and t2.[time]>t1.[time]
group by t1.item,t1.[time]
-- 解法二
Select * From T1;
With S1 As (Select SeqNo, Item,Type,IIf(Type='Start', Time,Null) sTime,
IIf(Type='End', Time,Null) eTime,
Lag(Type,1,'0') Over (Partition By Item Order By Time) LastType From T1 ),
S2 As ( Select SeqNo,Item,Type,sTime,eTime,LastType,
Sum(IIF(Type='Start',1,IIF(Type='End' And LastType='Start',0,1)))
Over (Partition By Item Order By SeqNo) Gn From S1 )
Select Item,Gn,Max(sTime) sTime,Max(eTime) eTime From S2 Group By Item,Gn
解法二计算过程:
1.Time 的类型 设置为 Time(0)
2.S2代码思路:累加 分组 编号 …,避免万一 中间 没有(漏掉了) Start 记录, 却有 End 记录 的情况。如果是 Start 直接加1 ,如果是End 看下 前面是不是 Start ,是 就不加了,如果是第1条记录, 还是加 1。