行<=>列转换
原始数据:
-- 数据:id course
1 java
1 hadoop
1 hive
1 hbase
2 java
2 hive
2 spark
2 flink
3 java
3 hadoop
3 hive
3 kafka
所需结果:
-- 结果如下(1表示选修,0表示未选修)
id java hadoop hive hbase spark flink kafka
1 1 1 1 1 0 0 0
2 1 0 1 0 1 1 0
3 1 1 1 0 0 0 1
解决思路: 使用case when;group by + sum,通过case when进行选修和未选修的判断,再通过sum在当前求和,通过group by 分组
hive (default)> select id,
> sum(case when course="java" then 1 else 0 end) as java,
> sum(case when course="hadoop" then 1 else 0 end) as hadoop,
> sum(case when course="hive" then 1 else 0 end) as hive,
> sum(case when course="hbase" then 1 else 0 end) as hbase,
> sum(case when course="spark" then 1 else 0 end) as spark,
> sum(case when course="flink" then 1 else 0 end) as flink,
> sum(case when course="kafka" then 1 else 0 end) as kafka
> from rowline1
> group by id;
Query ID = root_20221218153246_1df8e0f4-f57e-4f48-92fe-07bc86490bbb
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1671347868984_0002)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 9.32 s
----------------------------------------------------------------------------------------------
OK
id java hadoop hive hbase spark flink kafka
1 1 1 1 1 0 0 0
2 1 0 1 0 1 1 0
3 1 1 1 0 0 0 1
列转行(聚集)
原始数据。
id1 id2 flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
所需结果
id1 id2 flag
a b 2|1|3
c d 6|8
解决方案:
- 首先将flag聚集起来,然后再拼接在一起
hive (default)> select id1,id2,concat_ws("|",collect_set(cast (flag as string))) flag
> from rowline2
> group by id1,id2;
Query ID = root_20221218154421_20995988-3030-42f2-a207-286c9bb7ed9c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1671347868984_0003)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.58 s
----------------------------------------------------------------------------------------------
OK
id1 id2 flag
a b 2|1|3
c d 6|8
Time taken: 6.041 seconds, Fetched: 3 row(s)