Reading arrays in Hive: how to get an array of elements from the Hive GROUP BY operator?


weixin_39593593

Article link: https://blog.csdn.net/weixin_39593593/article/details/112889000

I want to group rows by a given field and get, for each group, the collected values of another field. Below is an example of what I am trying to achieve:

Imagine a table named 'sample_table' with two columns as below:

F1   F2
001  111
001  222
001  123
002  222
002  333
003  555
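To experiment with the queries discussed here, the sample data can be loaded roughly as follows. This is only a sketch under assumptions that the question does not state: both columns are declared as STRING (so the leading zeros in F1 survive), and the Hive version is 0.14 or later, where INSERT ... VALUES is available.

```sql
-- Assumption: column types are not given in the question; STRING keeps '001' intact.
CREATE TABLE sample_table (
  F1 STRING,
  F2 STRING
);

-- Assumption: Hive 0.14 or later, which supports INSERT ... VALUES.
INSERT INTO TABLE sample_table VALUES
  ('001', '111'),
  ('001', '222'),
  ('001', '123'),
  ('002', '222'),
  ('002', '333'),
  ('003', '555');
```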

I want to write a Hive query that gives the output below:

001  [111, 222, 123]
002  [222, 333]
003  [555]

In Pig, this can be achieved very easily with something like this:

grouped_relation = GROUP sample_table BY F1;

Can somebody please suggest a simple way to do this in Hive? The only approach I can think of is to write a User Defined Function (UDF), but that may be very time-consuming.

Solution

The built-in aggregate function collect_set (documented in the Hive language manual) gets you almost what you want. It would actually work on your example input:

SELECT F1, collect_set(F2)
FROM sample_table
GROUP BY F1

Unfortunately, collect_set also removes duplicate elements, and I imagine that isn't your desired behavior. I find it odd that collect_set exists but there is no version that keeps duplicates. Someone else apparently thought the same thing and asked about it; it looks like the top and second answers to that question will give you the UDAF you need.
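On Hive 0.13.0 and later, the built-in aggregate collect_list keeps duplicates, so a custom UDAF may not be needed on those versions. A minimal sketch against the sample_table from the question; the alias f2_values is only illustrative:

```sql
-- Assumption: Hive 0.13.0 or later, where collect_list is built in.
-- Unlike collect_set, collect_list does not remove duplicate values.
SELECT F1,
       collect_list(F2) AS f2_values  -- f2_values is an illustrative alias
FROM sample_table
GROUP BY F1;
```

Neither collect_set nor collect_list guarantees the order of elements inside the returned array, so the values for a group may not appear in the order shown in the question.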
