Hive SQL: columns restricted by GROUP BY — handling them with collect_set/collect_list

hive> SELECT userid ,Sequnce ,ActionTime  FROM T_BZ_ClientActionLog GROUP BY Sequnce ,ActionTime limit 100;

FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'userid'

Hive requires that userid also appear among the GROUP BY columns.
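The rule exists because a single (Sequnce, ActionTime) group may contain several different userid values, so a bare userid in the SELECT list is ambiguous. A minimal Python sketch (hypothetical rows, not the actual table) of that ambiguity:

```python
# Two rows share the same (sequnce, actiontime) group key but carry
# different userid values, so "which userid?" has no single answer
# unless userid is grouped on or wrapped in an aggregate.
rows = [
    {"userid": "u1", "sequnce": "s1", "actiontime": "t1"},
    {"userid": "u2", "sequnce": "s1", "actiontime": "t1"},
]

# Collect the distinct userids within the group.
userids_in_group = {r["userid"] for r in rows}
print(len(userids_in_group))  # 2 -> ambiguous without an aggregate
```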

This differs from MySQL: without the ONLY_FULL_GROUP_BY SQL mode, MySQL accepts such a statement and returns an arbitrary value from each group for the non-grouped column.

The workaround is to wrap the non-grouped columns in an aggregate:

hive> SELECT sequnce, actiontime, collect_set(pagecode), collect_set(actioncode) FROM T_BZ GROUP BY Sequnce, ActionTime limit 100;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=

In order to set a constant number of reducers:

set mapreduce.job.reduces=

Starting Job = job_1407387657227_0043, Tracking URL = http://n1.hadoop:8089/proxy/application_1407387657227_0043/

Kill Command = /app/prog/hadoop/bin/hadoop job  -kill job_1407387657227_0043

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2014-08-07 20:07:12,881 Stage-1 map = 0%,  reduce = 0%

2014-08-07 20:07:24,192 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.84 sec

2014-08-07 20:07:29,347 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 20.71 sec

MapReduce Total cumulative CPU time: 20 seconds 710 msec

Ended Job = job_1407387657227_0043

MapReduce Jobs Launched:

Job 0: Map: 1  Reduce: 1   Cumulative CPU: 20.71 sec   HDFS Read: 96397668 HDFS Write: 6969 SUCCESS

Total MapReduce CPU Time Spent: 20 seconds 710 msec

OK

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:33     []      ["A0001"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:37     ["P001"]        ["A0001"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:45     ["P003","P001"] ["A0002","A0001"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:07     ["P003"]        ["A0011"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:11     ["P003","P001"] ["A0017","A0001"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:13     ["P001","P002"] ["A0003","A0001"]

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:22     ["P002"]        ["A0006"]

As the output shows, each aggregated column now holds a collection (array) per group.

If you don't want a collection, you can index into the array to take its first element:

hive> SELECT sequnce,actiontime,collect_set(pagecode)[0],collect_set(actioncode)[0] FROM T_BZ GROUP BY Sequnce ,ActionTime limit 100;

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:33             A0001

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:37     P001    A0001

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:20:45     P003    A0002

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:07     P003    A0011

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:11     P003    A0017

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:13     P001    A0003

00015a21-ef6d-4f05-b04e-ffd98fab2922    2014-07-24 01:21:22     P002    A0006

This output now matches what MySQL would return.

If you don't want values deduplicated, use collect_list instead, which keeps duplicates. Both are built-in Hive aggregate functions (UDAFs).
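A minimal Python analogy (not Hive itself) for how the two functions behave within one GROUP BY group: collect_set keeps distinct values, while collect_list keeps every value, duplicates included. Note that Hive does not guarantee element order inside the result array; the sketch below keeps first-seen order only for readability.

```python
def collect_list(values):
    # Analogy for Hive collect_list: every value, duplicates kept.
    return list(values)

def collect_set(values):
    # Analogy for Hive collect_set: distinct values only.
    # dict.fromkeys preserves first-seen order (Python 3.7+).
    return list(dict.fromkeys(values))

# pagecode values within one (sequnce, actiontime) group:
group = ["P001", "P003", "P001"]
print(collect_list(group))  # ['P001', 'P003', 'P001']
print(collect_set(group))   # ['P001', 'P003']
```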
