环境:
Hive: 2.7.7
Oracle SQL Developer
Cloudera JDBC Driver
案例:
select type, count( object_id ) as objects
from tblobj2
group by type
order by objects desc
结果
其他聚合与 SQL Server 有些类似:
avg()
avg(distinct)
count()
count(distinct)
sum()
sum(distinct)
案例 - 2 :
select schema_id,count(object_id) as rows from tblobj2 group by schema_id ;
select percentile(schema_id,0.64) from tblobj2 ;
select percentile(schema_id,0.60) from tblobj2 ;
这是一个奇特的例子。percentile 求解的是在一组值中,某个符合一定比例范围之内的最小值。
已知,schema_id 有两个值,1,4, 分别占了 5/8, 3/8.
当指定要找到 60% 的字段 schema_Id 值都小于某个特定的值,用 percentile(schema_id,0.60) 来计算这个特定的值。
而我们通过占比得出,5/8 的值都小于等于 1, 一旦超过 5/8 则需要求得 4,才满足 5/8 以上的值,比如 6/8(0.64), 7/8 等小于等于这个特定值。
异常:
1 在 hive 里面输入一下命令是可以正常运行的:
select type, count(distinct object_id ) as objects
from tblobj2
group by type
2 在 Oracle SQL Developer 中输入命令,却意外的报错了:
[Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: ERROR_STATE, SQL state: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
, Query: SELECT `crm_tblobj2`.`type`, COUNT(DISTINCT `crm_tblobj2`.`object_id`) `objects` FROM `crm`.`tblobj2` `crm_tblobj2` GROUP BY `crm_tblobj2`.`type`.
严重怀疑是因为 hive --service cli 上线之后,把所有的表对象都加锁了。重启 hive server 之后就可以继续了。