Impala SQL: Notes on Troubleshooting and Resolving Common Errors

Background

I currently own a data product end to end that executes user-submitted Impala SQL via Hive-JDBC to power datasets, data monitoring, data push, and similar features. The Hive-JDBC version is 2.1.1-cdh6.2.1:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.1-cdh6.2.1</version>
</dependency>
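
For context, a minimal sketch of submitting a statement through this driver; Impala speaks the HiveServer2 protocol, which is why hive-jdbc works against it. The host, port, and query below are placeholders, not the product's actual code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcDemo {
    public static void main(String[] args) throws Exception {
        // 21050 is Impala's default HiveServer2-compatible port; authentication
        // parameters (Kerberos, LDAP, noSasl) depend on the cluster's security setup.
        String url = "jdbc:hive2://impala-host:21050/default";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("select 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}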

Errors

ParseException: Syntax error in line 34: sum(enable) Encountered: ENABLE. Expected: ALL…

SQL that ran without issue on the previous CDH version started failing after the cluster was upgraded to CDH 7.1.7. The full error:

ParseException: Syntax error in line 34:
sum(enable) as 辅助
 ^
Encountered: ENABLE
Expected: ALL, CASE, CAST, DATE, DEFAULT, DISTINCT, EXISTS, FALSE, IF, INTERVAL, LEFT, NOT, NULL, REPLACE, RIGHT, TRUNCATE, TRUE, IDENTIFIER
CAUSED BY: Exception: Syntax error

In other words, enable is now a reserved keyword in Impala, so using it as a bare identifier is a syntax error.

Fix: quote the identifier with backticks, i.e.:

sum(`enable`)

Note: adding set ENABLE_EXPR_REWRITES = 0; as the first line of the SQL does not fix this.

For background on set ENABLE_EXPR_REWRITES = 0;, see the earlier post "set ENABLE_EXPR_REWRITES = 0 and resolving GROUP BY expression must not contain aggregate functions".

default.url_decode() unknown for database default. Currently this db has 4 functions.

The SQL contains the url_decode() function and fails to execute. This is not a hive-jdbc problem: the same SQL also fails in the platform's visual query UI (screenshot omitted).
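
No fix was recorded for this one; the message says the database simply has no function named url_decode. Assuming the function is supposed to come from a Hive UDF jar on HDFS (the jar path below reuses the one from a later section; the SYMBOL class name is invented for illustration), one plausible remedy is to register it as a persistent Java UDF:

import java.sql.Connection;
import java.sql.Statement;

public class RegisterUdf {
    public static void register(Connection con) throws Exception {
        try (Statement stmt = con.createStatement()) {
            // Impala can create persistent Java UDFs from a jar on HDFS.
            // The symbol class here is hypothetical; use the UDF's actual class.
            stmt.execute("create function if not exists default.url_decode "
                    + "location 'hdfs:///udf/hive.udf-1.0.2.jar' "
                    + "symbol='com.example.udf.UrlDecode'");
            // Verify that the function is now visible in the database.
            stmt.execute("show functions in default");
        }
    }
}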

TransmitData() to ip:27000 failed: Network error: Client connection negotiation failed: client connection to ip:27000: connect: Connection refused (error 111)

A TCP communication problem between nodes; retry manually. (For automatic retries, see the sketch in the "Failed due to unreachable impalad(s)" section below.) Reference: TransmitData 27000 Network-error.

Failed to close HDFS file. The DiskSpace quota of is exceeded

The full error:

Failed to close HDFS file: hdfs://ppdhdpha/user/hive/warehouse/lkzc.db/jh_100701_cdd_noapply_cutdown_user/_impala_insert_staging//.0.parq
Error(255): Unknown error 255
Root cause: RemoteException: The DiskSpace quota of /user/hive/warehouse/lkzc.db is exceeded: quota = 65970697666560 B = 60 TB but diskspace consumed = 65971183008777 B = 60.00 TB

The lkzc business line's database directory has hit its disk-space quota, so nothing more can be written; the _impala_insert_staging path in the message shows the failure happened while staging an INSERT.

Solution: drop unused large tables to free up space.
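
If you want to check programmatically how close a database directory is to its quota, the two numbers from the error message are available through the HDFS ContentSummary API. A minimal sketch, assuming the HDFS client configuration (core-site.xml/hdfs-site.xml) is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuotaCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // These correspond to "quota = ..." and "diskspace consumed = ..." in the error.
        ContentSummary cs = fs.getContentSummary(new Path("/user/hive/warehouse/lkzc.db"));
        System.out.println("space quota    = " + cs.getSpaceQuota() + " B");
        System.out.println("space consumed = " + cs.getSpaceConsumed() + " B");
    }
}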

The library hdfs:/udf/hive.udf-1.0.2.jar last modified time does not match the expected last modified time. Run 'refresh functions <db name>'.

A SQL statement submitted to the big-data platform through hive-jdbc failed with:

java.lang.Exception: java.sql.SQLException: The library hdfs:///udf/hive.udf-1.0.2.jar last modified time 1670929674 does not match the expected last modified time 1653377949. Run 'refresh functions <db name>'.

A rather baffling error. It appears only occasionally, and re-running the statement succeeds.
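
Since the error message itself names the remedy, one pragmatic workaround (hypothetical, not the platform's actual code) is to catch this specific failure, run refresh functions, and retry once; the db parameter is whatever database the UDF belongs to:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UdfRefreshRetry {
    public static ResultSet query(Connection con, String sql, String db) throws SQLException {
        try {
            return con.createStatement().executeQuery(sql);
        } catch (SQLException e) {
            String msg = String.valueOf(e.getMessage());
            if (msg.contains("does not match the expected last modified time")) {
                // Follow the message's own advice: refresh function metadata,
                // then retry the original statement once.
                try (Statement s = con.createStatement()) {
                    s.execute("refresh functions " + db);
                }
                return con.createStatement().executeQuery(sql);
            }
            throw e;
        }
    }
}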

Failed to close HDFS file. Unable to close file because the last blockBP does not have enough number of replicas.

The error:

java.sql.SQLException: Failed to close HDFS file: hdfs:/user/hive/warehouse/lkzc.db/jh_061201_ffhywkt_chuda/_impala_insert_staging/
Error(255): Unknown error 255
Root cause: IOException: Unable to close file because the last blockBP does not have enough number of replicas.

No fix was recorded. This error typically means the NameNode has not yet registered enough replicas for the file's last block; it is usually transient, and retrying (or raising dfs.client.block.write.locateFollowingBlock.retries on the HDFS client) works around it.

Failed due to unreachable impalad(s): hadoopcbd008156.ppdgdsl.com:2200

The SQL was routed to a node of the Impala cluster that happened to be unreachable at that moment (a transient network problem or similar). There is no particularly good fix yet:

  1. Retry manually; resubmitting will most likely (though not guaranteed) land on a different node.
  2. Add automatic retry logic in the application, triggered only for this class of error, and cap the attempts at 1-2. The platform sees tens of thousands of user-submitted SQL statements, so unbounded automatic retries would put real pressure on Impala cluster resources. A sketch follows this list.
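
A sketch of option 2 (the class, method, and retryable-message list are my own invention, not platform code):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class RetryingExecutor {
    // Only messages known to indicate transient node/network trouble are retried.
    private static final List<String> RETRYABLE = Arrays.asList(
            "Failed due to unreachable impalad",
            "TransmitData() to");

    public static ResultSet executeWithRetry(Connection con, String sql) throws SQLException {
        SQLException last = null;
        // One initial attempt plus at most 2 retries, to limit cluster pressure.
        for (int attempt = 0; attempt <= 2; attempt++) {
            try {
                return con.createStatement().executeQuery(sql);
            } catch (SQLException e) {
                last = e;
                String msg = String.valueOf(e.getMessage());
                if (RETRYABLE.stream().noneMatch(msg::contains)) {
                    throw e; // non-transient error: fail fast
                }
            }
        }
        throw last;
    }
}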

org.apache.hive.service.cli.HiveSQLException: Invalid or unknown query handle

SQLException: Cancelled

The full error:

java.sql.SQLException: Cancelled
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidPooledStatement.executeQuery(DruidPooledStatement.java:308)

A statement submitted through hive-jdbc fails with nothing more than Cancelled; there is no further error log. Cause: the cluster node was starved for resources and simply dropped the task.
Note this is distinct from the Cancelled from Impala's debug web interface error below.
Fix: retry, and optimize the SQL. There does not appear to be a better option.

Cancelled in ScannerContext

Cancelled from Impala's debug web interface

The full error:

java.sql.SQLException: Cancelled from Impala's debug web interface
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:356)
	at com.alibaba.druid.pool.DruidPooledResultSet.next(DruidPooledResultSet.java:68)

Here the query was killed explicitly through the Impala daemon's debug web UI, typically by an administrator or an automated kill policy. Fix: retry, and optimize the SQL. There does not appear to be a better option.

Invalid query handle: xxxx.yyyy

The error:

ERROR c.a.druid.pool.DruidPooledStatement - clearResultSet error
org.apache.hive.service.cli.HiveSQLException: Invalid query handle: d84d9133d8a6ce9c:9a77cd100000000
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
	at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:210)
	at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:221)
	at org.apache.hive.jdbc.HiveQueryResultSet.close(HiveQueryResultSet.java:308)
	at com.alibaba.druid.pool.DruidPooledResultSet.close(DruidPooledResultSet.java:86)
	at com.alibaba.druid.pool.DruidPooledStatement.clearResultSet(DruidPooledStatement.java:206)
	at com.alibaba.druid.pool.DruidPooledStatement.close(DruidPooledStatement.java:514)

The code where the error surfaced:

finally {
    if (ps != null) {
        ps.close();
    }
    if (con != null) {
        con.close();
    }
}

It happens at the Statement.close() call.

The task was killed for running too long. Platform resources are finite, so user queries cannot be allowed to run indefinitely. The kill rules weigh three factors together: data volume scanned, execution time, and degree of completion (screenshot of the rules omitted).
To keep the scanned data volume down, strongly prefer listing the specific columns you need over select *, and add a limit clause (for example limit 10000) to cap the number of rows.
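
Since the query handle is already gone on the server side, the close() call itself is what throws. A defensive variant of the snippet above (same ps/con names) swallows close-time failures so cleanup cannot mask the query's real outcome:

finally {
    if (ps != null) {
        try {
            ps.close();
        } catch (SQLException e) {
            // The handle may already be invalid server-side (query was killed);
            // closing is best-effort, so log and move on.
        }
    }
    if (con != null) {
        try {
            con.close();
        } catch (SQLException e) {
            // Same idea: don't let cleanup failures hide the original error.
        }
    }
}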

SQLException: Sender timed out waiting for receiver fragment instance: , dest node: 66

The full error:

java.sql.SQLException: Sender timed out waiting for receiver fragment instance: 394c696029ddcce6:a51b7cab000007cc, dest node: 66
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidPooledStatement.executeQuery(DruidPooledStatement.java:308)

Retries sometimes succeed and sometimes fail. Logging into the Hadoop cluster and inspecting the task's scheduling and execution logs (screenshot omitted) revealed that this SQL performs a full scan over a table with more than 12,000 partitions, reading 20.1 TB of data.

Fix: retry, and optimize the SQL, e.g. add partition predicates so Impala can prune most of those partitions instead of scanning them all. There does not appear to be a better option.

Failed to open HDFS file. No such file or directory. Root cause: RemoteException: File does not exist:

The task fails with:

java.sql.SQLException: Disk I/O error: Failed to open HDFS file hdfs://ppdhdpha/user/hive/warehouse/test.db/chengzhangquanyi_huolizhiguoqi_chuda/2c43254ab60211d3-cf0e47d200000235_298950249_data.0.
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /user/hive/warehouse/test.db/chengzhangquanyi_huolizhiguoqi_chuda/2c43254ab60211d3-cf0e47d200000235_298950249_data.0.
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

Cause: the data file no longer exists on HDFS, typically because the table was rebuilt while Impala was still holding stale cached metadata. Things to try: refresh <table> or invalidate metadata <table>.

Error(s) deleting partition directories. First error (of 37) was: Hdfs op. Input/output error

java.sql.SQLException: Error(s) deleting partition directories. First error (of 37) was: Hdfs op (DELETE hdfs://ppdhdpha/user/hive/warehouse/cszc.db/zmj_sop_m0_base_snp/1d460f3a4d87ea14-a4c4521100000091_870332460_data.0.) failed, error was: hdfs://ppdhdpha/user/hive/warehouse/cszc.db/zmj_sop_m0_base_snp/1d460f3a4d87ea14-a4c4521100000091_870332460_data.0.
Error(5): Input/output error

Fix: retry, and optimize the SQL. There does not appear to be a better option.

java.sql.SQLException: Admission for query exceeded timeout 180000ms in pool root.report. Queued reason: queue is not empty (size 5); queued queries are executed first.

The full error:

ERROR c.alibaba.druid.pool.DruidDataSource - create connection SQLException, url: jdbc:hive2://111.222.333.66:21050/default, errorCode 0, state HY000
java.sql.SQLException: Admission for query exceeded timeout 180000ms in pool root.report. Queued reason: queue is not empty (size 4); queued queries are executed first.
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidAbstractDataSource.validateConnection(DruidAbstractDataSource.java:1434)
	at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1733)
	at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)

SQL submitted to the big-data platform has to pass admission control and may queue; queries queued for more than 3 minutes (180000 ms) are discarded.

GetConnectionTimeoutException & Rejected query from pool root.report: queue full, limit=200, num_queued=200.

The full error:

java.lang.Exception: java.lang.Exception: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:136)
        at com.xy.cloudiview.datasetsubscript.business.service.impl.TableWarnServiceImpl.getSqlCount(TableWarnServiceImpl.java:244)
        at com.xy.cloudiview.datasetsubscript.business.service.impl.TableWarnServiceImpl.executeTableWarnJob(TableWarnServiceImpl.java:66)
        at com.xy.cloudiview.datasetsubscript.business.xxljob.IviewTableWarnJobHandler.execute(IviewTableWarnJobHandler.java:45)
        at com.ppdai.job.core.thread.JobThread.run(JobThread.java:142)
Caused by: java.lang.Exception: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:122)
        ... 4 common frames omitted
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1773)
        at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1427)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1407)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1397)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:100)
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:118)
        ... 4 common frames omitted
Caused by: java.sql.SQLException: Rejected query from pool root.report: queue full, limit=200, num_queued=200.
        at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
        at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
        at com.alibaba.druid.pool.DruidAbstractDataSource.validateConnection(DruidAbstractDataSource.java:1434)
        at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1733)
        at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)

The big-data platform caps SQL admission: at most 150 statements run concurrently and at most 200 may queue, with a queue timeout of 5 minutes. Statements submitted beyond that either get rejected because the queue is full, or time out while waiting for a pooled connection, as in the stack trace above.
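
On the client side, the numbers in the stack trace map directly onto Druid pool settings; a minimal sketch (values mirror the log, the URL is a placeholder):

import com.alibaba.druid.pool.DruidDataSource;

public class ImpalaPoolConfig {
    public static DruidDataSource build() {
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl("jdbc:hive2://impala-host:21050/default");
        ds.setMaxActive(150);   // "maxActive 150" in the stack trace
        ds.setMaxWait(120000);  // "wait millis 120000": client waits 2 minutes for a connection
        // Druid validates new physical connections by running a query; when
        // Impala's admission queue is full, that validation query is itself
        // rejected, producing the "Rejected query from pool root.report" cause.
        ds.setValidationQuery("select 1");
        return ds;
    }
}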

A few more errors were encountered for which no worked solution was recorded:

ImpalaRuntimeException: Error making 'updateTableColumnStatistics' RPC to Hive Metastore: CAUSED BY: MetaException: Could not create "increment"/"table" value-generation container SEQUENCE_TABLE since autoCreate flags do not allow it. (Judging from the message, a Hive Metastore-side configuration issue: its DataNucleus autoCreate flags forbid creating the SEQUENCE_TABLE it needs.)

ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: CAUSED BY: MetaException: One or more instances could not be deleted

Failed to close HDFS file: hdfs://ppdhdpha/user/hive/warehouse/xkyy.db/cdjk_apibg_willa/_impala_insert_staging/.0.parq
Error(255): Unknown error 255
Root cause: RemoteException: The DiskSpace quota of /user/hive/warehouse/xkyy.db is exceeded: quota = 43980465111040 B = 40 TB but diskspace consumed = 43981119305382 B = 40.00 TB
(The same DiskSpace quota problem as earlier, this time for xkyy.db; free space by dropping unused tables.)

File 'hdfs://ppdcdpha/user/hive/warehouse/usrt.db/willa_liubiao_daily' has an invalid Parquet version number: 336. Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running refresh usrt.willa_liubiao_daily
(As the message itself suggests: if the file is valid, refresh the table; if it really is corrupt, the table has to be regenerated.)

Table not found

There are two kinds of message for this:

  1. Table does not exist:
  2. Could not resolve table reference

What exactly distinguishes them? I have not verified it, but Could not resolve table reference appears to be Impala's analysis-time error when a query (e.g. a SELECT) references a missing table, while Table does not exist tends to come from DDL statements such as DROP TABLE. Either way, the table is absent from the catalog.

Decompressor: invalid compressed length. Data is likely corrupt.

The query depends on the table rhino_man_checked_hitrule, whose storage files are corrupt; the table needs to be regenerated.

Expected to read a compressed text file of size 5553487 bytes. But only read 5401212 bytes. This may indicate data file corruption. (file: hdfs://hdpha/user/hive/warehouse/rpt.db/rhino_man_checked_hitrule/000000_0.snappy).

Same cause as above: the query depends on rhino_man_checked_hitrule and its storage file is corrupt; regenerate the table.

References

Sender timed out waiting for receiver fragment instance: , dest node: 66
Invalid query handle (of limited use as a reference, in my opinion)
