1. 问题现象
1.1 spark thrift server查询hive分区表异常
0: jdbc:hive2://10.0.194.131:10017> select * from xy_app.xy_log_collect_user_funnels limit 10;
Error: java.lang.RuntimeException: serious problem (state=,code=0)
0: jdbc:hive2://10.0.194.131:10017> [hdfs@VM_194_131_centos /usr/local/xyhadoop]$
1.2 presto查询同个hive分区表也异常
presto> select * from xy_app.xy_log_collect_user_funnels limit 10;
Query 20200519_080407_03244_tmamh, FAILED, 1 node
Splits: 9 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]Query 20200519_080407_03244_tmamh failed: Partition location does not exist: hdfs://xydwns/user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007/project_id=pay_h5
presto查询的时候提示上当不存在
1.3 hive执行查询,跟pesto提示一致
hive> alter table xy_app.xy_log_collect_user_funnels drop if exists PARTITION(xy_date='20191007');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Failed to delete parent: File does not exist: /user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2414)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4653)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1101)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:566)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:895)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
看来spark thrift server做得不够好,错了,也不告诉你错在哪里
1.4 执行hdfs dfs -ls命令查看确实不存在
hdfs dfs -ls /user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007
ls: `/user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007': No such file or directory
1.5 查看分区,xy_date=20191007却又存在
0: jdbc:hive2://10.0.194.131:10017> show partitions xy_app.xy_log_collect_user_funnels;
+-------------------------------------------------+--+
| partition |
+-------------------------------------------------+--+
| xy_date=20190304/project_id=invest |……
| xy_date=20191006/project_id=cardloan |
| xy_date=20191006/project_id=cardloan_h5 |
| xy_date=20191006/project_id=pay_h5 |
| xy_date=20191006/project_id=test |
| xy_date=20191006/project_id=wallet |
| xy_date=20191006/project_id=wallet_h5 |
| xy_date=20191007/project_id=cardloan |
| xy_date=20191007/project_id=cardloan_h5 |
| xy_date=20191007/project_id=pay_h5 |
| xy_date=20191007/project_id=test |
| xy_date=20191007/project_id=wallet |
| xy_date=20191007/project_id=wallet_h5 |
+-------------------------------------------------+--+
2. 原因分析
hive分区表元数据和hdfs文件目录不一致,导致的读异常。
3. 解决方案
3.1 尝试一:修复分区试试,还是异常
0: jdbc:hive2://10.0.194.131:10017> msck repair table xy_app.xy_log_collect_user_funnels;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (17.727 seconds)
0: jdbc:hive2://10.0.194.131:10017> select * from xy_app.xy_log_collect_user_funnels limit 10;
Error: java.lang.RuntimeException: serious problem (state=,code=0)
3.2 尝试二:删除分区,然后再执行修复分区命令,结果发现删除分区失败
0: jdbc:hive2://10.0.194.131:10017> alter table xy_app.xy_log_collect_user_funnels drop PARTITION(xy_date='20191007');
Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File does not exist: /user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2414)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4653)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1101)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:566)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:895)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
; (state=,code=0)
3.3 尝试三: 既然无法删除多余分区,那就创建多余分区对应的目录,再删除多余分区
创建多余分区对应的hdfs目录
hdfs dfs -mkdir /user/hive/warehouse/xy_app.db/xy_log_collect_user_funnels/xy_date=20191007
再删除多余分区
hive> alter table xy_app.xy_log_collect_user_funnels drop if exists PARTITION(xy_date='20191007');
OK
Time taken: 0.085 seconds
最后执行查询,成功!
hive> select visit_id from xy_app.xy_log_collect_user_funnels limit 10;
OK
833ED40B-E899-466B-A459-4834E2F315FB
20F5BB1F-5107-4557-A83D-0C5B8FA5AED8
5637D35D-3855-46F7-A4CB-E5AFAEC206F9
AFD17530-0C1F-4C90-AC01-A8CBB2910854
FD9E2680-3800-4470-B741-69755173A908
E7188AEC-8119-4C71-A7B4-AA7FD5CB95C4
55224461-14C5-40C4-92D5-6DE0CAE32F6E
F3158B3D-BDBB-410C-9E2C-CEAEAE18AD23
32CEA784-8B5D-4CBA-8C5D-97831D7DB7DF
CB98781B-A9A1-4110-AE5B-B617B6B1CB0E
Time taken: 2.047 seconds, Fetched: 10 row(s)
一般情况,尝试一就可以解决问题了,不行就尝试二,基本都能搞定问题。尝试三是不得已而为之,也不知道谁对这个表怎样做了不该做操作。