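Hive uses Tez as its default execution engine. After records were inserted into a table, `count` returned 0. When the same count was run with MR as the execution engine, the result matched the actual number of rows.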
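Counting on Tez is very fast because it skips the MapReduce step entirely, but the result is wrong. Note that the query below prints no "Launching Job" output, so no job ran at all. Presumably some mechanism lets count() read the table statistics directly instead of scanning the data, and because those statistics were out of date, the count was wrong.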
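One way to test this hypothesis is a minimal HiveQL sketch. It assumes the shortcut is Hive's stats-based query optimization, controlled by `hive.compute.query.using.stats`. The original does not name this setting, so treat it as an assumption:

```sql
-- Print the current value of the stats-based answering shortcut
SET hive.compute.query.using.stats;

-- Assumption: disabling it for this session forces count(1) to scan the data
SET hive.compute.query.using.stats=false;
SELECT count(1) FROM ods.table1;
```

If the count is correct with the setting disabled, that supports the explanation that the 0 came from stale statistics rather than from the data.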
hive> SELECT count(1) from ods.table1;
OK
0
Time taken: 2.283 seconds, Fetched: 1 row(s)
Running the analyze command to refresh the table statistics, then counting again, gives the correct result:
hive> analyze table ods.table1 compute statistics;
Query ID = hadoop_20190621103221_1a46b8cb-75cd-486d-9b71-4e7d9c2a13f6
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1557916445942_13506)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 5.29 s
----------------------------------------------------------------------------------------------
OK
Time taken: 7.0 seconds
hive> SELECT count(1) from ods.table1;
OK
1221575
Time taken: 2.283 seconds, Fetched: 1 row(s)
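To see which statistics a stats-based count would use, you can inspect the stored metadata. This is a sketch that assumes the stale value came from a load path that skipped automatic statistics collection. That cause is a guess and is not confirmed above:

```sql
-- The "Table Parameters" section lists numRows, the value a
-- stats-based count(1) would return without scanning the data.
DESCRIBE FORMATTED ods.table1;

-- Check whether Hive is gathering basic statistics automatically on INSERT
SET hive.stats.autogather;
```

Comparing `numRows` before and after `ANALYZE TABLE ... COMPUTE STATISTICS` shows whether the refresh is what changed the count result.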