前言:
select发现count没有值,但是实际上表中有数据的异常情况,是由于二级索引表 与 主表不一致导致的。
主表是可变的,日常也是通过phoenix进行upsert的数据,其他几个类似的二级索引表都是正常的,只有这张二级索引表的早期数据缺失,目前原因不明(曾经历过: 停电、磁盘损坏文件丢失与转移)。
-- 走二级索引表查询,无数据
SELECT SERVER_DATE,COUNT(1) FROM T_ODS_XXX WHERE SERVER_DATE<'2020-01-10' GROUP BY SERVER_DATE;
+--------------+-----------+
| SERVER_DATE | COUNT(1) |
+--------------+-----------+
+--------------+-----------+
No rows selected (0.016 seconds)
-- 查询主表,有数据
SELECT /*+NO_INDEX*/ SERVER_DATE,COUNT(1) FROM T_ODS_XXX WHERE SERVER_DATE<'2020-01-10' GROUP BY SERVER_DATE;
+--------------+-----------+
| SERVER_DATE | COUNT(1) |
+--------------+-----------+
| 2020-01-01 | 99 |
| 2020-01-02 | 148 |
| 2020-01-03 | 956 |
| 2020-01-04 | 9313 |
| 2020-01-05 | 8667 |
| 2020-01-06 | 7117 |
| 2020-01-07 | 8634 |
| 2020-01-08 | 10411 |
| 2020-01-09 | 11478 |
+--------------+-----------+
9 rows selected (1.462 seconds)
-- app设置的二级索引表 IDX_T_ODS_XXX_SERVER_DATE_SHOW_ID,最早日期是 2020-03-27 开始,并且与主表存在不一致情况,从2020-03-28开始数据正常。
SELECT SERVER_DATE,COUNT(1) FROM T_ODS_XXX WHERE SERVER_DATE<'2020-04-01' GROUP BY SERVER_DATE;
+--------------+-----------+
| SERVER_DATE | COUNT(1) |
+--------------+-----------+
| 2020-03-27 | 1898 |
| 2020-03-28 | 19099 |
| 2020-03-29 | 16459 |
| 2020-03-30 | 16745 |
| 2020-03-31 | 19645 |
+--------------+-----------+
5 rows selected (0.241 seconds)
SELECT /*+NO_INDEX*/ SERVER_DATE,COUNT(1) FROM T_ODS_XXX WHERE SERVER_DATE>'2020-03-26' AND SERVER_DATE<'2020-04-01' GROUP BY SERVER_DATE;
+--------------+-----------+
| SERVER_DATE | COUNT(1) |
+--------------+-----------+
| 2020-03-27 | 20699 |
| 2020-03-28 | 19099 |
| 2020-03-29 | 16459 |
| 2020-03-30 | 16745 |
| 2020-03-31 | 19645 |
+--------------+-----------+
最后总结:
如何补充二级索引表缺失的数据?
解决方法:写一个程序,走主表自身row key,按照缺失日期分批获取数据,再次upsert进入主表,此时二级索引表就会刷新。