详单表采用分区表(外部表), 在查询时,引擎用的impala,偶尔会出现卡住的情况,影响用户体验。 正常情况3-5秒出数据,卡住就不好说了,有时120秒,有时150秒,,,,,,,,,,,
有3种方案:
1.当天的数据就让其实时入库, 昨天以及之前的数据,分表存储(迁移到另外的历史表中),实现冷热分离。查询较多的,主要是当天, 意思就是,当天的数据不管读写分不分离,最多17个小时(客户17点下班) ,元数据小,都不会出现卡住的情况。真的是这样? 查询的结果请limit 4000
2.入库用临时表(另一同事的方案),入库完成后,把数据迁移到正式表,定时迁移(弊端:实时数据,不一定能及时查询到), 并且正式表中数据很多, 元数据大,能不能解决问题? 要测试哦
3.入库用临时表,入库完成后(1小时为间隔),把数据迁移到正式表,正式表按周存储(保障每个表只存7天)。“遇7就分”的原则设计表。
下面是测试表的拆分
1.建立历史表 test2
CREATE TABLE xdr.ims_call_cdr_test2 (
callid STRING,
starttime BIGINT,
caller STRING,
called STRING,
calltype STRING,
srcip STRING,
dstip STRING,
ringtime STRING,
answertime STRING,
delay_ring STRING,
delay_answer STRING,
duration_talk STRING,
callerflag STRING,
result STRING,
userconnected STRING,
netconnected STRING,
callee STRING,
caller_area STRING,
called_area STRING,
direction STRING,
reldirection STRING,
charging STRING,
medtype STRING,
medcodingtype STRING,
onhooktime STRING,
duration STRING,
callerroam STRING,
calledroam STRING,
callerprefix STRING,
calledprefix STRING
)
PARTITIONED BY (
day INT,
minute CHAR(4)
);
2. 查看正式表 test1
[brd@slave61 cluster_keytab]$ hadoop fs -du -h /user/hive/warehouse/xdr.db/ims_call_cdr_test1
471.2 M 942.3 M /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171122
468.5 M 937.0 M /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171123
455.3 M 910.6 M /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171124
259.0 M 518.1 M /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171126
3. 迁移(mv)数据
hadoop fs -mv /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171123 /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171123
hadoop fs -mv /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171124 /user/hive/warehouse/xdr.db/ims_call_cdr_test2/day=20171124
hadoop fs -mv /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171126 /user/hive/warehouse/xdr.db/ims_call_cdr_test2/day=20171126
4.查看历史表 test2 ,是否有数据了?
[brd@slave61 cluster_keytab]$ hadoop fs -du -h /user/hive/warehouse/xdr.db/ims_call_cdr_test2
468.5 M 937.0 M /user/hive/warehouse/xdr.db/ims_call_cdr_test2/day=20171123
455.3 M 910.6 M /user/hive/warehouse/xdr.db/ims_call_cdr_test2/day=20171124
259.0 M 518.1 M /user/hive/warehouse/xdr.db/ims_call_cdr_test2/day=20171126
5.查看正式表的数据test1,是否被mv了 ?
[brd@slave61 cluster_keytab]$ hadoop fs -du -h /user/hive/warehouse/xdr.db/ims_call_cdr_test1
471.2 M 942.3 M /user/hive/warehouse/xdr.db/ims_call_cdr_test1/day=20171122
6.把历史表test2 增加分区,就可以查询了,无需刷新
alter table xdr.ims_call_cdr_test2 add IF NOT EXISTS partition(day=20171123,minute=cast('1200' as char(4)));
alter table xdr.ims_call_cdr_test2 add IF NOT EXISTS partition(day=20171124,minute=cast('1200' as char(4)));
alter table xdr.ims_call_cdr_test2 add IF NOT EXISTS partition(day=20171126,minute=cast('1200' as char(4)));
7. 有数据了,
select count(*) from xdr.ims_call_cdr_test2 where day=20171124 and minute='1200';