Evaluating a Hive + HBase Data Processing Solution, Part 2 (Optimization)


Licensed under CC 4.0 BY-SA

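This article examines a data processing solution that combines Hive with HBase, focusing on query efficiency. Tuning HBase parameters such as `hbase.regionserver.handler.count` and `hbase.client.scanner.caching` improved query performance. However, statistical computation on Hive tables mapped to HBase is still slower than on native Hive tables, and the gap becomes more pronounced as the data volume grows. Two optimization strategies are proposed: copying a subset of columns into a native Hive table, and separating hot and cold data storage. Complex SQL statistical computation still needs further study and testing.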

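Continuing from the previous article, this part tunes the HBase parameters, mainly those that affect query efficiency. The test cases are listed below.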
count

select count(1) from hbase_table;
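
The test queries assume that `hbase_table` is a Hive table mapped onto an HBase table. The definition is not repeated in this part, so the following is only a minimal sketch of what such a mapping might look like. The column family `cf`, the row key column `rowkey`, and the column types are hypothetical; only the column names `a`, `b`, `c`, `d` come from the queries in this article.

```sql
-- Hypothetical mapping of an existing HBase table into Hive.
-- The column family "cf" and the row key column name are assumptions.
CREATE EXTERNAL TABLE hbase_table (
  rowkey STRING,
  a      STRING,
  b      STRING,
  c      STRING,
  d      STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c,cf:d"
)
TBLPROPERTIES (
  "hbase.table.name" = "hbase_table"
);
```

With a mapping like this, every Hive query against `hbase_table` is executed as a scan over the HBase regions, which is why the scan-related HBase parameters matter for the tests here.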

Copying selected columns into a Hive table

insert overwrite table hive_table select a,b,c,d from hbase_table;

Copying all columns into a Hive table

insert into table test_table partition(part='aa') select * from hbase_table;

Copying from one Hive table to another

create table test_table2 like test_table;
insert into table test_table2 partition(part) select * from test_table;
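
The Hive-to-Hive copy above uses a fully dynamic partition (`partition(part)` with no fixed value). On Hive installations where dynamic partitioning runs in strict mode, such an insert is rejected unless the session settings are relaxed first. The following sketch shows the settings that would typically be needed; whether they were set in the original test environment is not stated, so treat this as an assumption.

```sql
-- Assumed session settings; the original test setup does not list them.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
-- SELECT * on a partitioned table already returns it last.
INSERT INTO TABLE test_table2 PARTITION (part)
SELECT * FROM test_table;
```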

Parameters modified for optimization

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
    <description>Count of RPC Listener instances spun up on RegionServers.
    Same property is used by the Master for count of master handlers.
    Default