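1. HBase cluster, local test (the related operations were done in Java):
Four virtual machines run 2 HMasters and 3 HRegionServers. This gives high availability and makes full use of the distributed capacity of the four VMs.
Dependencies:
(1) ZooKeeper, which coordinates and maintains the HBase cluster
(2) HDFS as the storage engine, with one NameNode and four DataNodes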
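2. Split-key design (planned for 5 years of data):
Each raw record takes about 60 KB of disk (57 KB in Shenzhen), and one record is produced every 5 seconds.
One year of data: 60 KB * 12 * 60 * 24 * 360 (12 records per minute, 60 minutes, 24 hours, 360 days) = 373,248,000 KB, roughly 360 GB.
Each record is stored with three replicas, so the estimated storage for five years is 3 * 5 * 360 GB = 5400 GB.
Adding about 30% headroom (5400 GB * 1.3 ≈ 7020 GB) and rounding up gives 7200 GB in total.
Each HBase region holds 30 GB of data, so the table is pre-split into 7200 / 30 = 240 regions. (Strictly, HDFS replicas do not count toward a region's size, so 240 regions leaves ample room.)
The split keys are 000|, 001|, 002|, … 237|, 238| (239 keys, giving 240 regions).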
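The sizing arithmetic and the split keys can be checked with a small sketch. It assumes 60 KB per record, one record every 5 seconds, a 360-day year, 3 replicas and 30 GB per region, as in the text; the class and method names are hypothetical. In HBase 2.x the resulting byte[][] should be usable as the split-key argument of Admin.createTable(TableDescriptor, byte[][]).

```java
import java.nio.charset.StandardCharsets;

// Sketch: reproduces the five-year sizing and builds the matching split keys.
// Assumptions: 60 KB per record, one record every 5 s, 360-day year,
// 3 HDFS replicas, 30 GB per region. RegionPlan and its methods are hypothetical.
public class RegionPlan {
    static final long RECORD_KB = 60;
    static final long RECORDS_PER_DAY = 12L * 60 * 24; // 12 per minute = one every 5 s
    static final long DAYS_PER_YEAR = 360;

    // Raw data per year in KB: 60 * 12 * 60 * 24 * 360
    static long yearlyKb() {
        return RECORD_KB * RECORDS_PER_DAY * DAYS_PER_YEAR;
    }

    // 3 replicas * 5 years * yearly size (the text rounds the yearly size to 360 GB)
    static long fiveYearGb(long yearlyGb) {
        return 3 * 5 * yearlyGb;
    }

    static long regionCount(long budgetGb, long regionGb) {
        return budgetGb / regionGb;
    }

    // regions - 1 split keys "000|", "001|", ...; '|' (0x7C) sorts after the digits
    // and after '_' (0x5F), so rows with the 3-digit prefix i fall into region i
    static byte[][] splitKeys(int regions) {
        byte[][] keys = new byte[regions - 1][];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = String.format("%03d|", i).getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }

    public static void main(String[] args) {
        long kb = yearlyKb();
        System.out.printf("per year: %d KB = %.1f GB%n", kb, kb / 1024.0 / 1024.0);
        long fiveYears = fiveYearGb(360);
        System.out.println("5 years x 3 replicas: " + fiveYears + " GB");
        System.out.println("+30% headroom: " + fiveYears * 13 / 10 + " GB (rounded up to 7200)");
        int regions = (int) regionCount(7200, 30);
        byte[][] keys = splitKeys(regions);
        System.out.println(regions + " regions, " + keys.length + " split keys, last = "
                + new String(keys[keys.length - 1], StandardCharsets.US_ASCII));
    }
}
```

The '|' terminator is what makes the boundaries line up with the rowkey prefixes: "004|" < "005_..." < "005|", so every row with prefix 005 sits in the fifth-from-zero region.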
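3. Region-number design (rowkey design)
The rowkey prefix is the hash of the date (yyyyMMdd) modulo the number of regions. This keeps one day's data in a single region, which suits batch queries. The prefix is joined with an underscore "_" and followed by the full timestamp (yyyyMMddHHmmss):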
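String time = "20200102121010";                  // full timestamp: yyyyMMddHHmmss
String yearMonthDay = time.substring(0, 8);      // date part: yyyyMMdd
// hash of the date modulo the region count; every row of one day gets the same prefix
int region = Math.abs(yearMonthDay.hashCode()) % 240;
// zero-pad to three digits so the prefix sorts against the split keys 000| to 238|
String key = String.format("%03d", region) + "_" + time;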
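To see which pre-split region a rowkey lands in, the split keys can be searched the way HBase orders rows (byte order, which matches String order for these ASCII keys). This is a sketch; RowKeys, rowKey and regionOf are hypothetical names, while the 240 regions and the "NNN|" split keys follow the text.

```java
import java.util.Arrays;

// Sketch: builds rowkeys "<3-digit prefix>_<yyyyMMddHHmmss>" and finds the region
// each key falls into. RowKeys, rowKey and regionOf are hypothetical names.
public class RowKeys {
    static final int REGIONS = 240;
    static final String[] SPLIT_KEYS = new String[REGIONS - 1];
    static {
        for (int i = 0; i < SPLIT_KEYS.length; i++) {
            SPLIT_KEYS[i] = String.format("%03d|", i); // "000|" ... "238|"
        }
    }

    // Prefix = |hash(yyyyMMdd)| mod 240, zero-padded so it sorts like the split keys
    static String prefix(String yyyyMMdd) {
        return String.format("%03d", Math.abs(yyyyMMdd.hashCode()) % REGIONS);
    }

    static String rowKey(String yyyyMMddHHmmss) {
        return prefix(yyyyMMddHHmmss.substring(0, 8)) + "_" + yyyyMMddHHmmss;
    }

    // Region j covers [SPLIT_KEYS[j-1], SPLIT_KEYS[j]); region 0 has no start key.
    // String.compareTo matches HBase's byte order here because the keys are ASCII.
    static int regionOf(String rowKey) {
        int pos = Arrays.binarySearch(SPLIT_KEYS, rowKey);
        return pos >= 0 ? pos + 1 : -pos - 1; // exact hit starts the next region
    }

    public static void main(String[] args) {
        String key = rowKey("20200102121010");
        System.out.println(key + " -> region " + regionOf(key));
    }
}
```

The padding matters: an unpadded key such as "5_20200102121010" sorts after "238|", so it would land in the last region (239) instead of region 5.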
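4. Inserted data is stored under a single column family. The previous day's data can be loaded every day at 00:30; in testing, batch inserts completed within seconds.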
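A rough sizing sketch for the nightly load: it works out which day a 00:30 run should load and how many 60 KB Puts fit into the 5242880-byte client write buffer (hbase.client.write.buffer in the configuration). It assumes each Put is roughly the size of its record (60 * 1024 bytes); real Puts carry some overhead, so the per-flush count is approximate. NightlyLoad and its methods are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: rough sizing for the nightly 00:30 job that loads the previous day.
// Assumptions: one 60 KB record every 5 s, a 5242880-byte write buffer, and
// Puts roughly the size of their records. NightlyLoad is a hypothetical name.
public class NightlyLoad {
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // The day (yyyyMMdd) whose data a run at runTime should load
    static String previousDay(LocalDateTime runTime) {
        return runTime.toLocalDate().minusDays(1).format(DAY);
    }

    static long recordsPerDay() {
        return 24L * 60 * 60 / 5; // 17280
    }

    // How many records fit in one write buffer before the client flushes
    static long putsPerFlush(long writeBufferBytes, long recordBytes) {
        return writeBufferBytes / recordBytes;
    }

    public static void main(String[] args) {
        LocalDateTime run = LocalDateTime.of(2020, 1, 2, 0, 30);
        long perFlush = putsPerFlush(5_242_880L, 60L * 1024);
        long flushes = (recordsPerDay() + perFlush - 1) / perFlush; // ceiling division
        System.out.println("load day " + previousDay(run) + ": " + recordsPerDay()
                + " records, ~" + perFlush + " per flush, ~" + flushes + " flushes");
    }
}
```

With these assumptions a day's 17280 records go out in about two hundred buffer flushes, which is consistent with the observed batch inserts finishing in seconds.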
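5. When reading, one scan can only fetch one day's data (the prefix is derived from the date), so fetching several days needs a utility class that issues one scan per day. Fetching 10,000 records took about 15 seconds in the local test; it should be much faster on the servers.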
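The multi-day utility can start by splitting a date range into one [startRow, stopRow) pair per day. This sketch only computes the ranges; in HBase 2.x each pair could then become new Scan().withStartRow(Bytes.toBytes(start)).withStopRow(Bytes.toBytes(stop)). DayRanges and its methods are hypothetical, and the stop row "240000" is an assumed convention that sorts after every real time of day.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch: one [startRow, stopRow) range per day, since each day has its own
// hash prefix. DayRanges is hypothetical; the key format follows the text.
public class DayRanges {
    static final int REGIONS = 240;
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String prefix(String yyyyMMdd) {
        return String.format("%03d", Math.abs(yyyyMMdd.hashCode()) % REGIONS);
    }

    // Days from first to last inclusive; stop rows are exclusive, and
    // "240000" sorts after every real time of day (at most 235959)
    static List<String[]> ranges(LocalDate first, LocalDate last) {
        List<String[]> out = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            String day = d.format(DAY);
            String base = prefix(day) + "_" + day;
            out.add(new String[] {base + "000000", base + "240000"});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] r : ranges(LocalDate.of(2020, 1, 1), LocalDate.of(2020, 1, 3))) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Bounding each scan by the date as well as the prefix also keeps out other days that happen to hash to the same prefix.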
Temporary hbase-site.xml configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdp-01:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hdp-01,hdp-02,hdp-03</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/module/data/hbase/tmp</value>
</property>
<property>
<name>hbase.master</name>
<value>hdp-01:60000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/module/data/zookeeper/zkdata</value>
</property>
<property>
<!--htable.setWriteBufferSize(5242880);//5M -->
<name>hbase.client.write.buffer</name>
<value>5242880</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>300</value>
<description>
Count of RPC Listener instances spun up on RegionServers.Same property is used by the Master for count of master handlers.
</description>
</property>
<property>
<name>hbase.table.sanity.checks</name>
<value>false</value>
</property>
<property>
<!--ZooKeeper session timeout: after 30 s without a heartbeat, the master considers the regionserver dead -->
<name>zookeeper.session.timeout</name>
<value>30000</value>
</property>
<property>
<!--every region max file size set to 30G -->
<name>hbase.hregion.max.filesize</name>
<value>32212254720</value>
</property>
<property>
<name>hbase.hregion.majorcompaction</name>
<value>0</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hbase.regionserver.region.split.policy</name>
<value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
<property>
<name>hbase.regionserver.optionalcacheflushinterval</name>
<value>7200000</value>
<description>
Maximum amount of time an edit lives in memory before being automatically
flushed.
Default 1 hour. Set it to 0 to disable automatic flushing.</description>
</property>
<property>
<name>hfile.block.cache.size</name>
<value>0.3</value>
<description>Percentage of maximum heap (-Xmx setting) to allocate to
block cache
used by HFile/StoreFile. Default of 0.4 means allocate 40%.
Set to 0 to disable but it's not recommended; you need at least
enough cache to hold the storefile indices.</description>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>52428800</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.5</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size.lower.limit</name>
<value>0.5</value>
</property>
<property>
<name>dfs.client.socket-timeout</name>
<value>600000</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>10</value>
</property>
<property>
<name>hbase.regionserver.hlog.splitlog.writer.threads</name>
<value>10</value>
</property>
<property>
<name>hbase.hstore.compaction.min</name>
<value>8</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.small</name>
<value>5</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.large</name>
<value>8</value>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>900000</value>
</property>