/**
 * A split policy determines when a Region should be split.
 *
 * @see SteppingSplitPolicy Default split policy since 2.0.0
 * @see IncreasingToUpperBoundRegionSplitPolicy Default split policy since
 *      0.94.0
 * @see ConstantSizeRegionSplitPolicy Default split policy before 0.94.0
 */
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
public abstract class RegionSplitPolicy extends Configured {
......................
}
@Override
protected boolean shouldSplit() {
  boolean force = region.shouldForceSplit();
  boolean foundABigStore = false;
  for (HStore store : region.getStores()) {
    // If any of the stores are unable to split (eg they contain reference files)
    // then don't split
    if ((!store.canSplit())) {
      return false;
    }
    // Mark if any store is big enough.
    // The store size is compared against a fixed threshold, 10 GB by default;
    // any store above it triggers a split. HBase also perturbs this threshold
    // by a jitter configured through hbase.hregion.max.filesize.jitter. That
    // detail is set aside here: simply treat the threshold as a fixed 10 GB
    // at which the region is split.
    if (store.getSize() > desiredMaxFileSize) {
      foundABigStore = true;
    }
  }
  return foundABigStore || force;
}
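The jitter mentioned in the comment above can be sketched as follows. This is a minimal illustration, assuming the behaviour of recent HBase releases as I understand it: when the policy is configured for a region, a random rate in [-jitter/2, +jitter/2) is drawn once and desiredMaxFileSize is shifted by that fraction, so that regions of similar size do not all reach the threshold at the same moment. The class name JitterSketch, the method jitteredMaxFileSize and the 0.25 jitter value passed in main are illustrative assumptions, not HBase API.

```java
import java.util.Random;

// Hypothetical standalone sketch; not part of HBase.
final class JitterSketch {

    // Returns maxFileSize shifted by a random fraction in [-jitter/2, +jitter/2).
    static long jitteredMaxFileSize(long maxFileSize, double jitter, Random random) {
        // nextFloat() is in [0, 1), so jitterRate is in [-jitter/2, +jitter/2).
        double jitterRate = (random.nextFloat() - 0.5D) * jitter;
        long jitterValue = (long) (maxFileSize * jitterRate);
        // Guard against overflow when maxFileSize is close to Long.MAX_VALUE.
        if (jitterRate > 0 && jitterValue > Long.MAX_VALUE - maxFileSize) {
            return Long.MAX_VALUE;
        }
        return maxFileSize + jitterValue;
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        // 0.25 is assumed here as the jitter setting; with it the result
        // falls between 8.75 GB and 11.25 GB.
        long threshold = jitteredMaxFileSize(tenGb, 0.25D, new Random());
        System.out.println("effective split threshold (bytes): " + threshold);
    }
}
```

With a jitter of 0 the threshold is exactly the configured size, which matches the simplified "fixed 10 GB" reading used in this walkthrough.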
@Override
protected boolean shouldSplit() {
  boolean force = region.shouldForceSplit();
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  int tableRegionsCount = getCountOfCommonTableRegions();
  // This threshold is the key input to the split decision
  long sizeToCheck = getSizeToCheck(tableRegionsCount);
  for (HStore store : region.getStores()) {
    // If any of the stores is unable to split (eg they contain reference files)
    // then don't split
    if (!store.canSplit()) {
      return false;
    }
    // Mark if any store is big enough
    long size = store.getSize();
    if (size > sizeToCheck) {
      LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
        " size=" + StringUtils.humanSize(size) +
        ", sizeToCheck=" + StringUtils.humanSize(sizeToCheck) +
        ", regionsWithCommonTable=" + tableRegionsCount);
      foundABigStore = true;
    }
  }
  return foundABigStore || force;
}
The key decision logic is in the function getSizeToCheck, shown below:
/**
 * @return Region max size or {@code count of regions cubed * 2 * flushsize},
 * which ever is smaller; guard against there being zero regions on this server.
 */
protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  return tableRegionsCount == 0 || tableRegionsCount > 100
    ? getDesiredMaxFileSize()
    : Math.min(getDesiredMaxFileSize(),
      initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}
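This is a ternary expression. If the number of online regions of this table on the server is 0 or greater than 100, the threshold is the value returned by getDesiredMaxFileSize().
Otherwise the threshold is the smaller of getDesiredMaxFileSize() and initialSize * tableRegionsCount^3, where, as the Javadoc says, initialSize is 2 * flush size.
getDesiredMaxFileSize() can be thought of as returning the size configured by hbase.hregion.max.filesize, which defaults to 10 GB.
To make the split progression easier to follow, here is an example.
Suppose hbase.hregion.max.filesize is left at its default of 10 GB and the flush size is 128 MB. Before the threshold reaches 10 GB, the table goes through the following four splits:
1st split: 1^3 * 128 MB * 2 = 256 MB
2nd split: 2^3 * 128 MB * 2 = 2048 MB
3rd split: 3^3 * 128 MB * 2 = 6912 MB
4th split: 4^3 * 128 MB * 2 = 16384 MB > 10 GB, so the smaller value, 10 GB, is used
Every split after that uses a threshold of 10 GB.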
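The progression above can be reproduced with a small standalone program. This is a minimal sketch that copies the arithmetic of getSizeToCheck into a static method; the class name IncreasingThresholdDemo is hypothetical, and the 128 MB flush size and 10 GB maximum are the assumed values from the example rather than values read from a real configuration.

```java
// Hypothetical standalone sketch; not part of HBase.
final class IncreasingThresholdDemo {
    static final long MB = 1024L * 1024L;

    // Same arithmetic as getSizeToCheck, with the two settings passed in explicitly.
    static long sizeToCheck(int tableRegionsCount, long initialSize, long maxFileSize) {
        // Fall back to the max size for 0 regions or more than 100 regions.
        return tableRegionsCount == 0 || tableRegionsCount > 100
            ? maxFileSize
            : Math.min(maxFileSize,
                initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
    }

    public static void main(String[] args) {
        long initialSize = 2 * 128 * MB;   // assumed: 2 * flush size of 128 MB
        long maxFileSize = 10 * 1024 * MB; // assumed: 10 GB
        for (int count = 1; count <= 5; count++) {
            long threshold = sizeToCheck(count, initialSize, maxFileSize);
            System.out.println("regions=" + count + " threshold=" + (threshold / MB) + " MB");
        }
        // Prints thresholds of 256, 2048, 6912, 10240 and 10240 MB.
    }
}
```

Note that the count passed in is the number of regions of the same table on the current server, so the "nth split" reading in the example holds only while all of the table's regions stay on one RegionServer.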
4.SteppingSplitPolicy
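This is the default split policy from version 2.0 onward. Its split threshold changes once more,
and it is somewhat simpler than IncreasingToUpperBoundRegionSplitPolicy.
The threshold still depends on how many regions of the table being split are on the current RegionServer.
If there is exactly one such region, the first split happens at 256 MB;
after that, the configured maximum file size (hbase.hregion.max.filesize, 10 GB by default) is the split threshold.
IncreasingToUpperBoundRegionSplitPolicy handles large tables well,
but it produces too many regions for small tables. SteppingSplitPolicy keeps the region count of small tables within a reasonable range without affecting how large tables are split.
SteppingSplitPolicy is a subclass of IncreasingToUpperBoundRegionSplitPolicy,
and its entire source is only a few lines, as follows: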
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.hbase.regionserver;
import org.apache.yetus.audience.InterfaceAudience;
@InterfaceAudience.Private
public class SteppingSplitPolicy extends IncreasingToUpperBoundRegionSplitPolicy {
  /**
   * @return flushSize * 2 if there's exactly one region of the table in question
   * found on this regionserver. Otherwise max file size.
   * This allows a table to spread quickly across servers, while avoiding creating
   * too many regions.
   */
  @Override
  protected long getSizeToCheck(final int tableRegionsCount) {
    return tableRegionsCount == 1 ? this.initialSize : getDesiredMaxFileSize();
  }
}
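To see what the override changes in practice, the two getSizeToCheck variants can be compared side by side. The following is a minimal sketch, assuming a 128 MB flush size (so initialSize is 256 MB) and the 10 GB default maximum; the class name SteppingThresholdDemo and its static method names are hypothetical and exist only for this comparison.

```java
// Hypothetical standalone sketch; not part of HBase.
final class SteppingThresholdDemo {
    static final long MB = 1024L * 1024L;

    // Arithmetic of IncreasingToUpperBoundRegionSplitPolicy.getSizeToCheck.
    static long increasingSizeToCheck(int count, long initialSize, long maxFileSize) {
        return count == 0 || count > 100
            ? maxFileSize
            : Math.min(maxFileSize, initialSize * count * count * count);
    }

    // Arithmetic of SteppingSplitPolicy.getSizeToCheck.
    static long steppingSizeToCheck(int count, long initialSize, long maxFileSize) {
        return count == 1 ? initialSize : maxFileSize;
    }

    public static void main(String[] args) {
        long initialSize = 2 * 128 * MB;   // assumed: 2 * flush size of 128 MB
        long maxFileSize = 10 * 1024 * MB; // assumed: 10 GB
        for (int count = 1; count <= 4; count++) {
            System.out.println("regions=" + count
                + " increasing=" + (increasingSizeToCheck(count, initialSize, maxFileSize) / MB) + " MB"
                + " stepping=" + (steppingSizeToCheck(count, initialSize, maxFileSize) / MB) + " MB");
        }
    }
}
```

Under these assumptions both policies start at 256 MB for a single region, but the stepping variant jumps straight to 10240 MB from the second region on, whereas the increasing variant passes through 2048 MB and 6912 MB first. Those intermediate thresholds are where a small table picks up its extra regions.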