HBase split code analysis
Related classes:
- SplitRequest : the class that actually carries out the split
- CompactSplitThread : controls the compaction/split thread pools
- MemStoreFlusher : the memstore flush implementation
- RSRpcServices : the regionserver RPC implementation class
- TableLock : an interface nested in TableLockManager; the ZooKeeper-based implementation is ZKTableLockManager
- IncreasingToUpperBoundRegionSplitPolicy : the default split policy
What triggers a split:
- HBaseAdmin : HBaseAdmin.split (see the client-side sketch after this list)
- compaction : CompactSplitThread.CompactionRunner
- memstore flush : FlushHandler (in MemStoreFlusher.java)
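Before diving into the server side, here is a minimal client-side sketch of the first trigger: a manually requested split through the admin API (HBaseAdmin implements the Admin interface used below). It assumes an HBase 1.x style client; the table name and split row are made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Ask for a split of "my_table" at an explicit row key; omitting the split
      // point lets the split policy on the regionserver pick the midKey itself.
      admin.split(TableName.valueOf("my_table"), Bytes.toBytes("row-5000"));
    }
  }
}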
Split triggered by compaction:
In CompactionRunner.run():
public void run() {
  // ... some precondition checks (server stopped, compaction enabled, etc.)
  if (this.compaction == null) {
    ..... // system compaction request: select the files to compact now
  }
  // Finally we can compact something.
  assert this.compaction != null;
  ...
  try {
    ...
    boolean completed = region.compact(compaction, store);
    ...
    if (completed) {
      // degenerate case: blocked regions require recursive enqueues
      if (store.getCompactPriority() <= 0) {
        requestSystemCompaction(region, store, "Recursive enqueue");
      } else {
        // see if the compaction has caused us to exceed max region size
        // ***** i.e. request a split if the region has grown past the maximum size *****
        requestSplit(region);
      }
    }
  } catch (IOException ex) {
    ...
    server.checkFileSystem();
  } catch (Exception ex) {
    ...
  } finally {
    LOG...
  }
  this.compaction.getRequest().afterExecute(); // an empty method
}
What does the check store.getCompactPriority() <= 0 above mean?
Let's look at getCompactPriority() in HStore.java:
@Override
public int getCompactPriority() {
  // delegate to the StoreFileManager to get the compaction priority
  int priority = this.storeEngine.getStoreFileManager().getStoreCompactionPriority();
  if (priority == PRIORITY_USER) {
    LOG.warn("Compaction priority is USER despite there being no user compaction");
  }
  return priority;
}
So it delegates to the StoreFileManager; let's keep going. In the default implementation, DefaultStoreFileManager, the code looks like this:
@Override
public int getStoreCompactionPriority() {
  // Background (isTooManyStoreFiles): when a MemStore flush is about to run, it checks whether
  // any HStore of the HRegion already has too many files. If so, the flush is delayed and
  // compaction is given priority, otherwise the file count would keep growing. Conversely,
  // the further the current file count is below blockingFileCount, the sooner the MemStore
  // flush can go ahead and compaction can wait until after the flush, which makes the best
  // use of resources.
  // BLOCKING_STOREFILES_KEY = "hbase.hstore.blockingStoreFiles"
  // HStore.DEFAULT_BLOCKING_STOREFILE_COUNT = 7 (why 7? it is simply the default)
  int blockingFileCount = conf.getInt(
      HStore.BLOCKING_STOREFILES_KEY, HStore.DEFAULT_BLOCKING_STOREFILE_COUNT);
  // the priority is blockingFileCount minus the current number of storefiles
  int priority = blockingFileCount - storefiles.size();
  // if the result collides with PRIORITY_USER (1), return 2 instead; otherwise return it as is
  return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
}
Back to the question of store.getCompactPriority() <= 0:
if store.getCompactPriority() <= 0, then blockingFileCount (= 7 by default) - storefiles.size() <= 0,
which means storefiles.size() >= 7.
In that case requestSystemCompaction(region, store, "Recursive enqueue") is executed,
which enqueues yet another CompactionRunner.
requestSplit
So if storefiles.size() < 7,
CompactionRunner.run() calls requestSplit() instead.
That method is requestSplit() in CompactSplitThread:
public synchronized boolean requestSplit(final HRegion r) {
  // 1. shouldSplitRegion(): is the number of regions on this RS still below the configured limit?
  // 2. r.getCompactPriority() >= 1 (Store.PRIORITY_USER)
  if (shouldSplitRegion() && r.getCompactPriority() >= Store.PRIORITY_USER) {
    byte[] midKey = r.checkSplit();
    if (midKey != null) {
      requestSplit(r, midKey);
      return true;
    }
  }
  return false;
}
What exactly does shouldSplitRegion() check?
private boolean shouldSplitRegion() {
  // this.regionSplitLimit = conf.getInt(
  //     REGION_SERVER_REGION_SPLIT_LIMIT,
  //     DEFAULT_REGION_SERVER_REGION_SPLIT_LIMIT);  // defaults to 1000
  if (server.getNumberOfOnlineRegions() > 0.9*regionSplitLimit) {
    // if the number of regions on this regionserver exceeds 900, log a WARN
    LOG.warn("Total number of regions is approaching the upper limit " + regionSplitLimit + ". "
        + "Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt");
  }
  // return true if regionSplitLimit is greater than the number of online regions on this RS
  return (regionSplitLimit > server.getNumberOfOnlineRegions());
}
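As a side note, that 1000-region ceiling is just configuration. A small sketch of lowering it, assuming the key behind REGION_SERVER_REGION_SPLIT_LIMIT is hbase.regionserver.regionSplitLimit (the value 500 is made up; in a real cluster this would normally go into hbase-site.xml on each regionserver):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SplitLimitConfigExample {
  public static Configuration lowerSplitLimit() {
    Configuration conf = HBaseConfiguration.create();
    // stop automatic split requests earlier than the default of 1000 regions per regionserver
    conf.setInt("hbase.regionserver.regionSplitLimit", 500);
    return conf;
  }
}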
With the region count on the RS and the compaction priority both checked,
HRegion.checkSplit() runs next.
/**
 * Return the splitpoint. null indicates the region isn't splittable.
 * If the splitpoint isn't explicitly specified, it will go over the stores
 * to find the best splitpoint. Currently the criteria of best splitpoint
 * is based on the size of the store.
 */
public byte[] checkSplit() {
  // the hbase:meta and namespace system tables cannot be split
  // a region that is still in recovering state cannot be split
  // splitPolicy defaults to IncreasingToUpperBoundRegionSplitPolicy
  if (!splitPolicy.shouldSplit()) {
    return null;
  }
  // get the concrete split point
  byte[] ret = splitPolicy.getSplitPoint();
  if (ret != null) {
    try {
      // verify that the row actually falls inside this region
      checkRow(ret, "calculated split");
    } catch (IOException e) {
      LOG.error("Ignoring invalid split", e);
      return null;
    }
  }
  return ret;
}
The default splitPolicy is:
IncreasingToUpperBoundRegionSplitPolicy
Let's look at its shouldSplit() method:
@Override
protected boolean shouldSplit() {
  if (region.shouldForceSplit()) return true;
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  // i.e. how many regions of this table are hosted on this regionserver
  int tableRegionsCount = getCountOfCommonTableRegions();
  // Get size to check
  // computed from hbase.hregion.max.filesize, the region count and hbase.hregion.memstore.flush.size
  long sizeToCheck = getSizeToCheck(tableRegionsCount);
  // iterate over all stores of the region
  for (Store store : region.getStores().values()) {
    // if any store cannot be split, e.g. because it still contains reference files, return false
    if ((!store.canSplit())) {
      return false;
    }
    // Mark if any store is big enough
    long size = store.getSize();
    // if the store is larger than sizeToCheck, set foundABigStore to true
    if (size > sizeToCheck) {
      LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
          " size=" + size + ", sizeToCheck=" + sizeToCheck +
          ", regionsWithCommonTable=" + tableRegionsCount);
      foundABigStore = true;
    }
  }
  return foundABigStore;
}
IncreasingToUpperBoundRegionSplitPolicy getSizeToCheck()
/**
 * @return Region max size or <code>count of regions cubed * flushsize</code>, which ever is
 * smaller; guard against there being zero regions on this server.
 */
protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  // if the region count is 0 or greater than 100, return hbase.hregion.max.filesize;
  // otherwise return the smaller of max filesize and initialSize * regionCount^3
  // (e.g. 128M * regionCount * regionCount * regionCount with the default flush size)
  // initialSize = the table's MEMSTORE_FLUSHSIZE attribute if set, otherwise
  // hbase.hregion.memstore.flush.size (128M by default)
  return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
      Math.min(getDesiredMaxFileSize(),
          this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}
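A quick worked example, assuming the defaults described above (initialSize = 128M flush size, hbase.hregion.max.filesize = 10G); the split threshold grows with the number of regions of the table on this regionserver:
- 1 region: min(10G, 128M * 1^3) = 128M
- 2 regions: min(10G, 128M * 2^3) = 1G
- 3 regions: min(10G, 128M * 3^3) = 3.375G
- 4 regions: min(10G, 128M * 4^3) = 8G
- 5 regions and beyond: capped at hbase.hregion.max.filesize = 10G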
The getDesiredMaxFileSize() method:
returns the desiredMaxFileSize field of the ConstantSizeRegionSplitPolicy class that IncreasingToUpperBoundRegionSplitPolicy extends (hbase.hregion.max.filesize).
How desiredMaxFileSize gets assigned:
@Override
protected void configureForRegion(HRegion region) {
  super.configureForRegion(region);
  Configuration conf = getConf();
  HTableDescriptor desc = region.getTableDesc();
  if (desc != null) {
    // if the table has the MAX_FILESIZE attribute set, this returns its value, otherwise -1
    this.desiredMaxFileSize = desc.getMaxFileSize();
  }
  // if desc.getMaxFileSize() returned a value <= 0,
  // fall back to hbase.hregion.max.filesize, default 10 * 1024 * 1024 * 1024L = 10G
  if (this.desiredMaxFileSize <= 0) {
    this.desiredMaxFileSize = conf.getLong(HConstants.HREGION_MAX_FILESIZE,
        HConstants.DEFAULT_MAX_FILE_SIZE);
  }
}
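So a per-table MAX_FILESIZE attribute, if set, wins over the cluster-wide hbase.hregion.max.filesize. A minimal sketch of setting it when creating a table (the table name, column family and 20G value are made up for illustration):
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class PerTableMaxFileSizeExample {
  static void createTableWithCustomSplitThreshold(Admin admin) throws java.io.IOException {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
    desc.addFamily(new HColumnDescriptor("cf"));
    // desc.getMaxFileSize() will now return 20G, so configureForRegion() above
    // uses it instead of hbase.hregion.max.filesize
    desc.setMaxFileSize(20L * 1024 * 1024 * 1024);
    admin.createTable(desc);
  }
}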
So far, the shouldSplit() method of IncreasingToUpperBoundRegionSplitPolicy has checked the region count against the max filesize, and whether any store of the current region still contains reference files.
Next, let's continue with what HRegion.checkSplit() executes afterwards:
splitPolicy.getSplitPoint()
IncreasingToUpperBoundRegionSplitPolicy getSplitPoint()
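The post does not reproduce getSplitPoint() itself; it is inherited from the base RegionSplitPolicy class. A paraphrased sketch of that logic (not the verbatim source, and details vary by version): an explicitly requested split point wins outright, otherwise the largest store is asked for its split point (typically the midkey of its biggest storefile).
// Paraphrased sketch of RegionSplitPolicy.getSplitPoint(); not the verbatim HBase source.
protected byte[] getSplitPoint() {
  // a split point forced via HBaseAdmin.split(table, row) wins outright
  byte[] explicitSplitPoint = this.region.getExplicitSplitPoint();
  if (explicitSplitPoint != null) {
    return explicitSplitPoint;
  }
  // otherwise pick the split point suggested by the largest store
  byte[] splitPointFromLargestStore = null;
  long largestStoreSize = 0;
  for (Store s : region.getStores().values()) {
    byte[] splitPoint = s.getSplitPoint();
    long storeSize = s.getSize();
    if (splitPoint != null && largestStoreSize < storeSize) {
      splitPointFromLargestStore = splitPoint;
      largestStoreSize = storeSize;
    }
  }
  return splitPointFromLargestStore;
}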
After all the checks above and with the mid key in hand, requestSplit is finally executed.
CompactSplitThread requestSplit()
public synchronized void requestSplit(final HRegion r, byte[] midKey) {
  ...
  // this.splits is a thread pool; the default number of split threads is 1
  this.splits.execute(new SplitRequest(r, midKey, this.server));
  ...
}
Now let's look at SplitRequest's run() method:
@Override
public void run() {
  // return early if the server is stopping or already stopped
  // bump the split metric
  try {
    long startTime = EnvironmentEdgeManager.currentTime();
    SplitTransaction st = new SplitTransaction(parent, midKey);
    try {
      // acquire the table read lock (a TableLock), so that schema changes
      // cannot run concurrently with the split
      tableLock.acquire();
    } catch (IOException ex) {
      tableLock = null;
      throw ex;
    }
    // If prepare does not return true, for some reason -- logged inside in
    // the prepare call -- we are not ready to split just now. Just return.
    // st.prepare() is itself an important step, see below
    if (!st.prepare()) return;
    try {
      st.execute(this.server, this.server);
      success = true;
    } catch (Exception e) {
      ...
      try {
        // the split failed, roll it back
        if (st.rollback(this.server, this.server)) {
          ...
        } else {
          ...
        }
      } catch (RuntimeException ee) {
        ...
        this.server.abort(msg + " -- Cause: " + ee.getMessage());
      }
      return;
    }
  } catch (IOException ex) {
    ...
    server.checkFileSystem();
  } finally {
    // Coprocessor postCompleteSplit hook
    if (parent.shouldForceSplit()) {
      parent.clearSplit();
    }
    // release the TableLock
    releaseTableLock();
    // log that the split succeeded
  } // end finally
}
First, prepare():
it creates the two daughter HRegionInfos, A and B.
/**
 * Does checks on split inputs.
 * @return <code>true</code> if the region is splittable else
 * <code>false</code> if it is not (e.g. its already closed, etc.).
 */
public boolean prepare() {
  // if the parent region cannot be split, return false right away
  // the mid key must not be null
  HRegionInfo hri = this.parent.getRegionInfo();
  parent.prepareToSplit();
  // Check splitrow.
  byte [] startKey = hri.getStartKey();
  byte [] endKey = hri.getEndKey();
  if (Bytes.equals(startKey, splitrow) ||
      !this.parent.getRegionInfo().containsRow(splitrow)) {
    LOG.info("Split row is not inside region key range or is equal to " +
        "startkey: " + Bytes.toStringBinary(this.splitrow));
    return false;
  }
  // build the daughter regionId; if it would be smaller than the parent's regionId,
  // bump it by 1 (to preserve the ordering in the hbase:meta table)
  long rid = getDaughterRegionIdTimestamp(hri);
  // create the two daughter regions A and B
  this.hri_a = new HRegionInfo(hri.getTable(), startKey, this.splitrow, false, rid);
  this.hri_b = new HRegionInfo(hri.getTable(), this.splitrow, endKey, false, rid);
  this.journal.add(new JournalEntry(JournalEntryType.PREPARED));
  return true;
}
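The regionId bump mentioned in the comment above comes from getDaughterRegionIdTimestamp(). A paraphrased sketch of that helper (not the verbatim source):
// Paraphrased sketch: the daughter regionId is the current timestamp, bumped past the
// parent's regionId if the clock lags behind it, so the daughters sort after the parent
// in hbase:meta.
private static long getDaughterRegionIdTimestamp(final HRegionInfo hri) {
  long rid = EnvironmentEdgeManager.currentTime();
  if (rid < hri.getRegionId()) {
    rid = hri.getRegionId() + 1;
  }
  return rid;
}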
Now let's look at execute():
/**
 * Run the transaction.
 * @param server Hosting server instance. Can be null when testing
 * @param services Used to online/offline regions.
 * @throws IOException If thrown, transaction failed.
 * Call {@link #rollback(Server, RegionServerServices)}
 * @return Regions created
 * @throws IOException
 * @see #rollback(Server, RegionServerServices)
 */
public PairOfSameType<HRegion> execute(final Server server,
    final RegionServerServices services)
    throws IOException {
  useZKForAssignment = server == null ? true :
      ConfigUtil.useZKForAssignment(server.getConfiguration());
  if (useCoordinatedStateManager(server)) { // is a coordinated state manager in use?
    std =
        ((BaseCoordinatedStateManager) server.getCoordinatedStateManager())
            .getSplitTransactionCoordination().getDefaultDetails();
  }
  PairOfSameType<HRegion> regions = createDaughters(server, services);
  if (this.parent.getCoprocessorHost() != null) {
    this.parent.getCoprocessorHost().preSplitAfterPONR();
  }
  return stepsAfterPONR(server, services, regions);
}
createDaughters() takes the parent region offline and registers the two daughter regions in meta:
/**
 * Prepares the regions and region files.
 * @param services Used to online/offline regions.
 * @return the daughter regions that were created
 */
/* package */PairOfSameType<HRegion> createDaughters(final Server server,
    final RegionServerServices services) throws IOException {
  LOG.info("Starting split of region " + this.parent);
  if ((server != null && server.isStopped()) ||
      (services != null && services.isStopping())) {
    throw new IOException("Server is stopped or stopping");
  }
  assert !this.parent.lock.writeLock().isHeldByCurrentThread():
      "Unsafe to hold write lock while performing RPCs";
  journal.add(new JournalEntry(JournalEntryType.BEFORE_PRE_SPLIT_HOOK));
  // Coprocessor callback
  if (this.parent.getCoprocessorHost() != null) {
    // TODO: Remove one of these
    this.parent.getCoprocessorHost().preSplit();
    this.parent.getCoprocessorHost().preSplit(this.splitrow);
  }
  journal.add(new JournalEntry(JournalEntryType.AFTER_PRE_SPLIT_HOOK));
  // If true, no cluster to write meta edits to or to update znodes in.
  boolean testing = server == null? true:
      server.getConfiguration().getBoolean("hbase.testing.nocluster", false);
  this.fileSplitTimeout = testing ? this.fileSplitTimeout :
      server.getConfiguration().getLong("hbase.regionserver.fileSplitTimeout",
          this.fileSplitTimeout);
  PairOfSameType<HRegion> daughterRegions = stepsBeforePONR(server, services, testing);
  List<Mutation> metaEntries = new ArrayList<Mutation>();
  if (this.parent.getCoprocessorHost() != null) {
    if (this.parent.getCoprocessorHost().
        preSplitBeforePONR(this.splitrow, metaEntries)) {
      throw new IOException("Coprocessor bypassing region "
          + this.parent.getRegionNameAsString() + " split.");
    }
    try {
      for (Mutation p : metaEntries) {
        HRegionInfo.parseRegionName(p.getRow());
      }
    } catch (IOException e) {
      LOG.error("Row key of mutation from coprossor is not parsable as region name."
          + "Mutations from coprocessor should only for hbase:meta table.");
      throw e;
    }
  }
  // This is the point of no return. Adding subsequent edits to .META. as we
  // do below when we do the daughter opens adding each to .META. can fail in
  // various interesting ways the most interesting of which is a timeout
  // BUT the edits all go through (See HBASE-3872). IF we reach the PONR
  // then subsequent failures need to crash out this regionserver; the
  // server shutdown processing should be able to fix-up the incomplete split.
  // The offlined parent will have the daughters as extra columns. If
  // we leave the daughter regions in place and do not remove them when we
  // crash out, then they will have their references to the parent in place
  // still and the server shutdown fixup of .META. will point to these
  // regions.
  // We should add PONR JournalEntry before offlineParentInMeta,so even if
  // OfflineParentInMeta timeout,this will cause regionserver exit,and then
  // master ServerShutdownHandler will fix daughter & avoid data loss. (See
  // HBase-4562).
  this.journal.add(new JournalEntry(JournalEntryType.PONR));
  // Edit parent in meta. Offlines parent region and adds splita and splitb
  // as an atomic update. See HBASE-7721. This update to META makes the region
  // will determine whether the region is split or not in case of failures.
  // If it is successful, master will roll-forward, if not, master will rollback
  // and assign the parent region.
  // ***** not in testing mode *****
  if (!testing && useZKForAssignment) {
    if (metaEntries == null || metaEntries.isEmpty()) {
      MetaTableAccessor.splitRegion(server.getConnection(),
          parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(),
          daughterRegions.getSecond().getRegionInfo(), server.getServerName(),
          parent.getTableDesc().getRegionReplication());
    } else {
      // meta changes: take the parent offline and write the new daughter region info
      offlineParentInMetaAndputMetaEntries(server.getConnection(),
          parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(), daughterRegions
              .getSecond().getRegionInfo(), server.getServerName(), metaEntries,
          parent.getTableDesc().getRegionReplication());
    }
  } else if (services != null && !useZKForAssignment) {
    if (!services.reportRegionStateTransition(TransitionCode.SPLIT_PONR,
        parent.getRegionInfo(), hri_a, hri_b)) {
      // Passed PONR, let SSH clean it up
      throw new IOException("Failed to notify master that split passed PONR: "
          + parent.getRegionInfo().getRegionNameAsString());
    }
  }
  return daughterRegions;
}
At the end of execute():
stepsAfterPONR(server, services, regions) opens the new daughter regions and updates the state in zookeeper under
/hbase/region-in-transition
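The post stops here. As a rough, paraphrased outline of stepsAfterPONR (not the verbatim source, and version-dependent), matching the description above:
// Paraphrased outline of SplitTransaction.stepsAfterPONR(); not the verbatim HBase source.
PairOfSameType<HRegion> stepsAfterPONR(final Server server,
    final RegionServerServices services, PairOfSameType<HRegion> regions) throws IOException {
  // 1. open daughter regions A and B on this regionserver
  openDaughters(server, services, regions.getFirst(), regions.getSecond());
  // 2. with ZK-based assignment, the split transaction coordination transitions the parent's
  //    znode under /hbase/region-in-transition from RS_ZK_REGION_SPLITTING to
  //    RS_ZK_REGION_SPLIT, so that the master can finish the split
  // 3. finally the postSplit coprocessor hook runs
  if (parent.getCoprocessorHost() != null) {
    parent.getCoprocessorHost().postSplit(regions.getFirst(), regions.getSecond());
  }
  return regions;
}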