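MemStore flush trigger conditions
Flush is the core operation of the MemStore, and this post examines it in depth. It first looks at the scenarios in which HBase triggers a flush, then walks through the flush workflow in the source code, and finally summarizes the flush-related configuration parameters, which matter a great deal for performance tuning and for diagnosing problems in operations.
1. Trigger conditions
HBase triggers a flush in the situations listed below. Note that the smallest unit of a flush is the HRegion, not an individual MemStore. It follows that if an HRegion holds many MemStores, every flush is expensive, which is why we recommend keeping the number of ColumnFamilies as small as possible when designing a table.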
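The HBase documentation summarizes six flush triggers; a seventh, data update operations, is added at the end of the list:
1) Manual flush: the user runs the shell command flush 'tablename' or flush 'regionname' to flush a table or a single region. (Implemented through the flush operation of org.apache.hadoop.hbase.client.HBaseAdmin, which directly triggers HRegion's internalFlushcache.)
2) MemStore-level limit: when any single MemStore in a Region reaches its limit (hbase.hregion.memstore.flush.size, 128 MB by default), a MemStore flush is triggered. (I could not find this one in the source.)
3) Region-level limit: when the combined size of all MemStores in a Region reaches its limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, by default 4 * 128 MB = 512 MB), a MemStore flush is triggered. (This is the checkResources() call made before an update operation runs.)
4) RegionServer-level limit: when the combined size of all MemStores on a RegionServer reaches its limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, by default 40% of the JVM heap), some MemStores are flushed. Regions are flushed in descending order of MemStore size: the Region with the largest MemStore first, then the next largest, until total MemStore usage drops below the lower threshold (hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize, by default 38% of the JVM heap). (When the flush handler gets no entry from the queue, it runs the RegionServer-level check.)
5) HLog count limit: when the number of HLogs on a RegionServer reaches its limit (configurable through hbase.regionserver.maxlogs), the system picks the Region or Regions that belong to the oldest HLog and flushes them.
6) Periodic flush: HBase flushes MemStores periodically, every hour by default, so that no MemStore goes too long without being persisted. To avoid the problems caused by all MemStores flushing at the same moment, each periodic flush is given a random delay (about 20,000 ms according to this summary; the 2.0.1 source analyzed in section 2.3 uses a range of 0 to 5 minutes).
7) Data update operations such as put and delete.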
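To make the threshold arithmetic in items 3 and 4 concrete, here is a minimal sketch. FlushThresholds and its methods are hypothetical helpers, not HBase classes, and the 10 GB heap is an assumed value; the percentages and sizes are the defaults quoted in the list.

```java
// Hypothetical helper (not part of HBase): reproduces the threshold arithmetic from the list.
final class FlushThresholds {
    static final long MB = 1024L * 1024L;

    // Region-level blocking threshold: flush.size * block.multiplier.
    static long regionBlockingSize(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // RegionServer-level threshold, expressed as an integer percentage of the heap.
    static long globalLimit(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;      // default hbase.hregion.memstore.flush.size
        long heap = 10L * 1024 * MB;    // assumed 10 GB heap, for illustration only
        System.out.println("region blocking size (MB): " + regionBlockingSize(flushSize, 4) / MB);
        System.out.println("global upper limit (MB):   " + globalLimit(heap, 40) / MB);
        System.out.println("global lower limit (MB):   " + globalLimit(heap, 38) / MB);
    }
}
```

With these assumed numbers a single Region blocks writes at 512 MB, while the RegionServer as a whole starts flushing at about 4 GB and stops once usage falls a little below 3.9 GB.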
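The rest of this post analyzes these trigger mechanisms from the source code (version 2.0.1).
2. Trigger mechanism analysis
2.1 put operation
The first trigger fires on HBase put/update/delete operations. Each of them starts by calling checkResources(), which checks whether the HRegion's total MemStore size exceeds the blocking threshold (hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier). If it does, checkResources() calls requestFlush() to request a flush of that HRegion's MemStores and throws RegionTooBusyException so that the operation does not proceed. Delete, update, and the other data update operations behave the same way: each calls checkResources() before it begins. The flush request ultimately leads to a call to HRegion's flushcache method.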
Taking the put operation as an example, HRegion#put:
public void put(Put put) throws IOException {
  ...
  // Check whether the flush condition is met
  checkResources();
  startRegionOperation(Operation.PUT);
  try {
    // All edits for the given row (across all column families) must happen atomically.
    doBatchMutate(put);
  } finally {
    closeRegionOperation(Operation.PUT);
  }
}
void checkResources() throws RegionTooBusyException {
  // If catalog region, do not impose resource constraints or block updates.
  if (this.getRegionInfo().isMetaRegion()) return;
  MemStoreSize mss = this.memStoreSizing.getMemStoreSize();
  if (mss.getHeapSize() + mss.getOffHeapSize() > this.blockingMemStoreSize) {
    // If this region's memstore exceeds the blocking size (128 MB * 4 by default),
    // force a flush request for this region
    blockedRequestsCount.increment();
    requestFlush();
    ...
  }
}
// Set when the HRegion is initialized; 128 MB * 4 by default
this.blockingMemStoreSize = this.memstoreFlushSize * mult;
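The effect of this check on a writer can be seen in a small simulation. This is only a sketch: BlockingCheckSketch and its methods are hypothetical, sizes are plain numbers rather than bytes of real cell data, and returning false stands in for throwing RegionTooBusyException.

```java
// Hypothetical stand-in for the check in HRegion#checkResources; not HBase code.
final class BlockingCheckSketch {
    private final long blockingMemStoreSize;
    private long memStoreSize;   // current memstore usage of the simulated region
    private int flushRequests;   // number of flush requests issued by the check

    BlockingCheckSketch(long flushSize, int multiplier) {
        this.blockingMemStoreSize = flushSize * multiplier;
    }

    // Returns true if the write was accepted. A false return stands in for
    // RegionTooBusyException: the write is rejected and a flush is requested.
    boolean tryWrite(long size) {
        if (memStoreSize > blockingMemStoreSize) {   // strict '>' as in the excerpt
            flushRequests++;
            return false;
        }
        memStoreSize += size;
        return true;
    }

    // Simulates a completed flush: the memstore is empty again.
    void flushed() {
        memStoreSize = 0;
    }

    int flushRequests() {
        return flushRequests;
    }
}
```

Because the comparison is strict, a region sitting exactly at the threshold still accepts one more write; only the next write after that is rejected.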
checkResources() calls HRegion's requestFlush method:
private void requestFlush() {
  if (this.rsServices == null) {
    return;
  }
  requestFlush0(FlushLifeCycleTracker.DUMMY);
}
private void requestFlush0(FlushLifeCycleTracker tracker) {
  boolean shouldFlush = false;
  synchronized (writestate) { // Check the state to avoid duplicate requests
    if (!this.writestate.isFlushRequested()) {
      shouldFlush = true;
      writestate.flushRequested = true;
    }
  }
  if (shouldFlush) {
    // Make request outside of synchronize block; HBASE-818.
    // Request the flush through rsServices
    this.rsServices.getFlushRequester().requestFlush(this, false, tracker);
    if (LOG.isDebugEnabled()) {
      LOG.debug("Flush requested on " + this.getRegionInfo().getEncodedName());
    }
  } else {
    tracker.notExecuted("Flush already requested on " + this);
  }
}
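The pattern in requestFlush0 (flip a flag under a lock, then make the actual call outside the lock) is worth isolating. The sketch below is a simplified illustration with hypothetical names; the counter stands in for the call to the flush requester, and flushCompleted() is an assumed hook that clears the flag once a flush has finished.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the "request at most once" pattern; not HBase code.
final class FlushRequestGate {
    private final Object writestate = new Object();
    private boolean flushRequested;                       // guarded by writestate
    private final AtomicInteger submitted = new AtomicInteger();

    // Returns true if this call actually submitted a flush request.
    boolean requestFlush() {
        boolean shouldFlush = false;
        synchronized (writestate) {
            if (!flushRequested) {
                shouldFlush = true;
                flushRequested = true;
            }
        }
        if (shouldFlush) {
            // Done outside the synchronized block so the lock is not held
            // while the (possibly slow) submission runs.
            submitted.incrementAndGet();
        }
        return shouldFlush;
    }

    // Assumed hook: called when the flush has finished, allowing new requests.
    void flushCompleted() {
        synchronized (writestate) {
            flushRequested = false;
        }
    }

    int submitted() {
        return submitted.get();
    }
}
```

Any number of writers can hit the blocking threshold at once, yet only the first of them enqueues a request; the rest see the flag and return.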
Here rsServices is the region's reference to the RegionServer that hosts it, so the request is a local method call rather than an RPC. getFlushRequester() returns the RegionServer's member variable cacheFlusher, of type MemStoreFlusher, which manages all flush requests on that RegionServer. Its key fields are:
// BlockingQueue is a blocking queue; DelayQueue is an unbounded blocking queue backed by a priority queue
private final BlockingQueue<FlushQueueEntry> flushQueue = new DelayQueue<>();
private final Map<Region, FlushRegionEntry> regionsInQueue = new HashMap<>();
// Atomic boolean
private AtomicBoolean wakeupPending = new AtomicBoolean();
private final long threadWakeFrequency;
// The HRegionServer instance
private final HRegionServer server;
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
// blockSignal appears to act as a signal (a monitor object) for blocking and waking updates
private final Object blockSignal = new Object();
// How long blocked updates on an HRegion wait
private long blockingWaitTime;
private final LongAdder updatesBlockedMsHighWater = new LongAdder();
private final FlushHandler[] flushHandlers;
private List<FlushRequestListener> flushRequestListeners = new ArrayList<>(1);
private FlushType flushType;
This in turn calls MemStoreFlusher#requestFlush:
// Put the region to be flushed on the pending queue
public void requestFlush(HRegion r, boolean forceFlushAllStores, FlushLifeCycleTracker tracker) {
  r.incrementFlushesQueuedCount();
  synchronized (regionsInQueue) {
    if (!regionsInQueue.containsKey(r)) {
      // This entry has no delay so it will be added at the top of the flush
      // queue. It'll come out near immediately.
      FlushRegionEntry fqe = new FlushRegionEntry(r, forceFlushAllStores, tracker);
      // flushQueue is an unbounded blocking queue that serves as the flush work queue;
      // regionsInQueue records which regions currently have an entry in that queue.
      this.regionsInQueue.put(r, fqe);
      this.flushQueue.add(fqe);
    } else {
      tracker.notExecuted("Flush already requested on " + r);
    }
  }
}
At this point the flush task sits in the work queue, waiting for a flush thread to process it.
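The pairing of a DelayQueue with a lookup map can be tried out in isolation. The following is a minimal sketch, assuming regions are identified by plain strings; FlushQueueSketch, Entry, and takeNext are hypothetical names, and takeNext stands in for whatever the flush handler thread does when it polls the queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a delay-ordered flush queue with de-duplication; not HBase code.
final class FlushQueueSketch {
    static final class Entry implements Delayed {
        final String region;
        final long readyAtNanos;

        Entry(String region, long delayMs) {
            this.region = region;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            // The queue in this sketch only ever holds Entry objects.
            return Long.compare(readyAtNanos, ((Entry) other).readyAtNanos);
        }
    }

    private final DelayQueue<Entry> flushQueue = new DelayQueue<>();
    private final Map<String, Entry> regionsInQueue = new HashMap<>();

    // Returns false if the region already has a pending entry.
    boolean requestFlush(String region, long delayMs) {
        synchronized (regionsInQueue) {
            if (regionsInQueue.containsKey(region)) {
                return false;
            }
            Entry entry = new Entry(region, delayMs);
            regionsInQueue.put(region, entry);
            flushQueue.add(entry);
            return true;
        }
    }

    // Waits up to timeoutMs for an entry whose delay has expired; null if none.
    String takeNext(long timeoutMs) {
        try {
            Entry entry = flushQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (entry == null) {
                return null;
            }
            synchronized (regionsInQueue) {
                regionsInQueue.remove(entry.region);
            }
            return entry.region;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

An entry with zero delay becomes available almost immediately, which matches the "no delay" comment in requestFlush, while a delayed entry stays invisible to poll until its delay has expired.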
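2.2 Log count reaches its limit
The constructor of the AbstractFSWAL class (in the wal package) sets the maximum number of logs, and findRegionsToForceFlush() uses that limit to decide whether a flush is needed: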
protected AbstractFSWAL(...) throws FailedLogCloseException, IOException {
  ...
  boolean maxLogsDefined = conf.get("hbase.regionserver.maxlogs") != null;
  if (maxLogsDefined) {
    LOG.warn("'hbase.regionserver.maxlogs' was deprecated.");
  }
  this.maxLogs = conf.getInt("hbase.regionserver.maxlogs",
      Math.max(32, calculateMaxLogFiles(conf, logrollsize)));
  ...
}
byte[][] findRegionsToForceFlush() throws IOException {
  byte[][] regions = null;
  int logCount = getNumRolledLogFiles();
  if (logCount > this.maxLogs && logCount > 0) {
    Map.Entry<Path, WalProps> firstWALEntry = this.walFile2Props.firstEntry();
    regions =
        this.sequenceIdAccounting.findLower(firstWALEntry.getValue().encodedName2HighestSequenceId);
  }
  ...
  return regions;
}
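The idea behind findRegionsToForceFlush can be sketched without any HBase types. Everything here is a simplified model under stated assumptions: WAL files and regions are plain strings, WAL names sort from oldest to newest, and a region counts as "still needing" the oldest WAL when its lowest unflushed sequence id is not greater than the highest sequence id it wrote to that file. The class and method names are hypothetical, and the comparison is my reading of what findLower does rather than a copy of it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "too many WALs, flush whoever pins the oldest one"; not HBase code.
final class WalForceFlushSketch {
    // WAL name -> (region -> highest sequence id that region wrote to this WAL).
    // TreeMap keeps names sorted, so firstEntry() is the oldest WAL in this model.
    private final TreeMap<String, Map<String, Long>> rolledWals = new TreeMap<>();
    // region -> lowest sequence id that has not been flushed yet.
    private final Map<String, Long> lowestUnflushed = new HashMap<>();
    private final int maxLogs;

    WalForceFlushSketch(int maxLogs) {
        this.maxLogs = maxLogs;
    }

    void addRolledWal(String walName, Map<String, Long> highestSeqIdByRegion) {
        rolledWals.put(walName, new HashMap<>(highestSeqIdByRegion));
    }

    void setLowestUnflushed(String region, long seqId) {
        lowestUnflushed.put(region, seqId);
    }

    // Regions whose unflushed edits keep the oldest WAL from being archived.
    List<String> findRegionsToForceFlush() {
        List<String> regions = new ArrayList<>();
        if (rolledWals.size() <= maxLogs) {
            return regions;                      // under the limit: nothing to force
        }
        Map<String, Long> oldest = rolledWals.firstEntry().getValue();
        for (Map.Entry<String, Long> e : oldest.entrySet()) {
            Long lowest = lowestUnflushed.get(e.getKey());
            if (lowest != null && lowest <= e.getValue()) {
                regions.add(e.getKey());
            }
        }
        Collections.sort(regions);               // deterministic order for the example
        return regions;
    }
}
```

Flushing the returned regions advances their lowest unflushed sequence ids past the oldest file, after which that file no longer holds anything needed for recovery.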
2.3 Periodic flush
HRegionServer has an inner class, PeriodicMemStoreFlusher, a scheduled chore that periodically checks every online region and requests a delayed flush for each region that is due.
static class PeriodicMemStoreFlusher extends ScheduledChore {
  final HRegionServer server;
  // Each flush is given a random delay of 0 to 5 minutes so that the memstores
  // do not all flush at once and put pressure on the disks
  final static int RANGE_OF_DELAY = 5 * 60 * 1000; // 5 min in milliseconds
  final static int MIN_DELAY_TIME = 0; // millisec
  public PeriodicMemStoreFlusher(int cacheFlushInterval, final HRegionServer server) {
    super("MemstoreFlusherChore", server, cacheFlushInterval);
    this.server = server;
  }
  @Override
  protected void chore() {
    final StringBuilder whyFlush = new StringBuilder();
    for (HRegion r : this.server.onlineRegions.values()) {
      if (r == null) continue;
      if (r.shouldFlush(whyFlush)) {
        FlushRequester requester = server.getFlushRequester();
        if (requester != null) {
          ...
          requester.requestDelayedFlush(r, randomDelay, false);
        }
      }
    }
  }
}
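The computation of randomDelay is elided in the listing above, so the following is only one plausible way to draw a delay from the two constants, not necessarily what HBase does. RandomDelaySketch and nextDelayMs are hypothetical names, and the use of ThreadLocalRandom is my own choice.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: draws a delay in [MIN_DELAY_TIME, MIN_DELAY_TIME + RANGE_OF_DELAY).
final class RandomDelaySketch {
    static final int RANGE_OF_DELAY = 5 * 60 * 1000; // 5 min in milliseconds
    static final int MIN_DELAY_TIME = 0;             // milliseconds

    static long nextDelayMs() {
        // nextInt(bound) returns a value in [0, bound)
        return (long) MIN_DELAY_TIME + ThreadLocalRandom.current().nextInt(RANGE_OF_DELAY);
    }
}
```

Spreading the periodic flushes of many regions over a five-minute window keeps them from landing on the disks in a single burst.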
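In chore(), shouldFlush decides whether a region is due for a flush, based on the number of edits since the last flush and on how long the oldest unflushed edit has been sitting in the memstore.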
HRegion#shouldFlush
boolean shouldFlush(final StringBuilder whyFlush) {
  whyFlush.setLength(0);
  // This is a rough measure.
  if (this.maxFlushedSeqId > 0
      && (this.maxFlushedSeqId + this.flushPerChanges < this.mvcc.getReadPoint())) {
    whyFlush.append("more than max edits, " + this.flushPerChanges + ", since last flush");
    return true;
  }
  long modifiedFlushCheckInterval = flushCheckInterval;
  if (getRegionInfo().getTable().isSystemTable() &&
      getRegionInfo().getReplicaId() == RegionInfo.DEFAULT_REPLICA_ID) {
    modifiedFlushCheckInterval = SYSTEM_CACHE_FLUSH_INTERVAL;
  }
  if (modifiedFlushCheckInterval <= 0) { //disabled
    return false;
  }
  long now = EnvironmentEdgeManager.currentTime();
  //if we flushed in the recent past, we don't need to do again now
  if ((now - getEarliestFlushTimeForAllStores() < modifiedFlushCheckInterval)) {
    return false;
  }
  //since we didn't flush in the recent past, flush now if certain conditions
  //are met. Return true on first such memstore hit.
  for (HStore s : stores.values()) {
    if (s.timeOfOldestEdit() < now - modifiedFlushCheckInterval) {
      // we have an old enough edit in the memstore, flush
      whyFlush.append(s.toString() + " has an old edit so flush to free WALs");
      return true;
    }
  }
  return false;
}
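The time-based part of shouldFlush reduces to three comparisons, which the sketch below isolates as a pure function. PeriodicFlushDecision is a hypothetical name, timestamps are plain milliseconds passed in by the caller, and representing a store with no edits as Long.MAX_VALUE is an assumption of this sketch. The edit-count check and the system-table interval are left out.

```java
// Hypothetical, simplified version of the time-based checks in shouldFlush; not HBase code.
final class PeriodicFlushDecision {
    static boolean shouldFlush(long now, long flushCheckIntervalMs,
                               long earliestFlushTime, long[] oldestEditTimes) {
        if (flushCheckIntervalMs <= 0) {
            return false;                         // periodic flush disabled
        }
        if (now - earliestFlushTime < flushCheckIntervalMs) {
            return false;                         // flushed recently, nothing to do
        }
        for (long oldestEdit : oldestEditTimes) {
            // A store with no edits is modeled as Long.MAX_VALUE and never matches.
            if (oldestEdit < now - flushCheckIntervalMs) {
                return true;                      // an edit is older than the interval
            }
        }
        return false;
    }
}
```

With the default one-hour interval, a region is flushed only if it has not been flushed within the last hour and at least one of its stores holds an edit more than an hour old.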
server.getFlushRequester returns the requester through which the flush request is submitted:
public FlushRequester getFlushRequester() {
  return this.cacheFlusher;
}
protected MemStoreFlusher cacheFlusher;
2.4 Manual flush
HBaseAdmin contains the flush-related commands, at the table level, the region level, and the region server level.
public void flush(final TableName tableName) throws IOException {
  checkTableExists(tableName);
  if (isTableDisabled(tableName)) {
    LOG.info("Table is disabled: " + tableName.getNameAsString());
    return;
  }
  execProcedure("flush-table-proc", tableName.getNameAsString(), new HashMap<>());
}
@Override
public void flushRegion(final byte[] regionName) throws IOException {
  Pair<RegionInfo, ServerName> regionServerPair = getRegion(regionName);
  if (regionServerPair == null) {
    throw new IllegalArgumentException("Unknown regionname: " + Bytes.toStringBinary(regionName));
  }
  if (regionServerPair.getSecond() == null) {
    throw new NoServerForRegionException(Bytes.toStringBinary(regionName));
  }
  final RegionInfo regionInfo = regionServerPair.getFirst();
  ServerName serverName = regionServerPair.getSecond();
  flush(this.connection.getAdmin(serverName), regionInfo);
}
private void flush(AdminService.BlockingInterface admin, final RegionInfo info)
    throws IOException {
  ProtobufUtil.call(() -> {
    // TODO: There is no timeout on this controller. Set one!
    HBaseRpcController controller = rpcControllerFactory.newController();
    FlushRegionRequest request =
        RequestConverter.buildFlushRegionRequest(info.getRegionName());
    admin.flushRegion(controller, request);
    return null;
  });
}
@Override
public void flushRegionServer(ServerName serverName) throws IOException {
  for (RegionInfo region : getRegions(serverName)) {
    flush(this.connection.getAdmin(serverName), region);
  }
}
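That completes the source analysis for trigger conditions 1, 3, 5, 6, and 7.
The fourth trigger will be covered in the next post, together with how the queue of pending flushes is processed.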