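Preface
The previous article described the Region location process. This article continues from there and goes into more detail on several of its steps.
Introduction
I. getCachedLocation(): checking whether the Region is already in the cache
First, let's look at getCachedLocation(), which checks whether the cache already holds the Region. The code is as follows: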
/**
 * Search the cache for a location that fits our table and row key.
 * Return null if no suitable region is located.
 * @return Null or region location found in cache.
 */
RegionLocations getCachedLocation(final TableName tableName,
    final byte [] row) {
  return metaCache.getCachedLocation(tableName, row);
}
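It simply asks metaCache for the Region location information that corresponds to tableName and row. metaCache is an instance of the MetaCache class, a data structure designed specifically for caching Region location information, i.e. the meta data. Let's look at its getCachedLocation() method first: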
/**
 * Search the cache for a location that fits our table and row key.
 * Return null if no suitable region is located.
 *
 * @return Null or region location found in cache.
 */
public RegionLocations getCachedLocation(final TableName tableName, final byte [] row) {
  ConcurrentNavigableMap<byte[], RegionLocations> tableLocations =
      getTableLocations(tableName);
  Entry<byte[], RegionLocations> e = tableLocations.floorEntry(row);
  if (e == null) {
    if (metrics != null) metrics.incrMetaCacheMiss();
    return null;
  }
  RegionLocations possibleRegion = e.getValue();
  // make sure that the end key is greater than the row we're looking
  // for, otherwise the row actually belongs in the next region, not
  // this one. the exception case is when the endkey is
  // HConstants.EMPTY_END_ROW, signifying that the region we're
  // checking is actually the last region in the table.
  byte[] endKey = possibleRegion.getRegionLocation().getRegion().getEndKey();
  // Here we do direct Bytes.compareTo and not doing CellComparator/MetaCellComparator path.
  // MetaCellComparator is for comparing against data in META table which need special handling.
  // Not doing that is ok for this case because
  // 1. We are getting the Region location for the given row in non META tables only. The compare
  // checks the given row is within the end key of the found region. So META regions are not
  // coming in here.
  // 2. Even if META region comes in, its end key will be empty byte[] and so Bytes.equals(endKey,
  // HConstants.EMPTY_END_ROW) check itself will pass.
  if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
      Bytes.compareTo(endKey, 0, endKey.length, row, 0, row.length) > 0) {
    if (metrics != null) metrics.incrMetaCacheHit();
    return possibleRegion;
  }
  // Passed all the way through, so we got nothing - complete cache miss
  if (metrics != null) metrics.incrMetaCacheMiss();
  return null;
}
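The floorEntry() lookup followed by the end-key check can be tried out without HBase. The following is only a minimal sketch under stated assumptions: RegionCacheSketch and its Region class are hypothetical stand-ins for MetaCache and RegionLocations, and Arrays.compareUnsigned (Java 9+) stands in for Bytes.BYTES_COMPARATOR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified stand-in for MetaCache; not the HBase class.
final class RegionCacheSketch {
    /** Hypothetical cached region: a key range plus the name of the hosting server. */
    static final class Region {
        final byte[] startKey;
        final byte[] endKey; // empty means "last region of the table"
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey.getBytes(StandardCharsets.UTF_8);
            this.endKey = endKey.getBytes(StandardCharsets.UTF_8);
            this.server = server;
        }
    }

    // Lexicographic comparison of unsigned bytes (assumed equivalent to Bytes.BYTES_COMPARATOR).
    private static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

    // Regions are indexed by their start key.
    private final ConcurrentNavigableMap<byte[], Region> byStartKey =
        new ConcurrentSkipListMap<>(UNSIGNED_BYTES);

    void cache(Region region) {
        byStartKey.put(region.startKey, region);
    }

    /** Returns the cached region containing the row, or null on a cache miss. */
    Region lookup(String row) {
        byte[] key = row.getBytes(StandardCharsets.UTF_8);
        // The greatest start key that is <= row identifies the only possible candidate.
        Map.Entry<byte[], Region> entry = byStartKey.floorEntry(key);
        if (entry == null) {
            return null;
        }
        Region candidate = entry.getValue();
        // The row belongs to the candidate only if it sorts before the end key,
        // or if the candidate is the last region (empty end key).
        if (candidate.endKey.length == 0 || UNSIGNED_BYTES.compare(candidate.endKey, key) > 0) {
            return candidate;
        }
        return null;
    }
}
```

With regions ["", "g") and ["g", "p") cached, a lookup for row "z" finds the second region through floorEntry() but still misses, because "z" lies beyond that region's end key. This is exactly the case the end-key check guards against.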
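ConcurrentSkipListMap deserves special mention here. It is a thread-safe sorted map that supports concurrent access. Internally it is implemented as a skip list, so lookups, insertions and deletions take O(log n) time on average. The getTableLocations() method, which returns the cached location information tableLocations for a given tableName, is as follows: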
private ConcurrentSkipListMap<byte[], RegionLocations>
    getTableLocations(final TableName tableName) {
  // find the map of cached locations for this table
  ConcurrentSkipListMap<byte[], RegionLocations> result;
  // Look up tableLocations, the cached location information for this table, in
  // cachedRegionLocations by tableName. It is a ConcurrentSkipListMap (a skip list)
  // that maps a row key to RegionLocations.
  // cachedRegionLocations itself is a ConcurrentHashMap. It is the main data structure
  // MetaCache relies on to cache Region locations, and it stores the two-level
  // mapping {tableName -> [row -> RegionLocations]}.
  result = this.cachedRegionLocations.get(tableName);
  // if tableLocations for this table isn't built yet, make one
  if (result == null) {
    result = new ConcurrentSkipListMap<byte[], RegionLocations>(Bytes.BYTES_COMPARATOR);
    // Put the new map into cachedRegionLocations and get back the previous value, old
    ConcurrentSkipListMap<byte[], RegionLocations> old =
        this.cachedRegionLocations.putIfAbsent(tableName, result);
    // If old is not null, another thread created the map first, so return old
    if (old != null) {
      return old;
    }
  }
  return result;
}
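The get, putIfAbsent, return-the-winner sequence above is a common get-or-create idiom for concurrent maps. As a rough sketch (TwoLevelCacheSketch and its String keys are hypothetical simplifications, not HBase code), it can be written next to the more compact computeIfAbsent() form that java.util.concurrent has offered since Java 8:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical two-level cache: table name -> (start key -> location).
final class TwoLevelCacheSketch {
    private final ConcurrentMap<String, ConcurrentSkipListMap<String, String>> tables =
        new ConcurrentHashMap<>();

    /** Same shape as getTableLocations(): create a map, then let putIfAbsent pick the winner. */
    ConcurrentSkipListMap<String, String> getOrCreate(String table) {
        ConcurrentSkipListMap<String, String> result = tables.get(table);
        if (result == null) {
            result = new ConcurrentSkipListMap<>();
            // putIfAbsent returns the existing value if another thread got there first.
            ConcurrentSkipListMap<String, String> old = tables.putIfAbsent(table, result);
            if (old != null) {
                return old;
            }
        }
        return result;
    }

    /** Equivalent behavior expressed with computeIfAbsent. */
    ConcurrentSkipListMap<String, String> getOrCreateCompact(String table) {
        return tables.computeIfAbsent(table, name -> new ConcurrentSkipListMap<>());
    }
}
```

Either way, every caller ends up with the same inner map for a given table, so entries cached by one thread are visible to all the others.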
What deserves emphasis here is a member variable of MetaCache, cachedRegionLocations, which is defined as follows:
/**
 * Map of table to table {@link HRegionLocation}s.
 */
private final ConcurrentMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>
    cachedRegionLocations =
        new ConcurrentHashMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>();
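It is the main data structure behind MetaCache's caching of Region locations, and it stores the two-level mapping {tableName -> [row -> RegionLocations]}. MetaCache also has a variable that tracks all the servers that appear in the cache: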
// The presence of a server in the map implies it's likely that there is an
// entry in cachedRegionLocations that map to this server; but the absence
// of a server in this map guarantees that there is no entry in cache that
// maps to the absent server.
// The access to this attribute must be protected by a lock on cachedRegionLocations
private final Set<ServerName> cachedServers = new ConcurrentSkipListSet<ServerName>();
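II. cacheLocation(): caching the location information that was obtained
Next, let's look at cacheLocation(), which caches the location information, locations, that was obtained. The code is as follows: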
public void cacheLocation(final TableName tableName, final RegionLocations locations) {
  // Get the Region's start row key, startKey, from the location information
  byte [] startKey = locations.getRegionLocation().getRegionInfo().getStartKey();
  // Call getTableLocations() to get tableLocations, the location information for tableName.
  // It maps a Region's start row key (startKey) to RegionLocations.
  ConcurrentMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);
  // Put the new locations into tableLocations and get back the previous value, oldLocation
  RegionLocations oldLocation = tableLocations.putIfAbsent(startKey, locations);
  // A null oldLocation means this is a new cache entry
  boolean isNewCacheEntry = (oldLocation == null);
  if (isNewCacheEntry) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Cached location: " + locations);
    }
    // Call addToCachedServers() to record the servers seen in cachedServers
    addToCachedServers(locations);
    return;
  }
  // (The handling of an already cached entry is not shown in this excerpt.)
}
III. Converting a Result into the RegionLocations we need, i.e. regionInfoRow -> locations
Now let's see how a Result is converted into the required RegionLocations, i.e. regionInfoRow -> locations. This is done by the getRegionLocations() method of MetaTableAccessor:
public static RegionLocations getRegionLocations(final Result r) {
  if (r == null) return null;
  RegionInfo regionInfo = getRegionInfo(r, getRegionInfoColumn());
  if (regionInfo == null) return null;
  List<HRegionLocation> locations = new ArrayList<>(1);
  NavigableMap<byte[],NavigableMap<byte[],byte[]>> familyMap = r.getNoVersionMap();
  locations.add(getRegionLocation(r, regionInfo, 0));
  NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
  if (infoMap == null) return new RegionLocations(locations);
  // iterate until all serverName columns are seen
  int replicaId = 0;
  byte[] serverColumn = getServerColumn(replicaId);
  SortedMap<byte[], byte[]> serverMap;
  serverMap = infoMap.tailMap(serverColumn, false);
  if (serverMap.isEmpty()) return new RegionLocations(locations);
  for (Map.Entry<byte[], byte[]> entry : serverMap.entrySet()) {
    replicaId = parseReplicaIdFromServerColumn(entry.getKey());
    if (replicaId < 0) {
      break;
    }
    HRegionLocation location = getRegionLocation(r, regionInfo, replicaId);
    // In case the region replica is newly created, its location might be null. We usually do not
    // have HRL's in RegionLocations object with null ServerName. They are handled as null HRLs.
    if (location.getServerName() == null) {
      locations.add(null);
    } else {
      locations.add(location);
    }
  }
  return new RegionLocations(locations);
}
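The loop above relies on the columns of the info family being sorted, so that the server columns of the additional replicas directly follow the server column of replica 0. A minimal sketch of that scan is given below. It assumes, without confirmation from the listing, that replica columns are named "server_" followed by four hexadecimal digits; ReplicaColumnSketch, parseReplicaId() and the String-keyed map are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch of the tailMap-based replica scan; not the HBase implementation.
final class ReplicaColumnSketch {
    static final String SERVER = "server";

    /** Assumed naming scheme: "server_" + 4 hex digits. Returns -1 for any other column. */
    static int parseReplicaId(String column) {
        String prefix = SERVER + "_";
        if (!column.startsWith(prefix) || column.length() != prefix.length() + 4) {
            return -1;
        }
        try {
            return Integer.parseInt(column.substring(prefix.length()), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Collects the server of replica 0 followed by the servers of all further replicas. */
    static List<String> servers(NavigableMap<String, String> infoFamily) {
        List<String> servers = new ArrayList<>();
        servers.add(infoFamily.get(SERVER)); // replica 0
        // tailMap(SERVER, false) starts strictly after "server", so replica columns come first.
        for (Map.Entry<String, String> entry : infoFamily.tailMap(SERVER, false).entrySet()) {
            int replicaId = parseReplicaId(entry.getKey());
            if (replicaId < 0) {
                break; // first non-replica column: all server columns have been seen
            }
            servers.add(entry.getValue());
        }
        return servers;
    }
}
```

Because '_' sorts before any lowercase letter, a column such as "server_0001" precedes "serverstartcode", which is why the scan can stop at the first column that does not parse as a replica server column.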
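One important point is how the Region information (HRegionInfo) is read from the Result. getRegionInfoColumn() returns the byte[] for the string "regioninfo", which is the qualifier used in the meta table, and the family is "info". The getHRegionInfo() and getRegionInfoColumn() methods are shown below (this excerpt uses the names HRegionInfo, getHRegionInfo() and getFamily(), whereas the listing above uses RegionInfo, getRegionInfo() and getCatalogFamily()):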
protected static byte[] getRegionInfoColumn() {
  return HConstants.REGIONINFO_QUALIFIER;
}
private static HRegionInfo getHRegionInfo(final Result r, byte [] qualifier) {
  // Get the Cell whose family is "info" and whose qualifier is "regioninfo"
  Cell cell = r.getColumnLatestCell(getFamily(), qualifier);
  if (cell == null) return null;
  // Call HRegionInfo's parseFromOrNull() to convert the Cell into an HRegionInfo.
  // This is really just deserialization: it reads out the member variables HRegionInfo
  // needs, such as startKey, endKey, regionId, regionName, split and offLine.
  return HRegionInfo.parseFromOrNull(cell.getValueArray(),
      cell.getValueOffset(), cell.getValueLength());
}
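There are two steps:
1. Get the Cell whose family is "info" and whose qualifier is "regioninfo";
2. Call HRegionInfo's parseFromOrNull() method to convert the Cell into an HRegionInfo. This is really just deserialization: it reads out the member variables HRegionInfo needs, such as startKey, endKey, regionId, regionName, split and offLine.
IV. Sleeping before a retry: the sleep time depends on pause and tries and generally grows with each attempt (jitter aside)
Finally, let's look at how the current thread sleeps for a while before retrying. The sleep time depends on pause and tries, and it generally gets longer with each attempt (apart from the random jitter). The code is as follows: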
public static long getPauseTime(final long pause, final int tries) {
  int ntries = tries;
  if (ntries >= HConstants.RETRY_BACKOFF.length) {
    ntries = HConstants.RETRY_BACKOFF.length - 1;
  }
  if (ntries < 0) {
    ntries = 0;
  }
  long normalPause = pause * HConstants.RETRY_BACKOFF[ntries];
  // 1% possible jitter
  long jitter = (long) (normalPause * ThreadLocalRandom.current().nextFloat() * 0.01f);
  return normalPause + jitter;
}
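In general, the later the retry, the longer the sleep, until tries reaches the last entry of HConstants.RETRY_BACKOFF, after which the multiplier stays the same. pause is taken from the hbase.client.pause parameter, which defaults to 100 (milliseconds) when not configured.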
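To get a feel for the resulting pauses, the calculation can be reproduced on its own. The sketch below is hypothetical: BackoffSketch is not an HBase class, and the multiplier table is what I recall HConstants.RETRY_BACKOFF containing, so please verify it against the HBase version you actually use.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-alone version of the pause calculation; not the HBase class.
final class BackoffSketch {
    // Assumed copy of HConstants.RETRY_BACKOFF; check your HBase version.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Pause without jitter: the try count is clamped to the bounds of the table. */
    static long basePause(long pause, int tries) {
        int index = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
        return pause * RETRY_BACKOFF[index];
    }

    /** Pause with up to 1% of random jitter added on top. */
    static long pauseWithJitter(long pause, int tries) {
        long normalPause = basePause(pause, tries);
        long jitter = (long) (normalPause * ThreadLocalRandom.current().nextFloat() * 0.01f);
        return normalPause + jitter;
    }
}
```

Under that assumed table and the default pause of 100, the base pause for tries = 3 is 500 ms, and every attempt at or beyond the end of the table is capped at 100 * 200 = 20000 ms plus jitter.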
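That covers Region location for rows of non-meta tables, i.e. user tables. In practice the lookup still goes to the meta table. The meta table is named hbase:meta, its family is "info" and its qualifier is "regioninfo". It is itself an HBase table, so reading from it also requires Region location. For the meta table, locateMeta() is called directly. Let's look at the locateMeta() method: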
private RegionLocations locateMeta(final TableName tableName,
    boolean useCache, int replicaId) throws IOException {
  // HBASE-10785: We cache the location of the META itself, so that we are not overloading
  // zookeeper with one request for every region lookup. We cache the META with empty row
  // key in MetaCache.
  byte[] metaCacheKey = HConstants.EMPTY_START_ROW; // use byte[0] as the row for meta
  RegionLocations locations = null;
  if (useCache) {
    locations = getCachedLocation(tableName, metaCacheKey);
    if (locations != null && locations.getRegionLocation(replicaId) != null) {
      return locations;
    }
  }
  // only one thread should do the lookup.
  synchronized (metaRegionLock) {
    // Check the cache again for a hit in case some other thread made the
    // same query while we were waiting on the lock.
    if (useCache) {
      locations = getCachedLocation(tableName, metaCacheKey);
      if (locations != null && locations.getRegionLocation(replicaId) != null) {
        return locations;
      }
    }
    // Look up from zookeeper
    locations = get(this.registry.getMetaRegionLocations());
    if (locations != null) {
      cacheLocation(tableName, locations);
    }
  }
  return locations;
}
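Region location for the meta table differs considerably from that of non-meta tables. The flow is as follows:
1. Get the cache key for meta, which is actually an empty byte array (byte[0]);
2. If the cache is used, call getCachedLocation() to get the RegionLocations, i.e. locations. If locations is not null and contains a location for the requested replicaId, the cache holds the data and it is returned directly; otherwise execution continues in order to locate the Region;
3. Acquire the mutex on metaRegionLock with the synchronized keyword, so that only one thread performs the lookup at any moment:
3.1. Check the cache again, because while the current thread was waiting for the lock on metaRegionLock, another thread may have made the same query and already loaded the data into the cache;
3.2. Look up the meta location from ZooKeeper;
3.3. Once the Region is located, call cacheLocation() to put it into the cache so that later callers can read it directly from the cache.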
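The steps above follow the familiar "check, lock, check again, then load" pattern. A minimal sketch of that pattern, outside HBase, might look like the following; MetaLocationCacheSketch, its Supplier-based remote lookup and the call counter are hypothetical names introduced purely for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the cache / lock / re-check flow; not the HBase implementation.
final class MetaLocationCacheSketch {
    private final Object metaRegionLock = new Object();
    private final Supplier<String> remoteLookup; // stands in for the ZooKeeper lookup
    private final AtomicInteger remoteCalls = new AtomicInteger();
    private volatile String cached;

    MetaLocationCacheSketch(Supplier<String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    String locate(boolean useCache) {
        // First check: no lock is needed on a cache hit.
        if (useCache) {
            String hit = cached;
            if (hit != null) {
                return hit;
            }
        }
        synchronized (metaRegionLock) {
            // Second check: another thread may have filled the cache while we waited.
            if (useCache) {
                String hit = cached;
                if (hit != null) {
                    return hit;
                }
            }
            remoteCalls.incrementAndGet();
            String found = remoteLookup.get();
            if (found != null) {
                cached = found; // make the result available to later callers
            }
            return found;
        }
    }

    int remoteCalls() {
        return remoteCalls.get();
    }
}
```

The second check is what keeps a burst of concurrent callers from each issuing its own remote lookup: only the first thread through the lock pays for it, while the rest find the cache already populated.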
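So how is the meta location obtained from ZooKeeper? It is obtained through the getMetaRegionLocation() method of the member variable registry (the locateMeta() listing above calls it getMetaRegionLocations()). registry is initialized in the HConnectionImplementation constructor as follows: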
this.registry = setupRegistry();
Now let's look at the setupRegistry() method:
private Registry setupRegistry() throws IOException {
  return RegistryFactory.getRegistry(this);
}
It calls getRegistry(), a static method of the RegistryFactory factory class, to obtain the Registry instance. Reading on:
static Registry getRegistry(final Connection connection)
    throws IOException {
  // Get the class name registryClass from the hbase.client.registry.impl parameter,
  // defaulting to ZooKeeperRegistry when it is not configured
  String registryClass = connection.getConfiguration().get("hbase.client.registry.impl",
      ZooKeeperRegistry.class.getName());
  Registry registry = null;
  try {
    // Create an instance of registryClass via reflection
    registry = (Registry)Class.forName(registryClass).newInstance();
  } catch (Throwable t) {
    throw new IOException(t);
  }
  // Call init() to initialize the registry
  registry.init(connection);
  return registry;
}
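It first gets the class name registryClass from the hbase.client.registry.impl parameter, defaulting to ZooKeeperRegistry when the parameter is not configured. It then creates an instance of registryClass, registry, via reflection, calls init() to initialize it, and finally returns it. The initialization performed before returning is also fairly simple: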
@Override
public void init(Connection connection) {
  if (!(connection instanceof ConnectionManager.HConnectionImplementation)) {
    throw new RuntimeException("This registry depends on HConnectionImplementation");
  }
  this.hci = (ConnectionManager.HConnectionImplementation)connection;
}
It first checks whether connection is an instance of ConnectionManager.HConnectionImplementation, then casts it to that type and assigns it to hci, a member variable of ZooKeeperRegistry.
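The "class name from configuration, instance via reflection" approach used by the factory can be sketched in plain Java as below. Everything except the configuration key is hypothetical: RegistrySketch, FixedRegistry and RegistryFactorySketch are illustrative names, java.util.Properties stands in for the HBase Configuration, and getDeclaredConstructor().newInstance() is used because Class.newInstance() has been deprecated since Java 9.

```java
import java.util.Properties;

// Hypothetical plug-in interface standing in for Registry.
interface RegistrySketch {
    String metaLocation();
}

// Hypothetical default implementation used when nothing is configured.
final class FixedRegistry implements RegistrySketch {
    @Override
    public String metaLocation() {
        return "rs1.example.com,16020";
    }
}

final class RegistryFactorySketch {
    static final String IMPL_KEY = "hbase.client.registry.impl";

    /** Instantiates the configured implementation, falling back to FixedRegistry. */
    static RegistrySketch create(Properties conf) {
        String className = conf.getProperty(IMPL_KEY, FixedRegistry.class.getName());
        try {
            // The class needs an accessible no-argument constructor.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return (RegistrySketch) instance;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot create registry " + className, e);
        }
    }
}
```

The benefit of this design is that a different lookup mechanism can be swapped in through configuration alone, with no change to the code that calls the factory.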
Now that we know registry is an instance of ZooKeeperRegistry, let's look at ZooKeeperRegistry's getMetaRegionLocation() method:
@Override
public RegionLocations getMetaRegionLocation() throws IOException {
  // Get the ZooKeeper connection, a ZooKeeperKeepAliveConnection called zkw, from hci
  ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();
  try {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Looking up meta region location in ZK," + " connection=" + this);
    }
    // Get the ServerName, servername, through blockUntilAvailable() of a MetaTableLocator instance
    ServerName servername = new MetaTableLocator().blockUntilAvailable(zkw, hci.rpcTimeout);
    if (LOG.isTraceEnabled()) {
      LOG.trace("Looked up meta region location, connection=" + this +
          "; serverName=" + ((servername == null) ? "null" : servername));
    }
    // If servername is null, return null directly
    if (servername == null) return null;
    // Build the HRegionLocation instance loc from HRegionInfo.FIRST_META_REGIONINFO,
    // the servername obtained above, and a seqNum of 0.
    // FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L and whose
    // TableName is TableName.META_TABLE_NAME.
    HRegionLocation loc = new HRegionLocation(HRegionInfo.FIRST_META_REGIONINFO, servername, 0);
    // Build RegionLocations from loc; it contains only this one HRegionLocation
    return new RegionLocations(new HRegionLocation[] {loc});
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    return null;
  } finally {
    // Close zkw
    zkw.close();
  }
}
getMetaRegionLocation() works as follows:
1. Get the ZooKeeper connection, a ZooKeeperKeepAliveConnection called zkw, from hci;
2. Get the ServerName, servername, through the blockUntilAvailable() method of a MetaTableLocator instance;
3. If servername is null, return null directly;
4. Build the HRegionLocation instance loc from HRegionInfo.FIRST_META_REGIONINFO, the servername obtained above, and a seqNum of 0. FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L and whose TableName is TableName.META_TABLE_NAME;
5. Build RegionLocations from loc, which contains only this one HRegionLocation, and return it.
With a ServerName and an HRegionInfo, the HRegionLocation is easy to build. So how is the ServerName obtained? Tracing into MetaTableLocator's blockUntilAvailable() method, the key code is:
sn = getMetaRegionLocation(zkw);
if (sn != null || sw.elapsedMillis()
    > timeout - HConstants.SOCKET_RETRY_WAIT_MS) {
  break;
}
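The fragment above is the exit condition of a polling loop: keep probing until a value shows up or the remaining time is shorter than one more wait. The following is a rough, hypothetical sketch of such a loop. The names BlockUntilAvailableSketch and pollUntilAvailable(), the Supplier-based probe, and the sleep between probes are all assumptions, since the rest of blockUntilAvailable() is not shown in the excerpt.

```java
import java.util.function.Supplier;

// Hypothetical polling helper; the surrounding loop of blockUntilAvailable() is assumed.
final class BlockUntilAvailableSketch {
    /**
     * Probes until a non-null value is returned or the timeout is about to expire.
     * Returns null on timeout or if the thread is interrupted while waiting.
     */
    static <T> T pollUntilAvailable(Supplier<T> probe, long timeoutMs, long retryWaitMs) {
        long start = System.nanoTime();
        while (true) {
            T value = probe.get();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Stop on success, or when another wait would exceed the timeout.
            if (value != null || elapsedMs > timeoutMs - retryWaitMs) {
                return value;
            }
            try {
                Thread.sleep(retryWaitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
    }
}
```

Comparing the elapsed time against timeout minus the retry wait, rather than against the timeout itself, avoids starting a sleep that would overrun the deadline.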
The getMetaRegionLocation() method is as follows:
@Nullable
public ServerName getMetaRegionLocation(final ZooKeeperWatcher zkw) {
  try {
    RegionState state = getMetaRegionState(zkw);
    return state.isOpened() ? state.getServerName() : null;
  } catch (KeeperException ke) {
    return null;
  }
}
The key code of getMetaRegionState() is as follows:
byte[] data = ZKUtil.getData(zkw, zkw.metaServerZNode);
It uses ZKUtil to read the data stored at metaServerZNode in ZooKeeper. metaServerZNode is initialized as follows:
metaServerZNode = ZKUtil.joinZNode(baseZNode,
    conf.get("zookeeper.znode.metaserver", "meta-region-server"));
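baseZNode comes from the zookeeper.znode.parent parameter and defaults to /hbase when not configured; the node name comes from zookeeper.znode.metaserver and defaults to meta-region-server. In other words, by default the location of the meta server is stored in ZooKeeper at /hbase/meta-region-server.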
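How the two parameters combine into the final path can be illustrated with a small sketch. MetaZNodeSketch is a hypothetical class, java.util.Properties stands in for the HBase Configuration, and joinZNode() here simply joins the two parts with "/", which is assumed to match what ZKUtil.joinZNode() does.

```java
import java.util.Properties;

// Hypothetical sketch of how the meta znode path is derived from configuration.
final class MetaZNodeSketch {
    /** Assumed behavior of ZKUtil.joinZNode(): parent and child joined by '/'. */
    static String joinZNode(String prefix, String suffix) {
        return prefix + "/" + suffix;
    }

    static String metaServerZNode(Properties conf) {
        String baseZNode = conf.getProperty("zookeeper.znode.parent", "/hbase");
        String metaServer = conf.getProperty("zookeeper.znode.metaserver", "meta-region-server");
        return joinZNode(baseZNode, metaServer);
    }
}
```

With an empty configuration this yields /hbase/meta-region-server. A cluster that sets zookeeper.znode.parent to another value, for example /hbase-secure, would store the location under that parent instead.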
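Summary
HBase is a distributed database in which all reads and writes ultimately go through the RowKey. To access data, the client must locate a Region from the TableName and row. Region location comes in two cases: locating a Region of a non-meta (user) table, and locating the Region of the meta table. For a non-meta table, the client actually creates a ClientSmallReversedScanner and queries the meta table, which is named hbase:meta in HBase and uses the family info and the qualifier regioninfo. The problem therefore comes down to locating the meta table's Region. The client finds the server hosting meta by reading the /hbase/meta-region-server znode in ZooKeeper, and then builds the Region location information, RegionLocations, from an HRegionInfo instance whose regionId is 1L and whose TableName is TableName.META_TABLE_NAME.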