HBase Source Code Analysis (7): Region Location (Part 2) 2021SC@SDUSC

珍珠没有奶茶^_^

Tags: hbase, database

Article link: https://blog.csdn.net/qq_45856546/article/details/121209823


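Preface

The previous article walked through the overall Region location process. This article continues from there and looks at several of its steps in more detail.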

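Introduction

1. getCachedLocation(): checking whether the Region is already in the cache
First, let's look at getCachedLocation(), which checks whether the location of the Region is already cached. The code is as follows: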

/**
   * Search the cache for a location that fits our table and row key.
   * Return null if no suitable region is located.
   * @return Null or region location found in cache.
   */
  RegionLocations getCachedLocation(final TableName tableName,
      final byte [] row) {
    return metaCache.getCachedLocation(tableName, row);
  }

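It simply fetches the Region location for tableName and row from metaCache. metaCache is an instance of the MetaCache class, a data structure designed specifically to cache Region location information, that is, meta data. Let's look at its getCachedLocation() method first: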

/**
   * Search the cache for a location that fits our table and row key.
   * Return null if no suitable region is located.
   *
   * @return Null or region location found in cache.
   */
  public RegionLocations getCachedLocation(final TableName tableName, final byte [] row) {
    ConcurrentNavigableMap<byte[], RegionLocations> tableLocations =
      getTableLocations(tableName);

    Entry<byte[], RegionLocations> e = tableLocations.floorEntry(row);
    if (e == null) {
      if (metrics != null) metrics.incrMetaCacheMiss();
      return null;
    }
    RegionLocations possibleRegion = e.getValue();

    // make sure that the end key is greater than the row we're looking
    // for, otherwise the row actually belongs in the next region, not
    // this one. the exception case is when the endkey is
    // HConstants.EMPTY_END_ROW, signifying that the region we're
    // checking is actually the last region in the table.
    byte[] endKey = possibleRegion.getRegionLocation().getRegion().getEndKey();
    // Here we do direct Bytes.compareTo and not doing CellComparator/MetaCellComparator path.
    // MetaCellComparator is for comparing against data in META table which need special handling.
    // Not doing that is ok for this case because
    // 1. We are getting the Region location for the given row in non META tables only. The compare
    // checks the given row is within the end key of the found region. So META regions are not
    // coming in here.
    // 2. Even if META region comes in, its end key will be empty byte[] and so Bytes.equals(endKey,
    // HConstants.EMPTY_END_ROW) check itself will pass.
    if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
        Bytes.compareTo(endKey, 0, endKey.length, row, 0, row.length) > 0) {
      if (metrics != null) metrics.incrMetaCacheHit();
      return possibleRegion;
    }

    // Passed all the way through, so we got nothing - complete cache miss
    if (metrics != null) metrics.incrMetaCacheMiss();
    return null;
  }
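
To make the floorEntry() lookup concrete, here is a minimal, self-contained sketch in plain Java. It is not HBase code: CachedRegion, RegionCacheSketch, and the hand-written comparator are hypothetical stand-ins for RegionLocations, MetaCache, and Bytes.BYTES_COMPARATOR, and only the "find the greatest start key <= row, then check the end key" logic is carried over from the listing above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical stand-in for a cached region: a key range plus a server name. */
class CachedRegion {
  final byte[] startKey;
  final byte[] endKey; // an empty array marks the last region of the table
  final String server;

  CachedRegion(byte[] startKey, byte[] endKey, String server) {
    this.startKey = startKey;
    this.endKey = endKey;
    this.server = server;
  }
}

class RegionCacheSketch {
  /** Unsigned lexicographic byte[] order, written out by hand for this sketch. */
  static final Comparator<byte[]> BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return a.length - b.length;
  };

  private final ConcurrentSkipListMap<byte[], CachedRegion> byStartKey =
      new ConcurrentSkipListMap<>(BYTES);

  void put(CachedRegion region) {
    byStartKey.put(region.startKey, region);
  }

  /** Returns the cached region that contains row, or null on a cache miss. */
  CachedRegion lookup(byte[] row) {
    // Greatest start key <= row: the only cached region that could contain the row.
    Map.Entry<byte[], CachedRegion> e = byStartKey.floorEntry(row);
    if (e == null) return null;
    CachedRegion candidate = e.getValue();
    // The row belongs to the candidate only if it sorts before the end key,
    // or if the candidate is the last region (empty end key).
    if (candidate.endKey.length == 0 || BYTES.compare(candidate.endKey, row) > 0) {
      return candidate;
    }
    return null;
  }

  static byte[] b(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

With the regions ["", "g") on rs1 and ["m", "") on rs3 cached, lookup(b("apple")) returns the rs1 entry, while lookup(b("h")) returns null: floorEntry() still finds the first region, but its end key "g" is not greater than "h", so the row lies in a range that has not been cached yet.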

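ConcurrentSkipListMap deserves special mention here. It is a thread-safe sorted map for concurrent access, implemented internally as a skip list, which gives expected O(log n) time for lookup, insertion, and deletion. The cached locations for a table, tableLocations, are obtained by table name through the getTableLocations() method: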

  private ConcurrentSkipListMap<byte[], RegionLocations>
      getTableLocations(final TableName tableName) {
    // find the map of cached locations for this table
    ConcurrentSkipListMap<byte[], RegionLocations> result;

    // Look up the cached locations for this table, tableLocations, in cachedRegionLocations.
    // tableLocations is a ConcurrentSkipListMap from row to RegionLocations.
    // cachedRegionLocations itself is a ConcurrentHashMap and is the main data structure
    // behind MetaCache; it stores the two-level mapping {tableName -> [row -> RegionLocations]}.
    result = this.cachedRegionLocations.get(tableName);
    // if tableLocations for this table isn't built yet, make one
    if (result == null) {
      result = new ConcurrentSkipListMap<byte[], RegionLocations>(Bytes.BYTES_COMPARATOR);

      // Put the new map into cachedRegionLocations and capture any existing value
      ConcurrentSkipListMap<byte[], RegionLocations> old =
          this.cachedRegionLocations.putIfAbsent(tableName, result);

      // Another thread registered a map first: return that one instead
      if (old != null) {
        return old;
      }
    }

    return result;
  }

One member variable of MetaCache is worth emphasizing here: cachedRegionLocations. It is defined as follows:

  /**
   * Map of table to table {@link HRegionLocation}s.
   */
  private final ConcurrentMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>
  cachedRegionLocations =
  new ConcurrentHashMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>();

It is the main data structure MetaCache relies on to cache Region locations, storing the two-level mapping {tableName -> [row -> RegionLocations]}. MetaCache also has a variable that tracks all the servers involved:

 // The presence of a server in the map implies it's likely that there is an
  // entry in cachedRegionLocations that map to this server; but the absence
  // of a server in this map guarentees that there is no entry in cache that
  // maps to the absent server.
  // The access to this attribute must be protected by a lock on cachedRegionLocations
  private final Set<ServerName> cachedServers = new ConcurrentSkipListSet<ServerName>();

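2. cacheLocation(): caching the location information that was obtained

Next, let's look at cacheLocation(), which caches the obtained location information, locations. The code is as follows: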

  public void cacheLocation(final TableName tableName, final RegionLocations locations) {

    // Get the start row key of the Region from locations
    byte [] startKey = locations.getRegionLocation().getRegionInfo().getStartKey();

    // Get the cached locations for this table via getTableLocations():
    // a map from Region start key to RegionLocations
    ConcurrentMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);

    // Insert the new locations if absent; oldLocation is any previously cached value
    RegionLocations oldLocation = tableLocations.putIfAbsent(startKey, locations);

    // A null oldLocation means this is a new cache entry
    boolean isNewCacheEntry = (oldLocation == null);
    if (isNewCacheEntry) {
      if (LOG.isTraceEnabled()) {
        LOG.trace("Cached location: " + locations);
      }

      // Record the servers that appear in locations in cachedServers
      addToCachedServers(locations);

      return;
    }

    // ... (the excerpt ends here; handling of an already cached entry is not shown)
  }
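
The two putIfAbsent() calls seen so far (one per table, one per start key) follow the same idiom. The sketch below reproduces it in isolation. It is an illustration under simplifying assumptions, not HBase code: TwoLevelCacheSketch is a hypothetical class, and String keys and values replace TableName, byte[], and RegionLocations.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

class TwoLevelCacheSketch {
  // table name -> (region start key -> server name); Strings keep the sketch short.
  private final ConcurrentMap<String, ConcurrentSkipListMap<String, String>> cache =
      new ConcurrentHashMap<>();

  /** Returns the per-table map, creating it on first use. */
  ConcurrentSkipListMap<String, String> tableLocations(String table) {
    ConcurrentSkipListMap<String, String> result = cache.get(table);
    if (result == null) {
      result = new ConcurrentSkipListMap<>();
      // If another thread registered a map first, putIfAbsent returns that map
      // and ours is discarded, so all callers share one instance per table.
      ConcurrentSkipListMap<String, String> old = cache.putIfAbsent(table, result);
      if (old != null) {
        return old;
      }
    }
    return result;
  }

  /** Returns true if this call created the entry, false if one was already cached. */
  boolean cacheLocation(String table, String startKey, String server) {
    return tableLocations(table).putIfAbsent(startKey, server) == null;
  }
}
```

On Java 8 and later, cache.computeIfAbsent(table, t -> new ConcurrentSkipListMap<>()) expresses the first step in one call; the explicit get/putIfAbsent form is kept here because it mirrors the listings above.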

3. Converting a Result into the RegionLocations we need (regionInfoRow -> locations)

Now let's see how a Result is converted into the RegionLocations we need, that is, regionInfoRow -> locations. This is done by the getRegionLocations() method of MetaTableAccessor:

  public static RegionLocations getRegionLocations(final Result r) {
    if (r == null) return null;
    RegionInfo regionInfo = getRegionInfo(r, getRegionInfoColumn());
    if (regionInfo == null) return null;

    List<HRegionLocation> locations = new ArrayList<>(1);
    NavigableMap<byte[],NavigableMap<byte[],byte[]>> familyMap = r.getNoVersionMap();

    locations.add(getRegionLocation(r, regionInfo, 0));

    NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
    if (infoMap == null) return new RegionLocations(locations);

    // iterate until all serverName columns are seen
    int replicaId = 0;
    byte[] serverColumn = getServerColumn(replicaId);
    SortedMap<byte[], byte[]> serverMap;
    serverMap = infoMap.tailMap(serverColumn, false);

    if (serverMap.isEmpty()) return new RegionLocations(locations);

    for (Map.Entry<byte[], byte[]> entry : serverMap.entrySet()) {
      replicaId = parseReplicaIdFromServerColumn(entry.getKey());
      if (replicaId < 0) {
        break;
      }
      HRegionLocation location = getRegionLocation(r, regionInfo, replicaId);
      // In case the region replica is newly created, it's location might be null. We usually do not
      // have HRL's in RegionLocations object with null ServerName. They are handled as null HRLs.
      if (location.getServerName() == null) {
        locations.add(null);
      } else {
        locations.add(location);
      }
    }

    return new RegionLocations(locations);
  }
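
The tailMap() loop above relies on how the server qualifiers sort. The following sketch shows the idea on a plain TreeMap. Everything in it is an assumption made for illustration: ReplicaColumnsSketch is a hypothetical class, String qualifiers replace byte[], and the qualifier layout ("server" for replica 0, "server_" plus four hex digits for other replicas) is my reading of the meta table format rather than something shown in the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplicaColumnsSketch {
  // Assumed layout: "server" for replica 0, "server_XXXX" (4 hex digits) otherwise.
  private static final Pattern SERVER_COLUMN =
      Pattern.compile("^server(_[0-9a-fA-F]{4})?$");

  /** Returns the replica id encoded in a server qualifier, or -1 if it is not one. */
  static int parseReplicaId(String qualifier) {
    Matcher m = SERVER_COLUMN.matcher(qualifier);
    if (!m.matches()) return -1;
    String suffix = m.group(1);
    return suffix == null ? 0 : Integer.parseInt(suffix.substring(1), 16);
  }

  /** Collects server values: the primary first, then each replica column found. */
  static List<String> servers(TreeMap<String, String> infoFamily) {
    List<String> result = new ArrayList<>();
    result.add(infoFamily.get("server")); // primary replica (id 0)
    // Qualifiers strictly greater than "server": the server_XXXX columns come first here.
    for (Map.Entry<String, String> e : infoFamily.tailMap("server", false).entrySet()) {
      if (parseReplicaId(e.getKey()) < 0) break; // left the run of server_XXXX columns
      result.add(e.getValue());
    }
    return result;
  }
}
```

Because '_' sorts before lowercase letters, every "server_XXXX" qualifier sorts before a qualifier such as "serverstartcode", so the loop can stop at the first qualifier that does not parse as a server column.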

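An important point: the Region information (HRegionInfo) is read from the Result. getRegionInfoColumn() returns the byte[] for the string "regioninfo", which is the qualifier used in the meta table, and the family is "info". The getHRegionInfo() and getRegionInfoColumn() methods are shown below (these two excerpts use the HRegionInfo naming of earlier HBase versions, whereas the listing above uses RegionInfo):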

  protected static byte[] getRegionInfoColumn() {
    return HConstants.REGIONINFO_QUALIFIER;
  }

  private static HRegionInfo getHRegionInfo(final Result r, byte [] qualifier) {

    // Get the Cell with family "info" and qualifier "regioninfo"
    Cell cell = r.getColumnLatestCell(getFamily(), qualifier);
    if (cell == null) return null;

    // Deserialize the Cell value into an HRegionInfo via HRegionInfo.parseFromOrNull(),
    // restoring fields such as startKey, endKey, regionId, regionName, split, and offLine
    return HRegionInfo.parseFromOrNull(cell.getValueArray(),
      cell.getValueOffset(), cell.getValueLength());
  }

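There are two steps:
1. Get the Cell with family "info" and qualifier "regioninfo".

2. Call HRegionInfo.parseFromOrNull() to convert the Cell into an HRegionInfo. This is simply deserialization: it reads back the member variables HRegionInfo needs, such as startKey, endKey, regionId, regionName, split, and offLine.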

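4. Sleeping before a retry: the pause depends on pause and tries and generally grows with each attempt (jitter aside)

Finally, let's look at how the current thread sleeps for a while before retrying. The sleep time depends on pause and tries; the later the attempt, the longer the pause generally is (apart from jitter). The code is as follows: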

public static long getPauseTime(final long pause, final int tries) {
    int ntries = tries;
    if (ntries >= HConstants.RETRY_BACKOFF.length) {
      ntries = HConstants.RETRY_BACKOFF.length - 1;
    }
    if (ntries < 0) {
      ntries = 0;
    }

    long normalPause = pause * HConstants.RETRY_BACKOFF[ntries];
    // 1% possible jitter
    long jitter = (long) (normalPause * ThreadLocalRandom.current().nextFloat() * 0.01f);
    return normalPause + jitter;
  }
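
To get a feel for the numbers, the sketch below recomputes the pause without the jitter term. BackoffSketch is a hypothetical class, and the multiplier table is copied from my recollection of HConstants.RETRY_BACKOFF, so treat the exact values as an assumption and check them against the HBase version in use.

```java
class BackoffSketch {
  // Multipliers as recalled from HConstants.RETRY_BACKOFF; an assumption, not verified here.
  static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

  /** Pause without jitter, clamping tries into the index range of the table. */
  static long basePause(long pause, int tries) {
    int index = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
    return pause * RETRY_BACKOFF[index];
  }

  public static void main(String[] args) {
    // Print the lower bound of the pause for the first 15 attempts with pause = 100.
    for (int tries = 0; tries < 15; tries++) {
      System.out.println("tries=" + tries + " pause>=" + basePause(100, tries) + "ms");
    }
  }
}
```

Under that assumed table and pause = 100, attempt 0 waits at least 100 ms, attempt 3 at least 500 ms, and any attempt at or beyond the end of the table at least 20000 ms; the real method then adds up to 1% of random jitter on top.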

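In general, the later the retry, the longer the sleep. pause is read from the hbase.client.pause parameter and defaults to 100 (milliseconds) if not configured.
That covers Region location for a row in a non-meta table, that is, a user table. The lookup still has to go through the meta table, whose name is hbase:meta, with family "info" and qualifier "regioninfo". The meta table is itself an HBase table, so reading from it also requires Region location. When the target is the meta table, locateMeta() is called directly. Let's look at locateMeta():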

  private RegionLocations locateMeta(final TableName tableName,
      boolean useCache, int replicaId) throws IOException {
    // HBASE-10785: We cache the location of the META itself, so that we are not overloading
    // zookeeper with one request for every region lookup. We cache the META with empty row
    // key in MetaCache.
    byte[] metaCacheKey = HConstants.EMPTY_START_ROW; // use byte[0] as the row for meta
    RegionLocations locations = null;
    if (useCache) {
      locations = getCachedLocation(tableName, metaCacheKey);
      if (locations != null && locations.getRegionLocation(replicaId) != null) {
        return locations;
      }
    }

    // only one thread should do the lookup.
    synchronized (metaRegionLock) {
      // Check the cache again for a hit in case some other thread made the
      // same query while we were waiting on the lock.
      if (useCache) {
        locations = getCachedLocation(tableName, metaCacheKey);
        if (locations != null && locations.getRegionLocation(replicaId) != null) {
          return locations;
        }
      }

      // Look up from zookeeper
      locations = get(this.registry.getMetaRegionLocations());
      if (locations != null) {
        cacheLocation(tableName, locations);
      }
    }
    return locations;
  }

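Locating the Region of the meta table is quite different from locating a Region of a non-meta table. The flow is:
1. Obtain the cache key for meta, which is actually byte[0].

2. If the cache is enabled, call getCachedLocation() to get the RegionLocations, locations. If locations is not null and contains a location for the requested replicaId, the cache already holds the data and it is returned directly; otherwise execution continues.

3. Take the mutual-exclusion lock on metaRegionLock with the synchronized keyword, so that only one thread performs the lookup at a time:

3.1. Check the cache again, because while the current thread was waiting for the lock on metaRegionLock, another thread may have made the same query and already loaded the data into the cache.

3.2. Look up the meta location from ZooKeeper.

3.3. Once the Region is located, call cacheLocation() to put it into the cache so that later callers can read it from there.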
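
The steps above are an instance of the check, lock, check again pattern. Here is a minimal sketch of that pattern on its own. MetaLocationCacheSketch is a hypothetical class, a String stands in for RegionLocations, and the Supplier is a placeholder for the ZooKeeper read; none of these names come from HBase.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class MetaLocationCacheSketch {
  private final Object lock = new Object();
  private final AtomicReference<String> cached = new AtomicReference<>();
  private final Supplier<String> remoteLookup; // hypothetical stand-in for the ZooKeeper read

  MetaLocationCacheSketch(Supplier<String> remoteLookup) {
    this.remoteLookup = remoteLookup;
  }

  String locate(boolean useCache) {
    if (useCache) {
      String hit = cached.get();
      if (hit != null) return hit; // fast path, no lock taken
    }
    synchronized (lock) {
      if (useCache) {
        // Another thread may have filled the cache while we waited for the lock.
        String hit = cached.get();
        if (hit != null) return hit;
      }
      // Only one thread at a time reaches the expensive lookup.
      String location = remoteLookup.get();
      if (location != null) cached.set(location);
      return location;
    }
  }
}
```

The second check inside the lock is what keeps a burst of concurrent callers from each issuing its own remote lookup: the first thread fills the cache and the others return from the inner check.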

So how is the meta location obtained from ZooKeeper? Through the getMetaRegionLocation() method of the member variable registry. registry is initialized in the HConnectionImplementation constructor as follows:

   this.registry = setupRegistry();

Now look at setupRegistry():

  private Registry setupRegistry() throws IOException {
    return RegistryFactory.getRegistry(this);
  }

It calls the static getRegistry() method of the RegistryFactory factory class to obtain a Registry instance. Reading on:

  static Registry getRegistry(final Connection connection)
  throws IOException {

    // Read the class name from hbase.client.registry.impl; the default is ZooKeeperRegistry
    String registryClass = connection.getConfiguration().get("hbase.client.registry.impl",
      ZooKeeperRegistry.class.getName());
    Registry registry = null;
    try {

      // Create an instance of registryClass via reflection
      registry = (Registry)Class.forName(registryClass).newInstance();
    } catch (Throwable t) {
      throw new IOException(t);
    }

    // Initialize the registry with init()
    registry.init(connection);

    return registry;
  }
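
The same "class name from configuration, instance via reflection" technique can be tried outside HBase with the sketch below. All names in it (RegistrySketch, DefaultRegistrySketch, RegistryFactorySketch, and the "registry.impl" key) are hypothetical, and a plain Map stands in for the HBase Configuration. It uses getDeclaredConstructor().newInstance() because Class.newInstance() is deprecated in recent Java versions.

```java
import java.io.IOException;
import java.util.Map;

interface RegistrySketch {
  String name();
}

class DefaultRegistrySketch implements RegistrySketch {
  @Override
  public String name() {
    return "default";
  }
}

class RegistryFactorySketch {
  /** Instantiates the class named under "registry.impl", or the default implementation. */
  static RegistrySketch create(Map<String, String> conf) throws IOException {
    String className =
        conf.getOrDefault("registry.impl", DefaultRegistrySketch.class.getName());
    try {
      // Load the class by name and call its no-argument constructor.
      return (RegistrySketch) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      // Wrap failures the way the listing above does, as an IOException.
      throw new IOException("Cannot create registry " + className, e);
    }
  }
}
```

The design choice here is pluggability: a different implementation can be swapped in through configuration alone, at the cost of moving errors such as a misspelled class name from compile time to run time.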

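It first reads the class name registryClass from the hbase.client.registry.impl parameter, defaulting to ZooKeeperRegistry when the parameter is not set. It then creates an instance of registryClass via reflection, calls init() to initialize it, and returns it. The initialization before the return is also fairly simple: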

  @Override
  public void init(Connection connection) {
    if (!(connection instanceof ConnectionManager.HConnectionImplementation)) {
      throw new RuntimeException("This registry depends on HConnectionImplementation");
    }
    this.hci = (ConnectionManager.HConnectionImplementation)connection;
  }

It first checks whether connection is an instance of ConnectionManager.HConnectionImplementation, then casts it to that type and assigns it to the hci member variable of ZooKeeperRegistry.
Now that we know registry is a ZooKeeperRegistry instance, let's look at its getMetaRegionLocation() method:

  @Override
  public RegionLocations getMetaRegionLocation() throws IOException {

    // Get the ZooKeeper connection (a ZooKeeperKeepAliveConnection), zkw, from hci
    ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();

    try {
      if (LOG.isTraceEnabled()) {
        LOG.trace("Looking up meta region location in ZK," + " connection=" + this);
      }

      // Get the ServerName via blockUntilAvailable() on a MetaTableLocator instance
      ServerName servername = new MetaTableLocator().blockUntilAvailable(zkw, hci.rpcTimeout);
      if (LOG.isTraceEnabled()) {
        LOG.trace("Looked up meta region location, connection=" + this +
          "; serverName=" + ((servername == null) ? "null" : servername));
      }

      // No server name: return null
      if (servername == null) return null;

      // Build the HRegionLocation loc from HRegionInfo.FIRST_META_REGIONINFO,
      // the servername obtained above, and a seqNum of 0.
      // FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L
      // and whose TableName is TableName.META_TABLE_NAME.
      HRegionLocation loc = new HRegionLocation(HRegionInfo.FIRST_META_REGIONINFO, servername, 0);

      // Wrap loc in a RegionLocations that contains only this one HRegionLocation
      return new RegionLocations(new HRegionLocation[] {loc});
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {

      // Close zkw
      zkw.close();
    }
  }

getMetaRegionLocation() proceeds as follows:
1. Get the ZooKeeper connection (a ZooKeeperKeepAliveConnection), zkw, from hci.

2. Get the ServerName, servername, via the blockUntilAvailable() method of a MetaTableLocator instance.

3. If servername is null, return null.

4. Build the HRegionLocation instance loc. Its arguments are HRegionInfo.FIRST_META_REGIONINFO, the servername obtained above, and a seqNum of 0. FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L and whose TableName is TableName.META_TABLE_NAME.

5. Build a RegionLocations from loc, containing only this one HRegionLocation, and return it.

With the ServerName and the HRegionInfo in hand, the HRegionLocation is easy to build. So how is the ServerName obtained? Tracing MetaTableLocator's blockUntilAvailable() method, the key code is:

sn = getMetaRegionLocation(zkw);
        if (sn != null || sw.elapsedMillis()
            > timeout - HConstants.SOCKET_RETRY_WAIT_MS) {
          break;
        }

getMetaRegionLocation() is as follows:

  @Nullable
  public ServerName getMetaRegionLocation(final ZooKeeperWatcher zkw) {
    try {
      RegionState state = getMetaRegionState(zkw);
      return state.isOpened() ? state.getServerName() : null;
    } catch (KeeperException ke) {
      return null;
    }
  }

The key code of getMetaRegionState() is:

byte[] data = ZKUtil.getData(zkw, zkw.metaServerZNode);

It uses ZKUtil to read the data stored at the metaServerZNode node in ZooKeeper. metaServerZNode is initialized as follows:

  metaServerZNode = ZKUtil.joinZNode(baseZNode,
        conf.get("zookeeper.znode.metaserver", "meta-region-server"));

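baseZNode is read from the zookeeper.znode.parent parameter and defaults to /hbase if not configured. The node name is then read from zookeeper.znode.metaserver, which defaults to meta-region-server. In other words, by default the meta server location is stored in ZooKeeper at /hbase/meta-region-server.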
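
As a small illustration of how the two parameters combine, the sketch below resolves the znode path from a plain Map. MetaZNodeSketch and its methods are hypothetical; the Map stands in for the HBase Configuration, and joinZNode here is a simplified stand-in that just joins parent and child with "/".

```java
import java.util.Map;

class MetaZNodeSketch {
  /** Simplified stand-in for a znode join: parent + "/" + child. */
  static String joinZNode(String parent, String child) {
    return parent + "/" + child;
  }

  /** Resolves the meta server znode, falling back to the defaults described above. */
  static String metaServerZNode(Map<String, String> conf) {
    String base = conf.getOrDefault("zookeeper.znode.parent", "/hbase");
    String node = conf.getOrDefault("zookeeper.znode.metaserver", "meta-region-server");
    return joinZNode(base, node);
  }
}
```

With an empty configuration this yields /hbase/meta-region-server. A cluster that sets zookeeper.znode.parent to another value, for example /hbase-unsecure, would have the node at /hbase-unsecure/meta-region-server, which is worth remembering when a client cannot find meta.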

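Summary

HBase is a distributed database in which every read and write is ultimately addressed by RowKey. To access data, a Region must be located from a TableName and a row. Region location falls into two cases: locating a Region of a non-meta (user) table, and locating the Region of the meta table. For a non-meta table, a ClientSmallReversedScanner is created to query the meta table, which is named hbase:meta in HBase, with family info and qualifier regioninfo. The problem therefore comes down to locating the meta table's Region: the client reads the /hbase/meta-region-server node in ZooKeeper to find the server hosting meta, then builds an HRegionInfo instance with regionId 1L and TableName TableName.META_TABLE_NAME to produce the Region location information, RegionLocations.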
