HBase Source Code Analysis (7): Region Location (Part 2) 2021SC@SDUSC



Preface

The previous article described the Region location process. This article continues with a closer look at several of its details.


Introduction

1. getCachedLocation(): checking whether the Region is already in the cache
First, let's look at getCachedLocation(), which checks whether the cache already holds the Region. The code is as follows:

/**
   * Search the cache for a location that fits our table and row key.
   * Return null if no suitable region is located.
   * @return Null or region location found in cache.
   */
  RegionLocations getCachedLocation(final TableName tableName,
      final byte [] row) {
    return metaCache.getCachedLocation(tableName, row);
  }
      

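It actually fetches the Region location information for tableName and row from metaCache. metaCache is an instance of the MetaCache class, a data structure designed specifically for caching Region location information, that is, meta data. Let's first look at its getCachedLocation() method: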

/**
   * Search the cache for a location that fits our table and row key.
   * Return null if no suitable region is located.
   *
   * @return Null or region location found in cache.
   */
  public RegionLocations getCachedLocation(final TableName tableName, final byte [] row) {
    ConcurrentNavigableMap<byte[], RegionLocations> tableLocations =
      getTableLocations(tableName);

    Entry<byte[], RegionLocations> e = tableLocations.floorEntry(row);
    if (e == null) {
      if (metrics != null) metrics.incrMetaCacheMiss();
      return null;
    }
    RegionLocations possibleRegion = e.getValue();

    // make sure that the end key is greater than the row we're looking
    // for, otherwise the row actually belongs in the next region, not
    // this one. the exception case is when the endkey is
    // HConstants.EMPTY_END_ROW, signifying that the region we're
    // checking is actually the last region in the table.
    byte[] endKey = possibleRegion.getRegionLocation().getRegion().getEndKey();
    // Here we do direct Bytes.compareTo and not doing CellComparator/MetaCellComparator path.
    // MetaCellComparator is for comparing against data in META table which need special handling.
    // Not doing that is ok for this case because
    // 1. We are getting the Region location for the given row in non META tables only. The compare
    // checks the given row is within the end key of the found region. So META regions are not
    // coming in here.
    // 2. Even if META region comes in, its end key will be empty byte[] and so Bytes.equals(endKey,
    // HConstants.EMPTY_END_ROW) check itself will pass.
    if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
        Bytes.compareTo(endKey, 0, endKey.length, row, 0, row.length) > 0) {
      if (metrics != null) metrics.incrMetaCacheHit();
      return possibleRegion;
    }

    // Passed all the way through, so we got nothing - complete cache miss
    if (metrics != null) metrics.incrMetaCacheMiss();
    return null;
  }
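
The lookup above (take the greatest start key that is less than or equal to the row, then check the row against that Region's end key) can be sketched in isolation. This is a minimal illustration rather than HBase code: the class RegionCacheLookupSketch, its nested Region type, and the sample keys are all hypothetical, and Arrays.compareUnsigned (Java 9+) stands in for Bytes.compareTo.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

final class RegionCacheLookupSketch {
  /** Hypothetical stand-in for a cached region: just a name and an end key. */
  static final class Region {
    final String name;
    final byte[] endKey; // an empty array means "last region in the table"

    Region(String name, byte[] endKey) {
      this.name = name;
      this.endKey = endKey;
    }
  }

  // start key -> region, ordered by unsigned lexicographic byte comparison
  private final ConcurrentSkipListMap<byte[], Region> byStartKey =
      new ConcurrentSkipListMap<>((a, b) -> Arrays.compareUnsigned(a, b));

  void put(String startKey, String endKey, String name) {
    byStartKey.put(bytes(startKey), new Region(name, bytes(endKey)));
  }

  /** Returns the name of the cached region that contains the row, or null on a miss. */
  String lookup(String row) {
    byte[] rowBytes = bytes(row);
    // Greatest start key <= row; null if every cached start key is greater than the row.
    Map.Entry<byte[], Region> e = byStartKey.floorEntry(rowBytes);
    if (e == null) {
      return null;
    }
    byte[] endKey = e.getValue().endKey;
    // The row belongs to this region only if the end key is empty or strictly greater.
    if (endKey.length == 0 || Arrays.compareUnsigned(endKey, rowBytes) > 0) {
      return e.getValue().name;
    }
    return null;
  }

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Builds a cache with a hole: the region covering ["g", "n") is not cached. */
  static RegionCacheLookupSketch demo() {
    RegionCacheLookupSketch cache = new RegionCacheLookupSketch();
    cache.put("", "g", "region-1"); // covers ["", "g")
    cache.put("n", "", "region-3"); // covers ["n", end of table)
    return cache;
  }
}
```

With the demo cache, a row such as "hello" finds region-1 through floorEntry() but fails the end-key check, so it is reported as a miss instead of being routed to the wrong Region.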

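ConcurrentSkipListMap deserves special mention here. It provides a thread-safe sorted map for concurrent access. Internally it is implemented as a skip list, which gives expected O(log n) time for lookup, insertion, and deletion. The getTableLocations() method, which fetches the cached location information tableLocations for a table by tableName, is as follows: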

private ConcurrentSkipListMap<byte[], RegionLocations>
    getTableLocations(final TableName tableName) {
    // find the map of cached locations for this table
    ConcurrentSkipListMap<byte[], RegionLocations> result;

    // Look up the cached location information for this table in cachedRegionLocations by tableName.
    // The result, tableLocations, is a ConcurrentSkipListMap that maps a row to RegionLocations.
    // cachedRegionLocations is itself a ConcurrentHashMap and is the main data structure MetaCache
    // relies on to cache Region locations; it stores the two-level mapping
    // {tableName->[row->RegionLocations]}
    result = this.cachedRegionLocations.get(tableName);
    // if tableLocations for this table isn't built yet, make one
    if (result == null) {
      result = new ConcurrentSkipListMap<byte[], RegionLocations>(Bytes.BYTES_COMPARATOR);

      // Put the new map into cachedRegionLocations and get back the previous value, old
      ConcurrentSkipListMap<byte[], RegionLocations> old =
          this.cachedRegionLocations.putIfAbsent(tableName, result);

      // If old is not null, another thread created the map first, so return old
      if (old != null) {
        return old;
      }
    }

    // Return result
    return result;
  }
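
The get-then-putIfAbsent idiom in getTableLocations() guarantees that all threads end up sharing one inner map per table, even if two of them race to create it. A minimal sketch with String keys follows; the class TwoLevelCacheSketch and its method names are hypothetical, and the computeIfAbsent variant is only an equivalent modern spelling, not what the HBase source uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class TwoLevelCacheSketch {
  // table name -> (start key -> location); both levels are safe for concurrent use
  private final ConcurrentMap<String, ConcurrentSkipListMap<String, String>> byTable =
      new ConcurrentHashMap<>();

  /** Same shape as getTableLocations(): create lazily, but keep whichever map won the race. */
  ConcurrentSkipListMap<String, String> tableMap(String table) {
    ConcurrentSkipListMap<String, String> result = byTable.get(table);
    if (result == null) {
      result = new ConcurrentSkipListMap<>();
      // putIfAbsent returns the existing value if another thread inserted first, else null.
      ConcurrentSkipListMap<String, String> old = byTable.putIfAbsent(table, result);
      if (old != null) {
        return old;
      }
    }
    return result;
  }

  /** Equivalent behavior using computeIfAbsent. */
  ConcurrentSkipListMap<String, String> tableMapCompact(String table) {
    return byTable.computeIfAbsent(table, t -> new ConcurrentSkipListMap<>());
  }
}
```

Either method returns the same inner map instance for repeated calls with the same table name, which is what lets callers insert into it without further coordination.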

The MetaCache member variable worth emphasizing here is cachedRegionLocations, defined as follows:

  /**
   * Map of table to table {@link HRegionLocation}s.
   */
  private final ConcurrentMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>
  cachedRegionLocations =
  new ConcurrentHashMap<TableName, ConcurrentSkipListMap<byte[], RegionLocations>>();

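It is the main data structure MetaCache relies on to cache Region location information, and it stores the two-level mapping {tableName->[row->RegionLocations]}, where the row used as the inner key is the Region's start key. MetaCache also has a variable that tracks all servers appearing in the cache, as follows: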

 // The presence of a server in the map implies it's likely that there is an
  // entry in cachedRegionLocations that map to this server; but the absence
  // of a server in this map guarentees that there is no entry in cache that
  // maps to the absent server.
  // The access to this attribute must be protected by a lock on cachedRegionLocations
  private final Set<ServerName> cachedServers = new ConcurrentSkipListSet<ServerName>();

2. cacheLocation(): caching the obtained location information locations

Next, let's look at cacheLocation(), which caches the obtained location information locations. The code is as follows:

public void cacheLocation(final TableName tableName, final RegionLocations locations) {

    // Get the Region's start rowkey, startKey, from the Region location information locations
    byte [] startKey = locations.getRegionLocation().getRegionInfo().getStartKey();

    // Call getTableLocations() to get the table's location information tableLocations by tableName.
    // It maps a Region's start rowkey (startKey) to RegionLocations
    ConcurrentMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);

    // Put locations into tableLocations if startKey has no entry yet, and get back the
    // previously cached value oldLocation (null if there was none)
    RegionLocations oldLocation = tableLocations.putIfAbsent(startKey, locations);

    // A null oldLocation means this is a newly cached entry
    boolean isNewCacheEntry = (oldLocation == null);
    if (isNewCacheEntry) {
      if (LOG.isTraceEnabled()) {
        LOG.trace("Cached location: " + locations);
      }

      // Call addToCachedServers() to record the servers seen here in the cachedServers set
      addToCachedServers(locations);

      return;
    }
    // ... (the rest of the method, which handles an already cached entry, is not shown in this excerpt)
  }
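
The relationship between the location map and cachedServers can be illustrated with a small sketch. It is a simplification under stated assumptions: ServerIndexSketch, its String keys, and the clearServer() method are hypothetical, and an existing entry is simply left in place here, whereas the real method goes on to handle that case. The point is the invariant from the comment on cachedServers: a server missing from the set is guaranteed to have no cached entries, so a full scan can be skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

final class ServerIndexSketch {
  // start key -> server name (a simplified stand-in for RegionLocations)
  private final ConcurrentMap<String, String> locations = new ConcurrentHashMap<>();
  // servers that may appear in the cache; absence guarantees no entry maps to the server
  private final Set<String> cachedServers = new ConcurrentSkipListSet<>();
  int scans = 0; // number of full scans performed, for demonstration only

  /** Returns true if this call created a new cache entry. */
  boolean cacheLocation(String startKey, String server) {
    String old = locations.putIfAbsent(startKey, server);
    if (old == null) {
      cachedServers.add(server);
      return true;
    }
    return false; // simplification: the existing entry is kept unchanged
  }

  /** Removes all entries for a server and returns how many were dropped. */
  int clearServer(String server) {
    if (!cachedServers.contains(server)) {
      return 0; // nothing can map to this server, so skip the scan
    }
    scans++;
    int before = locations.size();
    locations.values().removeIf(server::equals);
    cachedServers.remove(server);
    return before - locations.size();
  }
}
```

The sketch is single-threaded in spirit; the original comment notes that access to cachedServers must be protected by a lock on cachedRegionLocations, which is omitted here.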

3. Converting the Result into the RegionLocations we need, i.e. regionInfoRow->locations

Now let's see how the Result is converted into the required RegionLocations, i.e. regionInfoRow->locations. This is done by the getRegionLocations() method of MetaTableAccessor:

  public static RegionLocations getRegionLocations(final Result r) {
    if (r == null) return null;
    RegionInfo regionInfo = getRegionInfo(r, getRegionInfoColumn());
    if (regionInfo == null) return null;

    List<HRegionLocation> locations = new ArrayList<>(1);
    NavigableMap<byte[],NavigableMap<byte[],byte[]>> familyMap = r.getNoVersionMap();

    locations.add(getRegionLocation(r, regionInfo, 0));

    NavigableMap<byte[], byte[]> infoMap = familyMap.get(getCatalogFamily());
    if (infoMap == null) return new RegionLocations(locations);

    // iterate until all serverName columns are seen
    int replicaId = 0;
    byte[] serverColumn = getServerColumn(replicaId);
    SortedMap<byte[], byte[]> serverMap;
    serverMap = infoMap.tailMap(serverColumn, false);

    if (serverMap.isEmpty()) return new RegionLocations(locations);

    for (Map.Entry<byte[], byte[]> entry : serverMap.entrySet()) {
      replicaId = parseReplicaIdFromServerColumn(entry.getKey());
      if (replicaId < 0) {
        break;
      }
      HRegionLocation location = getRegionLocation(r, regionInfo, replicaId);
      // In case the region replica is newly created, it's location might be null. We usually do not
      // have HRL's in RegionLocations object with null ServerName. They are handled as null HRLs.
      if (location.getServerName() == null) {
        locations.add(null);
      } else {
        locations.add(location);
      }
    }

    return new RegionLocations(locations);
  }
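
The loop over infoMap.tailMap(serverColumn, false) relies on the sorted order of the column qualifiers: every replica's server column sorts directly after the primary server column, and the loop stops at the first qualifier that does not parse as a replica column. A minimal sketch of that scan follows. The naming scheme used here (server, then server_ plus a four-digit hexadecimal replica id) is an assumption made for illustration, as are the class ReplicaColumnScanSketch and the sample values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class ReplicaColumnScanSketch {
  static final String SERVER = "server"; // qualifier of the primary replica's server column

  /** Returns the replica id encoded in a qualifier, or -1 if it is not a replica server column. */
  static int parseReplicaId(String qualifier) {
    String prefix = SERVER + "_"; // assumed delimiter between "server" and the replica id
    if (!qualifier.startsWith(prefix)) {
      return -1;
    }
    try {
      return Integer.parseInt(qualifier.substring(prefix.length()), 16);
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  /** Collects replica ids: 0 for the primary, then every replica column after "server". */
  static List<Integer> replicaIds(TreeMap<String, String> infoFamily) {
    List<Integer> ids = new ArrayList<>();
    ids.add(0);
    // Strictly after "server", in sorted order.
    SortedMap<String, String> tail = infoFamily.tailMap(SERVER, false);
    for (String qualifier : tail.keySet()) {
      int id = parseReplicaId(qualifier);
      if (id < 0) {
        break; // first non-replica qualifier ends the run of server columns
      }
      ids.add(id);
    }
    return ids;
  }

  static List<Integer> demo() {
    TreeMap<String, String> info = new TreeMap<>();
    info.put("regioninfo", "...");
    info.put("server", "host-a:16020");
    info.put("server_0001", "host-b:16020");
    info.put("server_0002", "host-c:16020");
    info.put("serverstartcode", "1");
    return replicaIds(info);
  }
}
```

In the demo, serverstartcode sorts after both replica columns because '_' orders before 's', so the scan sees the two replicas first and then breaks.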

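One important point is how the Region information is obtained from the Result. getRegionInfoColumn() returns the byte[] for the string "regioninfo", which is the corresponding qualifier in the meta table, and the family is "info". The listing below comes from an older version of the code, in which the lookup method is named getHRegionInfo() and returns HRegionInfo. getRegionInfoColumn() and getHRegionInfo() are as follows: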

protected static byte[] getRegionInfoColumn() {
    return HConstants.REGIONINFO_QUALIFIER;
}

private static HRegionInfo getHRegionInfo(final Result r, byte [] qualifier) {

    // Get the Cell whose family is "info" and whose qualifier is "regioninfo"
    Cell cell = r.getColumnLatestCell(getFamily(), qualifier);
    if (cell == null) return null;

    // Call HRegionInfo.parseFromOrNull() to convert the Cell into an HRegionInfo.
    // This is simply deserialization: it reads the member variables HRegionInfo needs,
    // such as startKey, endKey, regionId, regionName, split, and offLine
    return HRegionInfo.parseFromOrNull(cell.getValueArray(),
      cell.getValueOffset(), cell.getValueLength());
}

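This takes two steps:
1. Get the Cell whose family is "info" and whose qualifier is "regioninfo";

2. Call HRegionInfo.parseFromOrNull() to convert the Cell into an HRegionInfo. This is simply deserialization: it reads the member variables HRegionInfo needs, such as startKey, endKey, regionId, regionName, split, and offLine.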

4. Sleeping the current thread before retrying: the sleep time depends on pause and tries and generally grows with later retries (apart from jitter)

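Finally, let's look at how the current thread sleeps for a while before retrying. The sleep time depends on pause and tries and generally grows with later retries (apart from jitter). The code is as follows: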

public static long getPauseTime(final long pause, final int tries) {
    int ntries = tries;
    if (ntries >= HConstants.RETRY_BACKOFF.length) {
      ntries = HConstants.RETRY_BACKOFF.length - 1;
    }
    if (ntries < 0) {
      ntries = 0;
    }

    long normalPause = pause * HConstants.RETRY_BACKOFF[ntries];
    // 1% possible jitter
    long jitter = (long) (normalPause * ThreadLocalRandom.current().nextFloat() * 0.01f);
    return normalPause + jitter;
  }

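In general, the later the retry, the longer the sleep. pause is read from the parameter hbase.client.pause, which defaults to 100 (milliseconds) if not configured.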
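
To make the growth concrete, the sketch below reproduces the calculation with the jitter separated out. The multiplier table is an assumed copy of HConstants.RETRY_BACKOFF and may differ between HBase versions; the class BackoffSketch and its method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

final class BackoffSketch {
  // Assumption: believed to match HConstants.RETRY_BACKOFF; check your HBase version.
  static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

  /** Pause without jitter: pause times the multiplier for this attempt, clamped to the table. */
  static long basePause(long pause, int tries) {
    int index = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
    return pause * RETRY_BACKOFF[index];
  }

  /** Pause with up to 1% random jitter added, as in getPauseTime(). */
  static long pauseWithJitter(long pause, int tries) {
    long normalPause = basePause(pause, tries);
    long jitter = (long) (normalPause * ThreadLocalRandom.current().nextFloat() * 0.01f);
    return normalPause + jitter;
  }
}
```

Under that assumed table and the default pause of 100, attempt 0 waits about 100 ms, attempt 4 about 1000 ms, and any attempt past the end of the table is clamped to the last multiplier, giving about 20000 ms.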
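That covers Region location for rows in non-Meta tables, that is, user tables, which in fact still has to be looked up in the Meta table. The Meta table is named hbase:meta, its family is "info", and its qualifier is "regioninfo". It is itself an HBase table, so reading from it also requires Region location. For the meta table, locateMeta() is called directly. Let's look at locateMeta():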

  private RegionLocations locateMeta(final TableName tableName,
      boolean useCache, int replicaId) throws IOException {
    // HBASE-10785: We cache the location of the META itself, so that we are not overloading
    // zookeeper with one request for every region lookup. We cache the META with empty row
    // key in MetaCache.
    byte[] metaCacheKey = HConstants.EMPTY_START_ROW; // use byte[0] as the row for meta
    RegionLocations locations = null;
    if (useCache) {
      locations = getCachedLocation(tableName, metaCacheKey);
      if (locations != null && locations.getRegionLocation(replicaId) != null) {
        return locations;
      }
    }

    // only one thread should do the lookup.
    synchronized (metaRegionLock) {
      // Check the cache again for a hit in case some other thread made the
      // same query while we were waiting on the lock.
      if (useCache) {
        locations = getCachedLocation(tableName, metaCacheKey);
        if (locations != null && locations.getRegionLocation(replicaId) != null) {
          return locations;
        }
      }

      // Look up from zookeeper
      locations = get(this.registry.getMetaRegionLocations());
      if (locations != null) {
        cacheLocation(tableName, locations);
      }
    }
    return locations;
  }

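Locating a Region of the Meta table is quite different from the non-Meta case. The flow is as follows:
1. Get the cache key for meta, which is simply byte[0];

2. If the cache is enabled, call getCachedLocation() to look up the Region and get its RegionLocations, locations. If locations is non-null and contains a location for the requested replicaId, the data is in the cache and is returned directly. Otherwise execution continues in order to locate the Region;

3. Use the synchronized keyword to take a mutex on metaRegionLock, ensuring that only one thread performs the lookup at a time:

3.1. Check the cache again, because while the current thread was waiting for the lock on metaRegionLock, another thread may have made the same query and already loaded the data into the cache;

3.2. Look up the Meta data in ZooKeeper;

3.3. Once the Region is located, call cacheLocation() to put it into the cache so that later callers can read it directly from the cache.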
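
The steps above follow a check, lock, check again, then load pattern. A minimal generic sketch is shown below; the class LockedLookupSketch and the Supplier standing in for the ZooKeeper lookup are hypothetical, and a single volatile field replaces MetaCache.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LockedLookupSketch {
  private final Object lock = new Object();
  private volatile String cached;              // stand-in for the cached meta location
  private final Supplier<String> remoteLookup; // stand-in for the ZooKeeper lookup

  LockedLookupSketch(Supplier<String> remoteLookup) {
    this.remoteLookup = remoteLookup;
  }

  String locate(boolean useCache) {
    if (useCache) {
      String hit = cached;
      if (hit != null) {
        return hit; // fast path: no locking on a cache hit
      }
    }
    synchronized (lock) { // only one thread performs the remote lookup at a time
      if (useCache) {
        String hit = cached; // another thread may have filled the cache while we waited
        if (hit != null) {
          return hit;
        }
      }
      String found = remoteLookup.get();
      if (found != null) {
        cached = found;
      }
      return found;
    }
  }

  /** Runs three lookups and returns how many of them reached the remote lookup. */
  static int demoLookupCount() {
    AtomicInteger calls = new AtomicInteger();
    LockedLookupSketch sketch = new LockedLookupSketch(() -> {
      calls.incrementAndGet();
      return "host-a:16020";
    });
    sketch.locate(true);  // miss, goes remote
    sketch.locate(true);  // served from the cache
    sketch.locate(false); // cache bypassed, goes remote again
    return calls.get();
  }
}
```

The second check inside the synchronized block is what keeps a burst of concurrent callers from each issuing their own remote request.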

How is the Meta data fetched from ZooKeeper? Through the getMetaRegionLocation() method of the member variable registry. registry is initialized in the HConnectionImplementation constructor as follows:

   this.registry = setupRegistry();

Now look at the setupRegistry() method:

private Registry setupRegistry() throws IOException {
  return RegistryFactory.getRegistry(this);
}

It calls the static method getRegistry() of the RegistryFactory factory class to obtain the Registry instance. Continuing:

static Registry getRegistry(final Connection connection)
  throws IOException {

    // Get the class name registryClass from the parameter hbase.client.registry.impl,
    // defaulting to ZooKeeperRegistry if it is not configured
    String registryClass = connection.getConfiguration().get("hbase.client.registry.impl",
      ZooKeeperRegistry.class.getName());
    Registry registry = null;
    try {

      // Create the registry instance from registryClass via reflection
      registry = (Registry)Class.forName(registryClass).newInstance();
    } catch (Throwable t) {
      throw new IOException(t);
    }

    // Call init() to initialize the registry
    registry.init(connection);

    // Return the registry
    return registry;
  }

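It first gets the class name registryClass from the parameter hbase.client.registry.impl, defaulting to ZooKeeperRegistry if the parameter is not configured. It then creates the registry instance from registryClass via reflection, calls init() to initialize it, and finally returns it. The initialization before returning is also simple: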

  @Override
  public void init(Connection connection) {
    if (!(connection instanceof ConnectionManager.HConnectionImplementation)) {
      throw new RuntimeException("This registry depends on HConnectionImplementation");
    }
    this.hci = (ConnectionManager.HConnectionImplementation)connection;
  }

It first checks that connection is an instance of ConnectionManager.HConnectionImplementation, then casts it to ConnectionManager.HConnectionImplementation and assigns it to the hci member variable of ZooKeeperRegistry.
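
The configure, instantiate by reflection, then init sequence is a common plug-in pattern. The sketch below shows it with java.util.Properties in place of the HBase Configuration. Everything in it is hypothetical (the Locator interface, the two implementations, and the key sketch.locator.impl), and it differs from the listing above in two ways: it uses getDeclaredConstructor().newInstance() because Class.newInstance() is deprecated since Java 9, and it wraps failures in an unchecked exception rather than an IOException.

```java
import java.util.Properties;

final class LocatorFactorySketch {
  /** Hypothetical plug-in interface, standing in for Registry. */
  interface Locator {
    void init(String connectionId);
    String describe();
  }

  /** Default implementation, standing in for ZooKeeperRegistry. */
  static class ZkLocator implements Locator {
    private String connectionId;
    public void init(String connectionId) { this.connectionId = connectionId; }
    public String describe() { return "zk:" + connectionId; }
  }

  /** Alternative implementation that can be selected through configuration. */
  static class StaticLocator implements Locator {
    private String connectionId;
    public void init(String connectionId) { this.connectionId = connectionId; }
    public String describe() { return "static:" + connectionId; }
  }

  static Locator create(Properties conf, String connectionId) {
    // Read the implementation class name, falling back to the default implementation.
    String className = conf.getProperty("sketch.locator.impl", ZkLocator.class.getName());
    Locator locator;
    try {
      // Instantiate through the no-argument constructor via reflection.
      locator = (Locator) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Cannot create locator " + className, e);
    }
    // Two-phase construction: the instance is wired to its connection after creation.
    locator.init(connectionId);
    return locator;
  }
}
```

Because reflection can only call a no-argument constructor generically, the connection is handed over in a separate init() call, which is the same reason the Registry listing has an init(connection) step.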
Knowing that registry is an instance of ZooKeeperRegistry, let's look at ZooKeeperRegistry's getMetaRegionLocation() method:

  @Override
  public RegionLocations getMetaRegionLocation() throws IOException {

    // Get the ZooKeeper connection, a ZooKeeperKeepAliveConnection named zkw, from hci
    ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();

    try {
      if (LOG.isTraceEnabled()) {
        LOG.trace("Looking up meta region location in ZK," + " connection=" + this);
      }

      // Get the ServerName, servername, via blockUntilAvailable() on a MetaTableLocator instance
      ServerName servername = new MetaTableLocator().blockUntilAvailable(zkw, hci.rpcTimeout);
      if (LOG.isTraceEnabled()) {
        LOG.trace("Looked up meta region location, connection=" + this +
          "; serverName=" + ((servername == null) ? "null" : servername));
      }

      // If servername is null, return null directly
      if (servername == null) return null;

      // Construct the HRegionLocation instance loc from HRegionInfo.FIRST_META_REGIONINFO,
      // the servername obtained above, and a seqNum of 0.
      // HRegionInfo.FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L
      // and whose TableName is TableName.META_TABLE_NAME
      HRegionLocation loc = new HRegionLocation(HRegionInfo.FIRST_META_REGIONINFO, servername, 0);

      // Build RegionLocations from loc; it contains only this one HRegionLocation
      return new RegionLocations(new HRegionLocation[] {loc});
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {

      // Close zkw
      zkw.close();
    }
  }

getMetaRegionLocation() works as follows:
1. Get the ZooKeeper connection, a ZooKeeperKeepAliveConnection named zkw, from hci;

2. Get the ServerName, servername, via blockUntilAvailable() on a MetaTableLocator instance;

3. If servername is null, return null directly;

4. Construct the HRegionLocation instance loc from HRegionInfo.FIRST_META_REGIONINFO, the servername obtained above, and a seqNum of 0. HRegionInfo.FIRST_META_REGIONINFO is an HRegionInfo instance whose regionId is 1L and whose TableName is TableName.META_TABLE_NAME;

5. Build RegionLocations from loc, containing only this one HRegionLocation, and return it.

With the ServerName and HRegionInfo in hand, the HRegionLocation is easy to obtain. So how is the ServerName obtained? Tracing MetaTableLocator's blockUntilAvailable() method, the key code is:

sn = getMetaRegionLocation(zkw);
        if (sn != null || sw.elapsedMillis()
            > timeout - HConstants.SOCKET_RETRY_WAIT_MS) {
          break;
        }
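
The fragment above is the exit condition of a polling loop: keep probing until a value appears or too little of the timeout remains for another wait. A minimal sketch of such a loop follows, assuming a caller-supplied probe and retry interval. The class PollUntilAvailableSketch is hypothetical, and unlike the original, which lets InterruptedException propagate, this version restores the interrupt flag and returns null.

```java
import java.util.function.Supplier;

final class PollUntilAvailableSketch {
  /**
   * Calls probe repeatedly until it returns non-null or the timeout is nearly used up.
   * Returns the last probe result, which is null on timeout or interruption.
   */
  static <T> T blockUntilAvailable(Supplier<T> probe, long timeoutMs, long retryWaitMs) {
    long start = System.nanoTime();
    while (true) {
      T value = probe.get();
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      // Stop on success, or when another full wait would not fit into the timeout.
      if (value != null || elapsedMs > timeoutMs - retryWaitMs) {
        return value;
      }
      try {
        Thread.sleep(retryWaitMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      }
    }
  }

  /** Probe that succeeds on the third attempt; returns "host-a after 3 probes". */
  static String demo() {
    int[] calls = {0};
    String server = blockUntilAvailable(() -> ++calls[0] < 3 ? null : "host-a", 2000, 10);
    return server + " after " + calls[0] + " probes";
  }
}
```

Subtracting the retry interval from the timeout in the exit test mirrors the timeout - HConstants.SOCKET_RETRY_WAIT_MS expression in the fragment: the loop gives up early instead of starting a wait that would overrun the deadline.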

The getMetaRegionLocation() method is as follows:

@Nullable
  public ServerName getMetaRegionLocation(final ZooKeeperWatcher zkw) {
    try {
      RegionState state = getMetaRegionState(zkw);
      return state.isOpened() ? state.getServerName() : null;
    } catch (KeeperException ke) {
      return null;
    }
  }

The key code of the getMetaRegionState() method is:

byte[] data = ZKUtil.getData(zkw, zkw.metaServerZNode);

It uses ZKUtil to read the data stored at metaServerZNode on ZooKeeper. metaServerZNode is initialized as follows:

  metaServerZNode = ZKUtil.joinZNode(baseZNode,
        conf.get("zookeeper.znode.metaserver", "meta-region-server"));

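baseZNode is read from the parameter zookeeper.znode.parent, defaulting to /hbase if not configured. The parameter zookeeper.znode.metaserver is then read, defaulting to meta-region-server if not configured. In other words, by default the metaserver location on ZooKeeper is /hbase/meta-region-server.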
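
The path resolution can be sketched with java.util.Properties in place of the HBase Configuration. The class MetaZNodeSketch is hypothetical, and its joinZNode() assumes the two parts are simply joined with a single "/" separator.

```java
import java.util.Properties;

final class MetaZNodeSketch {
  /** Joins a parent znode path and a child name with a single '/' (assumed behavior). */
  static String joinZNode(String prefix, String suffix) {
    return prefix + "/" + suffix;
  }

  /** Resolves the meta server znode path from configuration, applying the defaults. */
  static String metaServerZNode(Properties conf) {
    String baseZNode = conf.getProperty("zookeeper.znode.parent", "/hbase");
    String child = conf.getProperty("zookeeper.znode.metaserver", "meta-region-server");
    return joinZNode(baseZNode, child);
  }
}
```

With an empty Properties object this yields /hbase/meta-region-server, and overriding zookeeper.znode.parent moves the whole path, which is why clients pointing at a cluster with a non-default parent znode must set that parameter too.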

Summary

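HBase is a distributed database in which all reads and writes ultimately go through the RowKey. To get data, the Region must be located from the given TableName and Row. Region location falls into two cases: locating a non-Meta (user) table and locating the Meta table. The non-Meta case actually creates a ClientSmallReversedScanner to query the Meta table, which is named hbase:meta in HBase, with family info and qualifier regioninfo. The problem therefore comes down to locating the Meta table's Region: the client reads the information stored under the ZooKeeper path /hbase/meta-region-server to find the Meta server, then constructs an HRegionInfo instance with regionId 1L and TableName TableName.META_TABLE_NAME to determine the Region location information RegionLocations.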
