Plain-Talk HBase: How a RegionServer Computes a Region's Data Locality Percentage

Overview

Locality percentage = (logical size of this region's blocks held on the current machine) / (total logical size of all blocks of the files under this region)
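
For example (illustrative numbers only): if a region's files add up to 30 GB of HDFS blocks and 24 GB of those blocks have a replica on the RegionServer's own DataNode, the locality percentage is 24 / 30 = 0.8.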

 

Where to start

We know that the heartbeat an HRegionServer reports to the HMaster carries a data-locality metric inside RegionLoad, so we need to look at how that RegionLoad is produced and computed.

Into the source

1. HRegionServer: this method is called periodically to build the latest RegionLoad

--> the contents of this RegionLoad are sent to the HMaster in the heartbeat

private RegionLoad createRegionLoad(final HRegion r, RegionLoad.Builder regionLoadBldr,
    RegionSpecifier.Builder regionSpecifier)

2. The dataLocality variable inside it is the region's locality percentage:

float dataLocality =
    r.getHDFSBlocksDistribution().getBlockLocalityIndex(serverName.getHostname());
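
Inside createRegionLoad this value then goes onto the protobuf builder. A minimal sketch, assuming the builder's setDataLocality setter (present in recent HBase versions):

// sketch: write the computed locality into the RegionLoad being built
regionLoadBldr.setDataLocality(dataLocality);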

Next, let's look at the two halves of that call:

getHDFSBlocksDistribution, which returns an HDFSBlocksDistribution

getBlockLocalityIndex

3. What is HDFSBlocksDistribution?

It has two member variables:

// key: hostname; value: HostAndWeight (private String host; private long weight;)
// i.e. the logical size of this region's file data held on that hostname
private Map<String,HostAndWeight> hostAndWeights = null;
private long uniqueBlocksTotalWeight = 0;
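
To make the structure concrete, here is a minimal self-contained sketch of the state this class carries (field names follow the snippet above; the real HBase class has many more methods):

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not the actual HBase source.
public class BlocksDistributionSketch {
  // per-host accumulated block weight (in bytes) for this region's files
  static class HostAndWeight {
    private final String host;
    private long weight;
    HostAndWeight(String host, long weight) { this.host = host; this.weight = weight; }
  }

  // key: hostname; value: that host's accumulated weight
  private final Map<String, HostAndWeight> hostAndWeights = new HashMap<>();
  // total weight with each block counted exactly once, replicas ignored
  private long uniqueBlocksTotalWeight = 0;
}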

 

The getHDFSBlocksDistribution method

It iterates over every StoreFile under the region and merges each store file's getHDFSBlockDistribution into the HDFSBlocksDistribution it returns:

/**
 * This function will return the HDFS blocks distribution based on the data
 * captured when HFile is created
 * @return The HDFS blocks distribution for the region.
 */
public HDFSBlocksDistribution getHDFSBlocksDistribution() {
  HDFSBlocksDistribution hdfsBlocksDistribution =
    new HDFSBlocksDistribution();
  synchronized (this.stores) {
    // iterate over every StoreFile of every Store in the region
    for (Store store : this.stores.values()) {
      for (StoreFile sf : store.getStorefiles()) {
        HDFSBlocksDistribution storeFileBlocksDistribution =
          sf.getHDFSBlockDistribution();
        hdfsBlocksDistribution.add(storeFileBlocksDistribution);
      }
    }
  }
  return hdfsBlocksDistribution;
}
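
The add call merges one distribution into another. A plausible sketch of what it does, consistent with the member variables from step 3 (treat the body as illustrative rather than the exact HBase source):

// sketch: merge another distribution into this one
public void add(HDFSBlocksDistribution otherBlocksDistribution) {
  // fold the other side's per-host weights into our map
  for (Map.Entry<String, HostAndWeight> entry
      : otherBlocksDistribution.getHostAndWeights().entrySet()) {
    addHostAndBlockWeight(entry.getKey(), entry.getValue().getWeight());
  }
  // and accumulate its replica-deduplicated total
  addUniqueWeight(otherBlocksDistribution.getUniqueBlocksTotalWeight());
}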

4. StoreFile's getHDFSBlockDistribution method (HRegionServer --> HRegion --> StoreFile)

// StoreFile's getHDFSBlockDistribution method

/**
 * @return the cached value of HDFS blocks distribution. The cached value is
 * calculated when store file is opened.
 */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.fileInfo.getHDFSBlockDistribution();
}

5. What is actually used is the information from (StoreFileInfo) fileInfo.getHDFSBlockDistribution:

/** @return the HDFS block distribution */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.hdfsBlocksDistribution;
}

A question:

Where does fileInfo's hdfsBlocksDistribution get assigned?

In the IDE, right-click the variable and choose Find Usages (to see where it is used).

The Value Read hits are just reads, so skip them.

Looking at the Value Write hits for where it is assigned: it turns out to be StoreFileInfo's open method!

// a Reference is produced by a region split (it covers the top or bottom half of the
// parent file); an HFileLink is produced by a snapshot
if (this.reference != null) {
  hdfsBlocksDistribution = computeRefFileHDFSBlockDistribution(fs, reference, status);
} else {
  // ordinary data files take this branch
  hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution(fs, status, 0, length);
}

6. hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution

/**
 * Compute HDFS blocks distribution of a given file, or a portion of the file
 * @param fs file system
 * @param status file status of the file
 * @param start start position of the portion
 * @param length length of the portion
 * @return The HDFS blocks distribution
 */
static public HDFSBlocksDistribution computeHDFSBlocksDistribution(
  final FileSystem fs, FileStatus status, long start, long length)
  throws IOException {
  HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();

  // fs.getFileBlockLocations is an HDFS API that returns the block locations of the file
  BlockLocation [] blockLocations =
    fs.getFileBlockLocations(status, start, length);

  // iterate over all blocks of the file; each block knows the hosts of its N replicas
  for (BlockLocation bl : blockLocations) {
    String [] hosts = bl.getHosts();
    long len = bl.getLength();
    // addHostsAndBlockWeight records the replica hosts and the block size
    // into uniqueBlocksTotalWeight and hostAndWeights
    blocksDistribution.addHostsAndBlockWeight(hosts, len);
  }

  return blocksDistribution;
}
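
If you want to inspect these raw inputs yourself, the same HDFS API can be called directly. A minimal standalone sketch (the file path is hypothetical; point it at a real HFile on your cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // hypothetical HFile path, for illustration only
    FileStatus status = fs.getFileStatus(
        new Path("/hbase/data/default/mytable/myregion/cf/myhfile"));
    // ask the NameNode for the locations of every block in the whole file
    BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation bl : locations) {
      System.out.println("offset=" + bl.getOffset()
          + " length=" + bl.getLength()
          + " hosts=" + String.join(",", bl.getHosts()));
    }
  }
}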

7. addHostsAndBlockWeight

The concrete implementation of the blocksDistribution.addHostsAndBlockWeight(hosts, len) call above:

/**
 * add some weight to a list of hosts, update the value of unique block weight
 * @param hosts the list of the host
 * @param weight the weight
 */
public void addHostsAndBlockWeight(String[] hosts, long weight) {
  if (hosts == null || hosts.length == 0) {
    // erroneous data
    return;
  }

  addUniqueWeight(weight);
  for (String hostname : hosts) {
    addHostAndBlockWeight(hostname, weight);
  }
}
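
The two helpers do what their names suggest. A plausible sketch, consistent with the member variables from step 3 (illustrative, not the exact HBase bodies):

// sketch: count each block's size once, no matter how many replicas it has
private void addUniqueWeight(long weight) {
  uniqueBlocksTotalWeight += weight;
}

// sketch: accumulate the block's size under every host that holds a replica
private void addHostAndBlockWeight(String host, long weight) {
  HostAndWeight hostAndWeight = this.hostAndWeights.get(host);
  if (hostAndWeight == null) {
    this.hostAndWeights.put(host, new HostAndWeight(host, weight));
  } else {
    hostAndWeight.weight += weight;
  }
}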

Notes

uniqueBlocksTotalWeight is the logical size of the files, with each block counted once.
Map<String,HostAndWeight> hostAndWeights holds, for each host, the logical size of this region's file data stored on that host.

(The code does iterate over the hosts of all replicas, three under HDFS's default replication, but because the map is keyed by host, and HDFS never places two replicas of the same block on the same host, each hostname in hostAndWeights counts a given block only once.)
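
A worked example with illustrative numbers: suppose a region has two 128 MB blocks, block 1 replicated on hosts A, B, C and block 2 on hosts A, B, D. Then uniqueBlocksTotalWeight = 256 MB (each block counted once), hostAndWeights records 256 MB for A and B and 128 MB for C and D, so the locality index computed below is 1.0 on A or B and 0.5 on C or D.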

 


8. Finally, getBlockLocalityIndex

Up to this point we have only covered the first half of the RegionServer call r.getHDFSBlocksDistribution().getBlockLocalityIndex, namely the r.getHDFSBlocksDistribution part.

What does the second half, getBlockLocalityIndex(serverName.getHostname()), compute?

/**
 * return the locality index of a given host
 * @param host the host name
 * @return the locality index of the given host
 */
public float getBlockLocalityIndex(String host) {
  float localityIndex = 0;
  // look up this region's HostAndWeight for the current server
  HostAndWeight hostAndWeight = this.hostAndWeights.get(host);
  if (hostAndWeight != null && uniqueBlocksTotalWeight != 0) {
    // divide the logical block size this region holds on this machine
    // by the total logical block size of the region's files
    localityIndex = (float) hostAndWeight.weight / (float) uniqueBlocksTotalWeight;
  }
  return localityIndex;
}
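
To tie the whole chain together, here is a small runnable sketch that feeds the worked example from step 7 into the real HDFSBlocksDistribution class (only methods shown above are used; the hostnames are made up):

import org.apache.hadoop.hbase.HDFSBlocksDistribution;

public class LocalityDemo {
  public static void main(String[] args) {
    HDFSBlocksDistribution dist = new HDFSBlocksDistribution();
    long blockSize = 128L * 1024 * 1024; // 128 MB

    // block 1 replicated on hosts a, b, c; block 2 replicated on hosts a, b, d
    dist.addHostsAndBlockWeight(new String[] {"host-a", "host-b", "host-c"}, blockSize);
    dist.addHostsAndBlockWeight(new String[] {"host-a", "host-b", "host-d"}, blockSize);

    // host-a holds a replica of every block; host-d holds only one of the two
    System.out.println(dist.getBlockLocalityIndex("host-a")); // 1.0
    System.out.println(dist.getBlockLocalityIndex("host-d")); // 0.5
  }
}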

 

 

 

Going further

HDFS is critically important to HBase; if you have the time and energy, it is worth studying HDFS's APIs and features in more depth.

What does HDFS's BlockLocation store?

/**
 * Represents the network location of a block, information about the hosts
 * that contain block replicas, and other block metadata (E.g. the file
 * offset associated with the block, length, whether it is corrupt, etc).
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class BlockLocation {
  private String[] hosts; // Datanode hostnames
  private String[] cachedHosts; // Datanode hostnames with a cached replica
  private String[] names; // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset;  // Offset of the block in the file
  private long length;
  private boolean corrupt;
  // ... (constructors and getters omitted)
}