Overview
Locality percentage = (logical size of this region's blocks stored on the current machine) / (total logical size of all file blocks under the region)
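A quick numeric sketch of this formula (toy numbers and a class of my own, not HBase code): a region whose files span three 128 MB blocks, two of which have a replica on the current RegionServer's host:

```java
public class LocalityFormula {
    // locality = bytes of this region's blocks that live on this host
    //            / total logical bytes of the region's file blocks
    static float locality(long localBytes, long totalBytes) {
        if (totalBytes == 0) {
            return 0f;
        }
        return (float) localBytes / (float) totalBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 128 MB per block
        long totalBytes = 3 * blockSize;      // the region spans 3 blocks
        long localBytes = 2 * blockSize;      // 2 blocks have a local replica
        System.out.println(locality(localBytes, totalBytes)); // prints 0.6666667
    }
}
```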
Starting point
The information HRegionServer reports to HMaster in its heartbeat includes a data-locality metric inside RegionLoad, so we need to see how this RegionLoad data is produced and computed.
Into the source code
1. This HRegionServer method is called periodically to build the latest RegionLoad
-> the RegionLoad content is then sent to HMaster in the heartbeat

```java
private RegionLoad createRegionLoad(final HRegion r,
    RegionLoad.Builder regionLoadBldr, RegionSpecifier.Builder regionSpecifier)
```
2. The dataLocality variable is the region's locality percentage

```java
float dataLocality =
    r.getHDFSBlocksDistribution().getBlockLocalityIndex(serverName.getHostname());
```
Next, let's look at getHDFSBlocksDistribution (which returns an HDFSBlocksDistribution) and getBlockLocalityIndex.
3. What is HDFSBlocksDistribution?
It has two member fields:

```java
// key: hostname, value: HostAndWeight (String host; long weight)
// — the logical file size stored on that host
private Map<String, HostAndWeight> hostAndWeights = null;
private long uniqueBlocksTotalWeight = 0;
```
The getHDFSBlocksDistribution method
It iterates over every StoreFile under the region and adds each store file's getHDFSBlockDistribution into the returned HDFSBlocksDistribution:

```java
/**
 * This function will return the HDFS blocks distribution based on the data
 * captured when HFile is created
 * @return The HDFS blocks distribution for the region.
 */
public HDFSBlocksDistribution getHDFSBlocksDistribution() {
  HDFSBlocksDistribution hdfsBlocksDistribution = new HDFSBlocksDistribution();
  synchronized (this.stores) {
    // iterate over all StoreFiles
    for (Store store : this.stores.values()) {
      for (StoreFile sf : store.getStorefiles()) {
        HDFSBlocksDistribution storeFileBlocksDistribution =
            sf.getHDFSBlockDistribution();
        hdfsBlocksDistribution.add(storeFileBlocksDistribution);
      }
    }
  }
  return hdfsBlocksDistribution;
}
```
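The per-store-file merge above (hdfsBlocksDistribution.add(...)) can be sketched with plain maps — a simplified stand-in of my own, not the HBase class: each store file contributes a per-host weight map, and merging sums the weights host by host:

```java
import java.util.HashMap;
import java.util.Map;

public class DistributionMerge {
    // Merge one store file's per-host weights into the region-level map,
    // summing weights when the same host appears in both
    static void add(Map<String, Long> regionWeights, Map<String, Long> storeFileWeights) {
        storeFileWeights.forEach((host, w) -> regionWeights.merge(host, w, Long::sum));
    }

    public static void main(String[] args) {
        Map<String, Long> region = new HashMap<>();
        add(region, Map.of("host-a", 128L, "host-b", 128L)); // store file 1
        add(region, Map.of("host-a", 64L, "host-c", 64L));   // store file 2
        System.out.println(region); // host-a accumulates 128 + 64 = 192
    }
}
```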
4. StoreFile's getHDFSBlockDistribution method (HRegionServer -> HRegion -> StoreFile)

```java
// StoreFile's getHDFSBlockDistribution method
/**
 * @return the cached value of HDFS blocks distribution. The cached value is
 * calculated when store file is opened.
 */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.fileInfo.getHDFSBlockDistribution();
}
```
5. What is actually used is the information from (StoreFileInfo) fileInfo.getHDFSBlockDistribution

```java
/** @return the HDFS block distribution */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.hdfsBlocksDistribution;
}
```
A question: where does fileInfo's hdfsBlocksDistribution get assigned?
In the IDE, right-click the field and choose Find Usages (to see where it is referenced).
The Value Read entries are just reads and can be skipped; the Value Write entries show where it is assigned — it turns out to be StoreFileInfo's open method!

```java
if (this.reference != null) {
  // a reference is produced by a split; an hfile-link is produced by a snapshot
  hdfsBlocksDistribution = computeRefFileHDFSBlockDistribution(fs, reference, status);
} else {
  // ordinary data files take this branch
  hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution(fs, status, 0, length);
}
```
6. hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution

```java
/**
 * Compute HDFS blocks distribution of a given file, or a portion of the file
 * @param fs file system
 * @param status file status of the file
 * @param start start position of the portion
 * @param length length of the portion
 * @return The HDFS blocks distribution
 */
static public HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length)
    throws IOException {
  HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
  // fs.getFileBlockLocations is an HDFS API that returns the locations
  // of the blocks backing the file
  BlockLocation[] blockLocations = fs.getFileBlockLocations(status, start, length);
  // iterate over all blocks of the file; each block exposes the locations
  // of its N replicas — the hosts array
  for (BlockLocation bl : blockLocations) {
    String[] hosts = bl.getHosts();
    long len = bl.getLength();
    // addHostsAndBlockWeight records the replica hosts and block size
    // into uniqueBlocksTotalWeight and hostAndWeights
    blocksDistribution.addHostsAndBlockWeight(hosts, len);
  }
  return blocksDistribution;
}
```
7. addHostsAndBlockWeight
The concrete implementation of blocksDistribution.addHostsAndBlockWeight(hosts, len) above:

```java
/**
 * add some weight to a list of hosts, update the value of unique block weight
 * @param hosts the list of the host
 * @param weight the weight
 */
public void addHostsAndBlockWeight(String[] hosts, long weight) {
  if (hosts == null || hosts.length == 0) {
    // erroneous data
    return;
  }
  addUniqueWeight(weight);
  for (String hostname : hosts) {
    addHostAndBlockWeight(hostname, weight);
  }
}
```
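The key asymmetry here is that each block's size is added once to the unique total but credited to every replica host. A minimal sketch with a simplified class of my own (not HBase's) makes that visible:

```java
import java.util.HashMap;
import java.util.Map;

public class BlockWeights {
    final Map<String, Long> hostAndWeights = new HashMap<>();
    long uniqueBlocksTotalWeight = 0;

    // Mirrors addHostsAndBlockWeight: one block of `weight` bytes, N replica hosts
    void addHostsAndBlockWeight(String[] hosts, long weight) {
        if (hosts == null || hosts.length == 0) {
            return; // erroneous data
        }
        uniqueBlocksTotalWeight += weight;                 // counted once per block
        for (String host : hosts) {
            hostAndWeights.merge(host, weight, Long::sum); // once per replica host
        }
    }

    public static void main(String[] args) {
        BlockWeights d = new BlockWeights();
        // one 128-unit block with three replicas
        d.addHostsAndBlockWeight(new String[] {"h1", "h2", "h3"}, 128L);
        System.out.println(d.uniqueBlocksTotalWeight); // 128, not 384
        System.out.println(d.hostAndWeights);          // each host is credited 128
    }
}
```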
Notes
uniqueBlocksTotalWeight is the logical size of the file.
Map<String, HostAndWeight> hostAndWeights holds, per host, the logical size of this region's files stored on that host.
(The code iterates over the hosts of all three replicas, but because the map is keyed by host, a given block of a file contributes to each hostname in hostAndWeights only once — HDFS's three-replica policy places the replicas on different hosts.)
8. Finally, getBlockLocalityIndex
Up to this point we have only covered the first half of the RegionServer call r.getHDFSBlocksDistribution().getBlockLocalityIndex(...), namely r.getHDFSBlocksDistribution().
What does the second half, getBlockLocalityIndex(serverName.getHostname()), compute?

```java
/**
 * return the locality index of a given host
 * @param host the host name
 * @return the locality index of the given host
 */
public float getBlockLocalityIndex(String host) {
  float localityIndex = 0;
  // fetch this region's HostAndWeight for the current server
  HostAndWeight hostAndWeight = this.hostAndWeights.get(host);
  if (hostAndWeight != null && uniqueBlocksTotalWeight != 0) {
    // divide the logical size of this region's blocks on the current machine
    // by the total logical size of the region's file blocks
    localityIndex = (float) hostAndWeight.weight / (float) uniqueBlocksTotalWeight;
  }
  return localityIndex;
}
```
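Putting the pieces together, the whole pipeline can be replayed end to end with a toy stand-in class of my own (hostnames and sizes are made up): feed in each block's replica hosts, then ask for the locality index of one host:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityIndexDemo {
    final Map<String, Long> hostAndWeights = new HashMap<>();
    long uniqueBlocksTotalWeight = 0;

    // Mirrors addHostsAndBlockWeight: one block, N replica hosts
    void addHostsAndBlockWeight(String[] hosts, long weight) {
        if (hosts == null || hosts.length == 0) {
            return;
        }
        uniqueBlocksTotalWeight += weight;
        for (String host : hosts) {
            hostAndWeights.merge(host, weight, Long::sum);
        }
    }

    // Mirrors getBlockLocalityIndex: weight on this host / total unique weight
    float getBlockLocalityIndex(String host) {
        Long w = hostAndWeights.get(host);
        if (w == null || uniqueBlocksTotalWeight == 0) {
            return 0f;
        }
        return (float) w / (float) uniqueBlocksTotalWeight;
    }

    public static void main(String[] args) {
        LocalityIndexDemo d = new LocalityIndexDemo();
        // three blocks of 128 each; host "rs1" holds a replica of two of them
        d.addHostsAndBlockWeight(new String[] {"rs1", "rs2", "rs3"}, 128L);
        d.addHostsAndBlockWeight(new String[] {"rs1", "rs4", "rs5"}, 128L);
        d.addHostsAndBlockWeight(new String[] {"rs2", "rs3", "rs4"}, 128L);
        System.out.println(d.getBlockLocalityIndex("rs1")); // 256/384, prints 0.6666667
    }
}
```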
Further reading
HDFS is critically important to HBase; if you have the time, it is worth digging into HDFS's APIs and behavior.
What does HDFS's BlockLocation store?

```java
/**
 * Represents the network location of a block, information about the hosts
 * that contain block replicas, and other block metadata (E.g. the file
 * offset associated with the block, length, whether it is corrupt, etc).
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class BlockLocation {
  private String[] hosts;         // Datanode hostnames
  private String[] cachedHosts;   // Datanode hostnames with a cached replica
  private String[] names;         // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset;            // Offset of the block in the file
  private long length;
  private boolean corrupt;
```