2021SC@SDUSC HBase Source Code Analysis (9): HFile Analysis (1)


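HFile Composition

An HFile is divided into four main parts: the scanned block section, the non-scanned block section, the load-on-open section, and the trailer.

  1. Scanned block section:

    As the name suggests, this section holds the blocks that are all read when an HFile is scanned sequentially: Leaf Index Blocks, Data Blocks, and Bloom Blocks. Data Blocks store the user's KeyValue data, Leaf Index Blocks store the leaf nodes of the index tree, and Bloom Blocks store Bloom filter data.

  2. Non-scanned block section:

    The blocks in this section are not read during a sequential scan of the HFile. It mainly consists of Meta Blocks and Intermediate Level Data Index Blocks.

  3. Load-on-open-section:

    This data must be loaded into memory when the HBase region server opens the HFile. It includes the FileInfo, the Bloom filter block, the data block index, and the meta block index.

  4. Trailer:

    This part records basic information about the HFile, together with the offsets and addressing information of the other parts.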
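
The four-part layout above can be summarized as a small lookup table. The following is only a sketch in plain Java: the class, enum, and constant names are made up for illustration and are not HBase's own types (HBase has its own BlockType enum, which is not reproduced here).

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative model of the four HFile sections; all names here are hypothetical. */
final class HFileSectionsSketch {

  /** Kinds of content named in the text above (not HBase's BlockType). */
  enum Content {
    DATA_BLOCK, LEAF_INDEX_BLOCK, BLOOM_BLOCK,
    META_BLOCK, INTERMEDIATE_INDEX_BLOCK,
    FILE_INFO, BLOOM_FILTER_BLOCK, DATA_BLOCK_INDEX, META_BLOCK_INDEX,
    TRAILER
  }

  /** The four sections, each with the content it holds and whether a sequential scan reads it. */
  enum Section {
    SCANNED(true, EnumSet.of(Content.DATA_BLOCK, Content.LEAF_INDEX_BLOCK, Content.BLOOM_BLOCK)),
    NON_SCANNED(false, EnumSet.of(Content.META_BLOCK, Content.INTERMEDIATE_INDEX_BLOCK)),
    LOAD_ON_OPEN(false, EnumSet.of(Content.FILE_INFO, Content.BLOOM_FILTER_BLOCK,
        Content.DATA_BLOCK_INDEX, Content.META_BLOCK_INDEX)),
    TRAILER(false, EnumSet.of(Content.TRAILER));

    final boolean readBySequentialScan;
    final Set<Content> contents;

    Section(boolean readBySequentialScan, Set<Content> contents) {
      this.readBySequentialScan = readBySequentialScan;
      this.contents = contents;
    }
  }

  /** Returns the section that holds the given kind of content. */
  static Section sectionOf(Content content) {
    for (Section s : Section.values()) {
      if (s.contents.contains(content)) {
        return s;
      }
    }
    throw new IllegalArgumentException("Unknown content: " + content);
  }

  public static void main(String[] args) {
    for (Section s : Section.values()) {
      System.out.println(s + " readBySequentialScan=" + s.readBySequentialScan + " " + s.contents);
    }
  }
}
```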

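HFile Physical Layout

[Figure: physical structure of an HFile]

As the figure shows, an HFile is split into blocks of roughly equal size. The block size can be set per column family when the table is created, using the BLOCKSIZE attribute (for example BLOCKSIZE => '65536'); the default is 64 KB. Larger blocks favor sequential scans, while smaller blocks favor random reads, so the value is a trade-off.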
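
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes every block is exactly the configured size and ignores compression, block headers, and the fact that blocks are cut at KeyValue boundaries, so the numbers are only an approximation. The class and method names are hypothetical.

```java
/** Rough estimate of how many blocks a file of a given size is split into. */
final class BlockCountSketch {

  /** Ceiling division: blocks needed to hold fileBytes at blockBytes per block. */
  static long approxBlockCount(long fileBytes, long blockBytes) {
    if (fileBytes < 0 || blockBytes <= 0) {
      throw new IllegalArgumentException("fileBytes must be >= 0 and blockBytes > 0");
    }
    return (fileBytes + blockBytes - 1) / blockBytes;
  }

  public static void main(String[] args) {
    long oneGiB = 1L << 30;
    // 64 KB blocks: fewer, larger blocks (fewer index entries, more data per read).
    System.out.println(approxBlockCount(oneGiB, 64 * 1024)); // prints 16384
    // 16 KB blocks: four times as many blocks (more index entries, less data per read).
    System.out.println(approxBlockCount(oneGiB, 16 * 1024)); // prints 65536
  }
}
```

Under these assumptions, quartering the block size quadruples the number of blocks the index has to address, while a point lookup has to read and decode only a quarter as much data per block.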

Relevant Code in the HFile Class

Introduction

The Javadoc of the HFile class describes the file format as follows:

* <p>
* File is made of data blocks followed by meta data blocks (if any), a fileinfo
* block, data block index, meta data block index, and a fixed size trailer
* which records the offsets at which file changes content type.
* <pre>&lt;data blocks&gt;&lt;meta blocks&gt;&lt;fileinfo&gt;&lt;
* data index&gt;&lt;meta index&gt;&lt;trailer&gt;</pre>
* Each block has a bit of magic at its start.  Block are comprised of
* key/values.  In data blocks, they are both byte arrays.  Metadata blocks are
* a String key and a byte array value.  An empty file looks like this:
* <pre>&lt;fileinfo&gt;&lt;trailer&gt;</pre>.  That is, there are not data nor meta
* blocks present.
* <p>

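Some important fields of HFile:

From top to bottom they are:

  1. The maximum length of a key in an HFile
  2. The minimum HFile format version supported
  3. The maximum HFile format version supported

public final static int MAXIMUM_KEY_LENGTH = Integer.MAX_VALUE;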
public static final int MIN_FORMAT_VERSION = 2;
public static final int MAX_FORMAT_VERSION = 3;

HFile Storage Path

The same class also tells us that an HFile is stored under a path of the form ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE:

/**
 * We assume that HFile path ends with
 * ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this
 * many levels of nesting. This is needed for identifying table and CF name
 * from an HFile path.
 */
public final static int MIN_NUM_HFILE_PATH_LEVELS = 5;
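
As the comment says, the fixed nesting depth is what allows the table and column family names to be recovered from a path. A minimal sketch of that idea in plain Java follows. It is not HBase's implementation: the class, the method, and the example path are hypothetical, and it simply takes the last four path components.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: recover table, region, column family and file name from an HFile path. */
final class HFilePathSketch {
  static final int MIN_NUM_HFILE_PATH_LEVELS = 5;

  /** Returns {table, region, family, file}, taken from the last four path components. */
  static String[] tableRegionFamilyFile(String hfilePath) {
    List<String> parts = new ArrayList<>();
    for (String p : hfilePath.split("/")) {
      if (!p.isEmpty()) { // skip the empty component produced by a leading "/"
        parts.add(p);
      }
    }
    if (parts.size() < MIN_NUM_HFILE_PATH_LEVELS) {
      throw new IllegalArgumentException("Path has fewer than "
          + MIN_NUM_HFILE_PATH_LEVELS + " levels: " + hfilePath);
    }
    int n = parts.size();
    return new String[] {parts.get(n - 4), parts.get(n - 3), parts.get(n - 2), parts.get(n - 1)};
  }

  public static void main(String[] args) {
    // Hypothetical path, for illustration only.
    String[] r = tableRegionFamilyFile("/hbase/t1/region1/cf/hfile1");
    System.out.println("table=" + r[0] + " region=" + r[1] + " cf=" + r[2] + " file=" + r[3]);
  }
}
```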

Checking the Format

The method HFile uses to decide whether a file is in HFile format:

public static boolean isHFileFormat(final FileSystem fs, final FileStatus fileStatus)
    throws IOException {
  final Path path = fileStatus.getPath();
  final long size = fileStatus.getLen();
  try (FSDataInputStreamWrapper fsdis = new FSDataInputStreamWrapper(fs, path)) {
    boolean isHBaseChecksum = fsdis.shouldUseHBaseChecksum();
    assert !isHBaseChecksum; // Initially we must read with FS checksum.
    FixedFileTrailer.readFromStream(fsdis.getStream(isHBaseChecksum), size);
    return true;
  } catch (IllegalArgumentException e) {
    return false;
  }
}
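
The method works as a probe: it tries to parse the trailer at the end of the file and treats an IllegalArgumentException as "not an HFile". The sketch below shows the same pattern on an in-memory byte array. It is a simplification under stated assumptions: the stand-in parser only looks at the last four bytes, and it assumes that these hold a big-endian int whose low three bytes are the major version. All names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch of "parse the tail, map IllegalArgumentException to false". */
final class FormatProbeSketch {
  static final int MIN_FORMAT_VERSION = 2;
  static final int MAX_FORMAT_VERSION = 3;

  /** Hypothetical stand-in for trailer parsing: validates only the trailing version int. */
  static int readMajorVersion(byte[] file) {
    if (file.length < Integer.BYTES) {
      throw new IllegalArgumentException("File too small: " + file.length + " bytes");
    }
    // ByteBuffer is big-endian by default; read the last four bytes as one int.
    int serialized = ByteBuffer.wrap(file, file.length - Integer.BYTES, Integer.BYTES).getInt();
    int major = serialized & 0x00ffffff; // assumption: low three bytes carry the major version
    if (major < MIN_FORMAT_VERSION || major > MAX_FORMAT_VERSION) {
      throw new IllegalArgumentException("Unsupported major version: " + major);
    }
    return major;
  }

  /** Mirrors the shape of isHFileFormat: true if parsing succeeds, false if it is rejected. */
  static boolean looksLikeHFile(byte[] file) {
    try {
      readMajorVersion(file);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(looksLikeHFile(new byte[] {7, 7, 0, 0, 0, 3}));
    System.out.println(looksLikeHFile(new byte[] {1, 2}));
  }
}
```

As in the original method, only IllegalArgumentException is mapped to false; any other failure would propagate to the caller.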

The FileStatus class used in the code above has the following fields; it supplies the path and length of the file:

public class FileStatus implements Writable, Comparable<FileStatus> {
    private Path path;
    private long length;
    private boolean isdir;
    private short block_replication;
    private long blocksize;
    private long modification_time;
    private long access_time;
    private FsPermission permission;
    private String owner;
    private String group;
    private Path symlink;
    // ...
}

Getting the Set of HFile Paths

The method that collects the locations of the HFiles:

public static List<Path> getStoreFiles(FileSystem fs, Path regionDir)
    throws IOException {
  List<Path> regionHFiles = new ArrayList<>();
  PathFilter dirFilter = new FSUtils.DirFilter(fs);
  FileStatus[] familyDirs = fs.listStatus(regionDir, dirFilter);
  for(FileStatus dir : familyDirs) {
    FileStatus[] files = fs.listStatus(dir.getPath());
    for (FileStatus file : files) {
      if (!file.isDirectory() &&
          (!file.getPath().toString().contains(HConstants.HREGION_OLDLOGDIR_NAME)) &&
          (!file.getPath().toString().contains(HConstants.RECOVERED_EDITS_DIR))) {
        regionHFiles.add(file.getPath());
      }
    }
  }
  return regionHFiles;
}

It returns a list of paths, one for each HFile found in the column family directories of the given region directory.
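
The same two-level walk can be tried out on a local directory with java.nio.file. This is only a sketch of the traversal, not the HBase code: it uses the local file system instead of Hadoop's FileSystem, and the two string constants are assumed values standing in for HConstants.HREGION_OLDLOGDIR_NAME and HConstants.RECOVERED_EDITS_DIR.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the region dir -> family dirs -> files walk, on the local file system. */
final class StoreFilesSketch {
  // Assumed values; the real constants live in HConstants.
  static final String OLD_LOG_DIR_NAME = "oldWALs";
  static final String RECOVERED_EDITS_DIR = "recovered.edits";

  static List<Path> getStoreFiles(Path regionDir) throws IOException {
    List<Path> regionHFiles = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(regionDir)) {
      for (Path familyDir : entries) {
        if (!Files.isDirectory(familyDir)) {
          continue; // only directories directly under the region dir are considered
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(familyDir)) {
          for (Path file : files) {
            String name = file.toString();
            // Keep plain files whose path mentions neither excluded directory.
            if (!Files.isDirectory(file)
                && !name.contains(OLD_LOG_DIR_NAME)
                && !name.contains(RECOVERED_EDITS_DIR)) {
              regionHFiles.add(file);
            }
          }
        }
      }
    }
    return regionHFiles;
  }
}
```

Note that, like the original, the sketch goes exactly one level below the family directory and does not recurse into subdirectories.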

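HFile and blocksize

The block-related interface in the HFile class is CachingBlockReader. It does not set the block size itself: its readBlock method takes a block's offset and on-disk size and returns the block, optionally caching it. Its source: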

  public interface CachingBlockReader {
    
    HFileBlock readBlock(long offset, long onDiskBlockSize,
        boolean cacheBlock, final boolean pread, final boolean isCompaction,
        final boolean updateCacheMetrics, BlockType expectedBlockType,
        DataBlockEncoding expectedDataBlockEncoding)
        throws IOException;
  }
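
The idea behind the cacheBlock flag can be sketched with a map keyed by block offset. This is a toy model under simplifying assumptions, not HBase's block cache: the "file" is a byte array, a block is a plain byte[] rather than an HFileBlock, and the other readBlock parameters (pread, isCompaction, expected block type and encoding) are left out. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy caching block reader: blocks are looked up by offset before the "disk" is touched. */
final class CachingBlockReaderSketch {
  private final byte[] file; // stands in for the bytes of an HFile
  private final Map<Long, byte[]> cache = new HashMap<>();
  private int diskReads = 0;

  CachingBlockReaderSketch(byte[] file) {
    this.file = file;
  }

  /** Returns the block at offset; stores it in the cache only if cacheBlock is true. */
  byte[] readBlock(long offset, int onDiskBlockSize, boolean cacheBlock) {
    byte[] cached = cache.get(offset);
    if (cached != null) {
      return cached; // cache hit: no disk read
    }
    if (offset < 0 || onDiskBlockSize < 0 || offset + onDiskBlockSize > file.length) {
      throw new IllegalArgumentException("Block out of range: offset=" + offset
          + ", size=" + onDiskBlockSize);
    }
    diskReads++;
    byte[] block = Arrays.copyOfRange(file, (int) offset, (int) offset + onDiskBlockSize);
    if (cacheBlock) {
      cache.put(offset, block);
    }
    return block;
  }

  int diskReads() {
    return diskReads;
  }

  public static void main(String[] args) {
    CachingBlockReaderSketch reader = new CachingBlockReaderSketch(new byte[8]);
    reader.readBlock(0, 4, true);
    reader.readBlock(0, 4, true);  // served from the cache
    reader.readBlock(4, 4, false);
    reader.readBlock(4, 4, false); // read again, because it was never cached
    System.out.println(reader.diskReads()); // prints 3
  }
}
```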

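Write API

[Figure: structure of the Writer interface]

The nested interface Writer in the HFile class extends the Closeable, CellSink, and ShipperListener interfaces and serves as the API for writing HFiles.

Source of the interface: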

public interface Writer extends Closeable, CellSink, ShipperListener {
    
  public static final byte [] MAX_MEMSTORE_TS_KEY = Bytes.toBytes("MAX_MEMSTORE_TS_KEY");
  void appendFileInfo(byte[] key, byte[] value) throws IOException;
  Path getPath();
  void addInlineBlockWriter(InlineBlockWriter bloomWriter);
  void appendMetaBlock(String bloomFilterMetaKey, Writable metaWriter);
  void addGeneralBloomFilter(BloomFilterWriter bfw);
  void addDeleteFamilyBloomFilter(BloomFilterWriter bfw) throws IOException;

  HFileContext getFileContext();
}

Creating a Writer

The method first reads the format version from the configuration and then branches on it:

public static final WriterFactory getWriterFactory(Configuration conf,
    CacheConfig cacheConf) {
  int version = getFormatVersion(conf);
  switch (version) {
    case 2:
      throw new IllegalArgumentException("This should never happen. " +
        "Did you change hfile.format.version to read v2? This version of the software writes v3" +
        " hfiles only (but it can read v2 files without having to update hfile.format.version " +
        "in hbase-site.xml)");
    case 3:
      return new HFile.WriterFactory(conf, cacheConf);
    default:
      throw new IllegalArgumentException("Cannot create writer for HFile " +
          "format version " + version);
  }
}
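
The version lookup and the switch can be imitated without HBase on the classpath. In the sketch below, java.util.Properties stands in for Hadoop's Configuration, and the factory returns a descriptive string instead of a WriterFactory. The key name is the one quoted in the error message above; the default of version 3 and the range check are assumptions about what getFormatVersion does. All class and method names are hypothetical.

```java
import java.util.Properties;

/** Sketch of "read the format version from configuration, then dispatch on it". */
final class WriterVersionSketch {
  static final String FORMAT_VERSION_KEY = "hfile.format.version";
  static final int MIN_FORMAT_VERSION = 2;
  static final int MAX_FORMAT_VERSION = 3;

  /** Reads the version, defaulting to MAX_FORMAT_VERSION (assumed), and range-checks it. */
  static int getFormatVersion(Properties conf) {
    int version = Integer.parseInt(
        conf.getProperty(FORMAT_VERSION_KEY, String.valueOf(MAX_FORMAT_VERSION)));
    if (version < MIN_FORMAT_VERSION || version > MAX_FORMAT_VERSION) {
      throw new IllegalArgumentException("Invalid HFile version: " + version);
    }
    return version;
  }

  /** Only version 3 can be written; version 2 is readable but not writable. */
  static String writerFor(Properties conf) {
    int version = getFormatVersion(conf);
    switch (version) {
      case 2:
        throw new IllegalArgumentException("This software writes v3 HFiles only");
      case 3:
        return "v3 writer factory";
      default:
        throw new IllegalArgumentException(
            "Cannot create writer for HFile format version " + version);
    }
  }

  public static void main(String[] args) {
    System.out.println(writerFor(new Properties())); // prints "v3 writer factory"
  }
}
```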

Client Reads

[Figure: structure of the Reader interface]

The nested interface Reader in HFile extends Closeable and CachingBlockReader. Clients use it to open an HFile and iterate over it.

public interface Reader extends Closeable, CachingBlockReader {

  String getName();

  CellComparator getComparator();

  HFileScanner getScanner(boolean cacheBlocks, final boolean pread, final boolean isCompaction);

  HFileBlock getMetaBlock(String metaBlockName, boolean cacheBlock) throws IOException;

  Optional<Cell> getLastKey();

  Optional<Cell> midKey() throws IOException;

  long length();

  long getEntries();

  Optional<Cell> getFirstKey();

  long indexSize();

  Optional<byte[]> getFirstRowKey();

  Optional<byte[]> getLastRowKey();

  FixedFileTrailer getTrailer();

  void setDataBlockIndexReader(HFileBlockIndex.CellBasedKeyBlockIndexReader reader);
  HFileBlockIndex.CellBasedKeyBlockIndexReader getDataBlockIndexReader();

  void setMetaBlockIndexReader(HFileBlockIndex.ByteArrayKeyBlockIndexReader reader);
  HFileBlockIndex.ByteArrayKeyBlockIndexReader getMetaBlockIndexReader();

  HFileScanner getScanner(boolean cacheBlocks, boolean pread);

  DataInput getGeneralBloomFilterMetadata() throws IOException;

  DataInput getDeleteBloomFilterMetadata() throws IOException;

  Path getPath();

  void close(boolean evictOnClose) throws IOException;

  DataBlockEncoding getDataBlockEncoding();

  boolean hasMVCCInfo();

  HFileContext getFileContext();

  boolean isPrimaryReplicaReader();

  DataBlockEncoding getEffectiveEncodingInCache(boolean isCompaction);

  @VisibleForTesting
  HFileBlock.FSReader getUncachedBlockReader();

  @VisibleForTesting
  boolean prefetchComplete();

  void unbufferStream();

  ReaderContext getContext();
  HFileInfo getHFileInfo();
  void setDataBlockEncoder(HFileDataBlockEncoder dataBlockEncoder);
}

Getting a Reader:

public static Reader createReader(ReaderContext context, HFileInfo fileInfo,
    CacheConfig cacheConf, Configuration conf) throws IOException {
  try {
    if (context.getReaderType() == ReaderType.STREAM) {
      return new HFileStreamReader(context, fileInfo, cacheConf, conf);
    }
    FixedFileTrailer trailer = fileInfo.getTrailer();
    switch (trailer.getMajorVersion()) {
      case 2:
        LOG.debug("Opening HFile v2 with v3 reader");
        // Fall through. FindBugs: SF_SWITCH_FALLTHROUGH
      case 3:
        return new HFilePreadReader(context, fileInfo, cacheConf, conf);
      default:
        throw new IllegalArgumentException("Invalid HFile version " + trailer.getMajorVersion());
    }
  } catch (Throwable t) {
    IOUtils.closeQuietly(context.getInputStreamWrapper());
    throw new CorruptHFileException("Problem reading HFile Trailer from file "
        + context.getFilePath(), t);
  } finally {
    context.getInputStreamWrapper().unbuffer();
  }
}

Unlike creating a Writer, getting a Reader does not require a version to be specified, because the version is already recorded in the HFile's trailer.
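
The control flow of createReader can be condensed into a few lines. The sketch below keeps three things from the source: the stream reader is chosen before the version is looked at, version 2 falls through to the version 3 reader, and any failure is rethrown as an I/O error. Everything else is simplified: the method returns a descriptive string instead of a Reader, a plain IOException stands in for CorruptHFileException, and the names are hypothetical.

```java
import java.io.IOException;

/** Sketch of reader selection driven by the major version stored in the trailer. */
final class ReaderDispatchSketch {
  enum ReaderType { PREAD, STREAM }

  static String chooseReader(ReaderType type, int trailerMajorVersion) throws IOException {
    try {
      if (type == ReaderType.STREAM) {
        return "stream reader"; // chosen before the version is inspected
      }
      switch (trailerMajorVersion) {
        case 2: // v2 files are opened with the v3 reader: intentional fall-through
        case 3:
          return "pread reader";
        default:
          throw new IllegalArgumentException("Invalid HFile version " + trailerMajorVersion);
      }
    } catch (RuntimeException e) {
      // Stand-in for CorruptHFileException in the original code.
      throw new IOException("Problem reading HFile trailer", e);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(chooseReader(ReaderType.PREAD, 2));  // prints "pread reader"
    System.out.println(chooseReader(ReaderType.STREAM, 3)); // prints "stream reader"
  }
}
```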

This concludes the analysis of the code in the HFile class.

To be continued.
