2021SC@SDUSC HBase Source Code Analysis (9): HFile Analysis (1)
HFile Structure
An HFile consists of four parts: the Scanned block section, the Non-scanned block section, the Load-on-open section, and the Trailer.
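- Scanned block section: as the name suggests, every block in this section is read when the HFile is scanned sequentially. It contains Leaf Index Blocks, Data Blocks and Bloom Blocks. Data Blocks store the user's KeyValue data, Leaf Index Blocks store the leaf-level nodes of the multi-level index tree, and Bloom Blocks store Bloom filter data.
- Non-scanned block section: blocks that are not read during a sequential scan of the HFile, namely Meta Blocks and Intermediate Level Data Index Blocks.
- Load-on-open section: data that must be loaded into memory when the HFile is opened (for example, when a RegionServer opens the region that owns the file). It includes FileInfo, the Bloom filter metadata, the root level of the data block index, and the meta block index.
- Trailer: records basic information about the HFile, such as its format version, together with the offsets and addressing information of the other sections.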
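To make the four-section layout concrete, the sketch below tags each kind of block with the section it belongs to. The enum names are hypothetical and only loosely follow the block names used above; they are not HBase's actual `BlockType` enum.

```java
import java.util.EnumMap;
import java.util.Map;

// The four top-level sections described above.
enum HFileSection { SCANNED, NON_SCANNED, LOAD_ON_OPEN, TRAILER }

// Hypothetical block kinds; illustrative only, not HBase's BlockType enum.
enum SketchBlockKind {
  DATA(HFileSection.SCANNED),
  LEAF_INDEX(HFileSection.SCANNED),
  BLOOM_CHUNK(HFileSection.SCANNED),
  META(HFileSection.NON_SCANNED),
  INTERMEDIATE_INDEX(HFileSection.NON_SCANNED),
  FILE_INFO(HFileSection.LOAD_ON_OPEN),
  BLOOM_META(HFileSection.LOAD_ON_OPEN),
  ROOT_DATA_INDEX(HFileSection.LOAD_ON_OPEN),
  META_INDEX(HFileSection.LOAD_ON_OPEN),
  TRAILER(HFileSection.TRAILER);

  final HFileSection section;

  SketchBlockKind(HFileSection section) {
    this.section = section;
  }

  // Only blocks in the scanned section are read by a sequential scan.
  boolean readBySequentialScan() {
    return section == HFileSection.SCANNED;
  }

  // Counts how many block kinds fall into each section.
  static Map<HFileSection, Integer> countBySection() {
    Map<HFileSection, Integer> counts = new EnumMap<>(HFileSection.class);
    for (SketchBlockKind kind : values()) {
      counts.merge(kind.section, 1, Integer::sum);
    }
    return counts;
  }
}
```

For example, `SketchBlockKind.DATA.readBySequentialScan()` is true, while the same call on `SketchBlockKind.META` is false.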
HFile Physical Layout
[Figure: HFile physical structure]
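As the figure shows, an HFile is divided into multiple blocks of roughly equal size. The block size is set per column family when the table is created, e.g. BLOCKSIZE => '65536', and defaults to 64 KB. Larger blocks favor sequential scans while smaller blocks favor random reads, so choosing a block size is a trade-off.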
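The block size is a target rather than an exact size: the writer keeps appending cells to the current block and starts a new one once the current block has reached the configured size, which is why blocks are only roughly equal. The sketch below simulates that rule on a list of cell sizes. `BlockBoundarySketch` is a hypothetical name, and real HBase also accounts for block headers, encoding and compression.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockBoundarySketch {
  // Groups cell sizes into blocks: before each append, if the current block
  // already holds at least blockSize bytes, it is closed and a new one begins.
  // Returns the byte size of each resulting block.
  static List<Integer> splitIntoBlocks(List<Integer> cellSizes, int blockSize) {
    List<Integer> blocks = new ArrayList<>();
    int current = 0;
    for (int size : cellSizes) {
      if (current >= blockSize) { // boundary check happens before the append
        blocks.add(current);
        current = 0;
      }
      current += size;
    }
    if (current > 0) {
      blocks.add(current); // the last block may be smaller than blockSize
    }
    return blocks;
  }
}
```

With cells of 40, 40, 40 and 10 bytes and a 64-byte target, the first block grows to 80 bytes before it is closed, so the blocks come out as 80 and 50 bytes.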
Related Code in the HFile Class
Overview
The Javadoc of the HFile class describes the file as follows:
* <p>
* File is made of data blocks followed by meta data blocks (if any), a fileinfo
* block, data block index, meta data block index, and a fixed size trailer
* which records the offsets at which file changes content type.
* <pre><data blocks><meta blocks><fileinfo><
* data index><meta index><trailer></pre>
* Each block has a bit of magic at its start. Block are comprised of
* key/values. In data blocks, they are both byte arrays. Metadata blocks are
* a String key and a byte array value. An empty file looks like this:
* <pre><fileinfo><trailer></pre>. That is, there are not data nor meta
* blocks present.
* <p>
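Some important constants in HFile. From top to bottom, they are:
- the maximum key length in an HFile
- the minimum supported HFile format version
- the maximum supported HFile format version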
public final static int MAXIMUM_KEY_LENGTH = Integer.MAX_VALUE;
public static final int MIN_FORMAT_VERSION = 2;
public static final int MAX_FORMAT_VERSION = 3;
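A minimal sketch of how the two version constants bound the accepted format versions. `FormatVersionSketch` and its methods are written for illustration only and are not claimed to match HBase's own helpers.

```java
final class FormatVersionSketch {
  static final int MIN_FORMAT_VERSION = 2; // values copied from HFile above
  static final int MAX_FORMAT_VERSION = 3;

  // A version is usable only if it lies in [MIN_FORMAT_VERSION, MAX_FORMAT_VERSION].
  static boolean isValidFormatVersion(int version) {
    return version >= MIN_FORMAT_VERSION && version <= MAX_FORMAT_VERSION;
  }

  // Rejects out-of-range versions with an IllegalArgumentException.
  static void checkFormatVersion(int version) {
    if (!isValidFormatVersion(version)) {
      throw new IllegalArgumentException("Invalid HFile version: " + version
          + " (expected between " + MIN_FORMAT_VERSION + " and "
          + MAX_FORMAT_VERSION + ")");
    }
  }
}
```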
HFile Storage Path
This class also documents where an HFile is stored, namely ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE:
/**
* We assume that HFile path ends with
* ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this
* many levels of nesting. This is needed for identifying table and CF name
* from an HFile path.
*/
public final static int MIN_NUM_HFILE_PATH_LEVELS = 5;
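Because the last four path components have a fixed meaning, the table, region and column family names can be read off the end of an HFile path. The sketch below does this with plain string splitting; `HFilePathSketch` is a hypothetical class, and it relies only on the ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE layout stated in the comment (ROOT_DIR itself may span several levels).

```java
import java.util.Arrays;

final class HFilePathSketch {
  // Same value as HFile.MIN_NUM_HFILE_PATH_LEVELS.
  static final int MIN_NUM_HFILE_PATH_LEVELS = 5;

  // Returns {table, region, columnFamily, hfile} taken from the last four
  // components of a '/'-separated path; throws if the path is too shallow.
  static String[] parse(String path) {
    String[] parts = Arrays.stream(path.split("/"))
        .filter(s -> !s.isEmpty()) // ignore the leading '/' and any "//"
        .toArray(String[]::new);
    if (parts.length < MIN_NUM_HFILE_PATH_LEVELS) {
      throw new IllegalArgumentException("Not an HFile path: " + path);
    }
    int n = parts.length;
    return new String[] {parts[n - 4], parts[n - 3], parts[n - 2], parts[n - 1]};
  }
}
```

For instance, `parse("/hbase/t1/r1/cf/f1")` yields the table `t1`, region `r1`, column family `cf` and file `f1`.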
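Checking the File Format
The method in HFile that checks whether a file is in HFile format. It opens the file and tries to read the fixed-size trailer at its end; if that throws an IllegalArgumentException, the file is not treated as an HFile: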
public static boolean isHFileFormat(final FileSystem fs, final FileStatus fileStatus)
    throws IOException {
  final Path path = fileStatus.getPath();
  final long size = fileStatus.getLen();
  try (FSDataInputStreamWrapper fsdis = new FSDataInputStreamWrapper(fs, path)) {
    boolean isHBaseChecksum = fsdis.shouldUseHBaseChecksum();
    assert !isHBaseChecksum; // Initially we must read with FS checksum.
    FixedFileTrailer.readFromStream(fsdis.getStream(isHBaseChecksum), size);
    return true;
  } catch (IllegalArgumentException e) {
    return false;
  }
}
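Which check produces the IllegalArgumentException? We assume, based on HBase 2.x's FixedFileTrailer, that reading starts from the last four bytes of the file, which hold the serialized format version with the major version in the low 24 bits and the minor version in the high 8 bits; an unsupported major version is then rejected. The sketch below imitates only that version check on an in-memory byte array (`TrailerVersionSketch` is hypothetical, and the encoding is an assumption to verify against your HBase version); the real method also parses and validates the rest of the trailer.

```java
import java.nio.ByteBuffer;

final class TrailerVersionSketch {
  // Assumed encoding: major version in the low 24 bits.
  static int majorVersion(int serialized) {
    return serialized & 0x00ffffff;
  }

  // Assumed encoding: minor version in the high 8 bits.
  static int minorVersion(int serialized) {
    return serialized >>> 24;
  }

  // Packs major/minor versions using the same assumed layout.
  static int serialize(int major, int minor) {
    return (major & 0x00ffffff) | (minor << 24);
  }

  // True if the big-endian int in the last 4 bytes carries major version 2 or 3.
  static boolean looksLikeHFile(byte[] fileBytes) {
    if (fileBytes.length < Integer.BYTES) {
      return false;
    }
    int serialized = ByteBuffer
        .wrap(fileBytes, fileBytes.length - Integer.BYTES, Integer.BYTES)
        .getInt();
    int major = majorVersion(serialized);
    return major >= 2 && major <= 3;
  }
}
```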
The fields of Hadoop's FileStatus class used by the code above, which supply the file's path and length:
public class FileStatus implements Writable, Comparable<FileStatus> {
  private Path path;
  private long length;
  private boolean isdir;
  private short block_replication;
  private long blocksize;
  private long modification_time;
  private long access_time;
  private FsPermission permission;
  private String owner;
  private String group;
  private Path symlink;
  // ...
}
Getting the List of HFile Paths
The method that collects the locations of a region's HFiles:
public static List<Path> getStoreFiles(FileSystem fs, Path regionDir)
    throws IOException {
  List<Path> regionHFiles = new ArrayList<>();
  PathFilter dirFilter = new FSUtils.DirFilter(fs);
  FileStatus[] familyDirs = fs.listStatus(regionDir, dirFilter);
  for (FileStatus dir : familyDirs) {
    FileStatus[] files = fs.listStatus(dir.getPath());
    for (FileStatus file : files) {
      if (!file.isDirectory() &&
          (!file.getPath().toString().contains(HConstants.HREGION_OLDLOGDIR_NAME)) &&
          (!file.getPath().toString().contains(HConstants.RECOVERED_EDITS_DIR))) {
        regionHFiles.add(file.getPath());
      }
    }
  }
  return regionHFiles;
}
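It lists every column-family directory under the region directory and returns the paths of all regular files found in them, skipping paths that contain the old-log or recovered-edits directory names. The resulting list holds the region's HFile paths.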
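The same two-level walk can be mimicked on a local directory with java.nio.file, which is handy for experimenting without HDFS. This is a sketch that assumes one sub-directory per column family under the region directory; `LocalStoreFilesSketch` is hypothetical, and the strings `oldWALs` and `recovered.edits` stand in for HConstants.HREGION_OLDLOGDIR_NAME and HConstants.RECOVERED_EDITS_DIR, whose values may differ between HBase versions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class LocalStoreFilesSketch {
  // Assumed stand-ins for HConstants.HREGION_OLDLOGDIR_NAME / RECOVERED_EDITS_DIR.
  static final String OLD_LOG_DIR = "oldWALs";
  static final String RECOVERED_EDITS_DIR = "recovered.edits";

  // Lists regular files directly inside each sub-directory of regionDir.
  static List<Path> getStoreFiles(Path regionDir) throws IOException {
    List<Path> regionHFiles = new ArrayList<>();
    try (DirectoryStream<Path> familyDirs =
        Files.newDirectoryStream(regionDir, p -> Files.isDirectory(p))) {
      for (Path dir : familyDirs) {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
          for (Path file : files) {
            String s = file.toString();
            if (!Files.isDirectory(file)
                && !s.contains(OLD_LOG_DIR)
                && !s.contains(RECOVERED_EDITS_DIR)) {
              regionHFiles.add(file);
            }
          }
        }
      }
    }
    return regionHFiles;
  }
}
```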
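Reading Blocks: CachingBlockReader
The CachingBlockReader interface in the HFile class. Its readBlock method reads the block at a given file offset, and the cacheBlock flag controls whether the block is put into the block cache after it is read:
public interface CachingBlockReader {
  HFileBlock readBlock(long offset, long onDiskBlockSize,
      boolean cacheBlock, final boolean pread, final boolean isCompaction,
      final boolean updateCacheMetrics, BlockType expectedBlockType,
      DataBlockEncoding expectedDataBlockEncoding)
      throws IOException;
}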
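A toy version of the caching idea behind readBlock: look the block up in a cache keyed by file offset, fall back to the underlying storage on a miss, and only populate the cache when cacheBlock is true. Everything here (the class name, the byte-array "file", the LRU policy built on LinkedHashMap) is a hypothetical simplification; HBase's real block caches, such as LruBlockCache and BucketCache, are far more involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class ToyCachingBlockReader {
  private final byte[] file;              // stands in for the HFile on disk
  private final Map<Long, byte[]> cache;  // offset -> block bytes
  int diskReads = 0;                      // counts cache misses, for illustration

  ToyCachingBlockReader(byte[] file, int maxCachedBlocks) {
    this.file = file;
    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxCachedBlocks;
      }
    };
  }

  // Reads onDiskBlockSize bytes at offset, consulting the cache first.
  byte[] readBlock(long offset, int onDiskBlockSize, boolean cacheBlock) {
    byte[] cached = cache.get(offset);
    if (cached != null) {
      return cached;
    }
    diskReads++;
    byte[] block = Arrays.copyOfRange(file, (int) offset, (int) offset + onDiskBlockSize);
    if (cacheBlock) {
      cache.put(offset, block);
    }
    return block;
  }
}
```

Reading the same offset twice with cacheBlock set to true costs one "disk" read; with cacheBlock set to false every call goes back to storage, which is roughly why large one-off scans may choose not to cache their blocks.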
The Write API
Writer structure:
Writer, an inner interface of the HFile class, extends the Closeable, CellSink and ShipperListener interfaces and serves as the API for writing HFiles.
Its source code:
public interface Writer extends Closeable, CellSink, ShipperListener {
  public static final byte [] MAX_MEMSTORE_TS_KEY = Bytes.toBytes("MAX_MEMSTORE_TS_KEY");
  void appendFileInfo(byte[] key, byte[] value) throws IOException;
  Path getPath();
  void addInlineBlockWriter(InlineBlockWriter bloomWriter);
  void appendMetaBlock(String bloomFilterMetaKey, Writable metaWriter);
  void addGeneralBloomFilter(BloomFilterWriter bfw);
  void addDeleteFamilyBloomFilter(BloomFilterWriter bfw) throws IOException;
  HFileContext getFileContext();
}
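The contract behind this interface is: append cells (via CellSink) in sorted key order, attach extra metadata with appendFileInfo, then close the file. The toy writer below illustrates only the ordering rule and the FileInfo map. Its class name, its use of String keys and its exception types are assumptions of this sketch, not HBase's real writer implementation.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy writer; not HBase's HFile.Writer implementation.
final class ToySortedWriter implements Closeable {
  final List<String> keys = new ArrayList<>();
  final Map<String, String> fileInfo = new TreeMap<>();
  private boolean closed = false;

  // Keys must arrive in non-decreasing order, because HFile keys are stored sorted.
  void append(String key) {
    if (closed) {
      throw new IllegalStateException("writer is closed");
    }
    if (!keys.isEmpty() && keys.get(keys.size() - 1).compareTo(key) > 0) {
      throw new IllegalArgumentException("Key out of order: " + key);
    }
    keys.add(key);
  }

  // Mirrors the idea of appendFileInfo: extra metadata stored with the file.
  void appendFileInfo(String key, String value) {
    fileInfo.put(key, value);
  }

  @Override
  public void close() {
    // A real writer would flush the last block, the indexes, FileInfo and the trailer here.
    closed = true;
  }
}
```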
Creating a Writer
The factory method first reads the configured format version and then branches on it. This version of the code only writes v3 files, so version 2 and any unknown version raise an IllegalArgumentException:
public static final WriterFactory getWriterFactory(Configuration conf,
    CacheConfig cacheConf) {
  int version = getFormatVersion(conf);
  switch (version) {
    case 2:
      throw new IllegalArgumentException("This should never happen. " +
        "Did you change hfile.format.version to read v2? This version of the software writes v3" +
        " hfiles only (but it can read v2 files without having to update hfile.format.version " +
        "in hbase-site.xml)");
    case 3:
      return new HFile.WriterFactory(conf, cacheConf);
    default:
      throw new IllegalArgumentException("Cannot create writer for HFile " +
        "format version " + version);
  }
}
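getFormatVersion(conf) reads the version from the configuration. In the sketch below a plain Map stands in for Hadoop's Configuration. The key hfile.format.version comes from the error message above, while falling back to 3 when the key is unset is an assumption of this sketch; `WriterFactorySketch` and its string labels are hypothetical.

```java
import java.util.Map;

final class WriterFactorySketch {
  static final String FORMAT_VERSION_KEY = "hfile.format.version";

  // Reads the version, falling back to 3 (assumed default) when the key is absent.
  static int getFormatVersion(Map<String, String> conf) {
    String v = conf.get(FORMAT_VERSION_KEY);
    return v == null ? 3 : Integer.parseInt(v.trim());
  }

  // Returns a label for the writer that would be created, mirroring the switch above.
  static String chooseWriter(Map<String, String> conf) {
    int version = getFormatVersion(conf);
    switch (version) {
      case 2:
        throw new IllegalArgumentException("Writing v2 is not supported; only v3 is written");
      case 3:
        return "v3 writer";
      default:
        throw new IllegalArgumentException(
            "Cannot create writer for HFile format version " + version);
    }
  }
}
```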
Client-Side Reading
Reader structure:
Reader, another inner interface of HFile, extends Closeable and CachingBlockReader; clients use it to open an HFile and iterate over its contents:
public interface Reader extends Closeable, CachingBlockReader {
  String getName();
  CellComparator getComparator();
  HFileScanner getScanner(boolean cacheBlocks, final boolean pread, final boolean isCompaction);
  HFileBlock getMetaBlock(String metaBlockName, boolean cacheBlock) throws IOException;
  Optional<Cell> getLastKey();
  Optional<Cell> midKey() throws IOException;
  long length();
  long getEntries();
  Optional<Cell> getFirstKey();
  long indexSize();
  Optional<byte[]> getFirstRowKey();
  Optional<byte[]> getLastRowKey();
  FixedFileTrailer getTrailer();
  void setDataBlockIndexReader(HFileBlockIndex.CellBasedKeyBlockIndexReader reader);
  HFileBlockIndex.CellBasedKeyBlockIndexReader getDataBlockIndexReader();
  void setMetaBlockIndexReader(HFileBlockIndex.ByteArrayKeyBlockIndexReader reader);
  HFileBlockIndex.ByteArrayKeyBlockIndexReader getMetaBlockIndexReader();
  HFileScanner getScanner(boolean cacheBlocks, boolean pread);
  DataInput getGeneralBloomFilterMetadata() throws IOException;
  DataInput getDeleteBloomFilterMetadata() throws IOException;
  Path getPath();
  void close(boolean evictOnClose) throws IOException;
  DataBlockEncoding getDataBlockEncoding();
  boolean hasMVCCInfo();
  HFileContext getFileContext();
  boolean isPrimaryReplicaReader();
  DataBlockEncoding getEffectiveEncodingInCache(boolean isCompaction);
  @VisibleForTesting
  HFileBlock.FSReader getUncachedBlockReader();
  @VisibleForTesting
  boolean prefetchComplete();
  void unbufferStream();
  ReaderContext getContext();
  HFileInfo getHFileInfo();
  void setDataBlockEncoder(HFileDataBlockEncoder dataBlockEncoder);
}
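Several of these methods (getFirstKey, getLastKey, midKey) return an Optional because an HFile may contain no cells at all. The hypothetical in-memory reader below shows that behavior over sorted String keys. Its midKey simply takes the middle entry, whereas HBase derives the mid key from the data block index, so treat it as an illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory reader over sorted String keys.
final class ToyReader {
  private final List<String> sortedKeys;

  ToyReader(List<String> keys) {
    List<String> copy = new ArrayList<>(keys);
    Collections.sort(copy); // HFile keys are stored in sorted order
    this.sortedKeys = Collections.unmodifiableList(copy);
  }

  long getEntries() {
    return sortedKeys.size();
  }

  // Empty for an empty file, which is why the real interface returns Optional.
  Optional<String> getFirstKey() {
    return sortedKeys.isEmpty() ? Optional.empty() : Optional.of(sortedKeys.get(0));
  }

  Optional<String> getLastKey() {
    return sortedKeys.isEmpty()
        ? Optional.empty()
        : Optional.of(sortedKeys.get(sortedKeys.size() - 1));
  }

  // Middle entry as a stand-in for the index-based mid key.
  Optional<String> midKey() {
    return sortedKeys.isEmpty()
        ? Optional.empty()
        : Optional.of(sortedKeys.get(sortedKeys.size() / 2));
  }
}
```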
Creating a Reader:
public static Reader createReader(ReaderContext context, HFileInfo fileInfo,
    CacheConfig cacheConf, Configuration conf) throws IOException {
  try {
    if (context.getReaderType() == ReaderType.STREAM) {
      return new HFileStreamReader(context, fileInfo, cacheConf, conf);
    }
    FixedFileTrailer trailer = fileInfo.getTrailer();
    switch (trailer.getMajorVersion()) {
      case 2:
        LOG.debug("Opening HFile v2 with v3 reader");
        // Fall through. FindBugs: SF_SWITCH_FALLTHROUGH
      case 3:
        return new HFilePreadReader(context, fileInfo, cacheConf, conf);
      default:
        throw new IllegalArgumentException("Invalid HFile version " + trailer.getMajorVersion());
    }
  } catch (Throwable t) {
    IOUtils.closeQuietly(context.getInputStreamWrapper());
    throw new CorruptHFileException("Problem reading HFile Trailer from file "
        + context.getFilePath(), t);
  } finally {
    context.getInputStreamWrapper().unbuffer();
  }
}
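Unlike creating a Writer, creating a Reader does not require specifying a version, because the version is already recorded in the HFile's trailer. A STREAM-type read always uses HFileStreamReader; otherwise a v2 file falls through to the same HFilePreadReader as a v3 file, and any failure is rethrown as a CorruptHFileException.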
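The dispatch in createReader, including the deliberate fall-through from case 2 to case 3, reduces to the sketch below. The string labels stand in for HFileStreamReader and HFilePreadReader; the class, enum and method names are hypothetical, and the exception wrapping done by the real method is left out.

```java
final class ReaderDispatchSketch {
  enum ReadType { PREAD, STREAM }

  // Picks a reader label the way createReader picks a reader class.
  static String chooseReader(ReadType type, int trailerMajorVersion) {
    if (type == ReadType.STREAM) {
      return "HFileStreamReader"; // chosen before the version is inspected
    }
    switch (trailerMajorVersion) {
      case 2:
        // v2 files are read by the v3 reader: deliberate fall-through
      case 3:
        return "HFilePreadReader";
      default:
        throw new IllegalArgumentException("Invalid HFile version " + trailerMajorVersion);
    }
  }
}
```

Because the writer only produces v3 files while the reader still accepts v2, a cluster can read older files without any configuration change.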