Reading the HBase Source Code via JIRA Issues

I've recently become obsessed with the HBase JIRA, so I decided to excerpt some of the more interesting issues here. The idea is to start from the problem described in each JIRA and then read the corresponding HBase source code. For more, see: https://issues.apache.org/jira/browse/HBASE

- HBASE-3287: Add option to cache blocks on hfile write and evict blocks on hfile close

Introduces two new configuration parameters: hbase.rs.cacheblocksonwrite (default: false) which will pre-cache all blocks of a file into the block cache as it is written, and hbase.rs.evictblocksonclose (default: true) which will evict all blocks of a file from the block cache when a file is closed on a RS.

Description:

This issue is about adding configuration options to add/remove from the block cache when creating/closing files. For use cases with lots of flushing and compacting, this might be desirable to prevent cache misses and maximize the effective utilization of total block cache capacity.

The first option, hbase.rs.cacheblocksonwrite, will make it so we pre-cache blocks as we are writing out new files.

The second option, hbase.rs.evictblocksonclose, will make it so we evict blocks when files are closed.
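
As a minimal sketch of how these two switches might be toggled (assuming the standard HBase client/common jars are on the classpath; in a real deployment they would normally be set in hbase-site.xml on the region servers):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CacheOnWriteConfig {
    public static void main(String[] args) {
        // Standard HBase configuration (hbase-default.xml + hbase-site.xml).
        Configuration conf = HBaseConfiguration.create();

        // Pre-cache every block of a newly written HFile into the block cache
        // (default: false, per HBASE-3287).
        conf.setBoolean("hbase.rs.cacheblocksonwrite", true);

        // Evict all blocks of an HFile from the block cache when the file is
        // closed on the region server (default: true, per HBASE-3287).
        conf.setBoolean("hbase.rs.evictblocksonclose", true);
    }
}
```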

- HBASE-4463: Run more aggressive compactions during off peak hours

Description:

The number of IOPS on the disks and the top-of-rack bandwidth utilization during off-peak hours are much lower than during peak hours, depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compaction selection ratio to a much larger value during off-peak hours than otherwise, i.e. raising hbase.hstore.compaction.ratio (default 1.2) to hbase.hstore.compaction.ratio.offpeak (default 5). This will help reduce the average number of files per store.
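
A hedged configuration sketch: the two ratio keys are quoted from the description above, while the off-peak window keys (hbase.offpeak.start.hour / hbase.offpeak.end.hour) are the names used in the hbase-default.xml versions I have seen and should be verified against your release:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffPeakCompactionConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Normal compaction selection ratio (default 1.2).
        conf.setFloat("hbase.hstore.compaction.ratio", 1.2f);
        // More aggressive ratio applied during off-peak hours (default 5).
        conf.setFloat("hbase.hstore.compaction.ratio.offpeak", 5.0f);

        // Off-peak window in hours of the day [0-23]; key names may vary by release.
        conf.setInt("hbase.offpeak.start.hour", 1);
        conf.setInt("hbase.offpeak.end.hour", 6);
    }
}
```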

- HBASE-1476: Scaling compaction with multiple threads

Description:

Was thinking we should build in support for more than one compaction thread; this would allow us to keep up with compactions once we get to the point where we store TBs of data per node and many regions.
Maybe a configurable setting for how many threads a region server can use for compactions.

With compression turned on, my compactions are limited by CPU speed; with multiple cores it would be nice to be able to scale compactions to two or more cores.
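
Multi-threaded compactions were eventually exposed as two thread pools on the region server. A sketch of raising the pool sizes, assuming the key names found in recent hbase-default.xml files (older releases may differ):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionThreadsConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Threads for "large" compactions (above the compaction throttle point).
        conf.setInt("hbase.regionserver.thread.compaction.large", 2);
        // Threads for "small" compactions, typically the frequent minor ones.
        conf.setInt("hbase.regionserver.thread.compaction.small", 4);
    }
}
```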

- HBASE-4465: Lazy-seek optimization for StoreFile scanners

Description:

Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).

This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.
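
A deliberately simplified, self-contained sketch of the lazy-seek idea; the classes below (Key, FileScanner) are hypothetical stand-ins for HBase's KeyValue heap and StoreFileScanner, not the actual implementation:

```java
import java.util.PriorityQueue;

// Simplified illustration of lazy seek: every scanner first advertises a fake key
// built from its file's max timestamp; only the scanner that reaches the top of
// the heap pays for a real (possibly disk-hitting) seek.
public class LazySeekSketch {

    static final class Key implements Comparable<Key> {
        final String row, column;
        final long timestamp;
        Key(String row, String column, long timestamp) {
            this.row = row; this.column = column; this.timestamp = timestamp;
        }
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = column.compareTo(o.column);
            if (c != 0) return c;
            return Long.compare(o.timestamp, timestamp); // newer timestamps sort first
        }
    }

    static final class FileScanner implements Comparable<FileScanner> {
        final long maxTimestampInFile;
        Key current;            // what the KV heap compares on
        boolean reallySeeked;   // has the real seek been done yet?

        FileScanner(long maxTimestampInFile) { this.maxTimestampInFile = maxTimestampInFile; }

        // Lazy seek: just pretend our next key is (row, column, maxTimestampInFile).
        void requestSeek(String row, String column) {
            current = new Key(row, column, maxTimestampInFile);
            reallySeeked = false;
        }

        // Invoked only once this scanner bubbles up to the top of the heap.
        void enforceSeek() {
            if (!reallySeeked) {
                // ... the real seek into the HFile would happen here ...
                reallySeeked = true;
            }
        }

        public int compareTo(FileScanner o) { return current.compareTo(o.current); }
    }

    public static void main(String[] args) {
        FileScanner recent = new FileScanner(2000L); // newest, smallest file
        FileScanner older  = new FileScanner(1000L);

        PriorityQueue<FileScanner> heap = new PriorityQueue<>();
        for (FileScanner s : new FileScanner[] { recent, older }) {
            s.requestSeek("row1", "cf:q"); // no disk IO at this point
            heap.add(s);
        }

        // The most recent file pops first (its fake key sorts earliest), so only it
        // is really seeked; the older file is seeked only if the result is not found.
        FileScanner top = heap.poll();
        top.enforceSeek();
        System.out.println("Seeked most recent file first: " + (top == recent));
    }
}
```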

- HBASE-4469: Avoid top row seek by looking up ROWCOL bloomfilter

Description:

The problem is that when seeking for a row/column in an HFile, we go to the top of the row in order to check for a row delete marker (delete family). However, if a bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
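
For context, a small sketch of declaring a column family with a ROWCOL bloom filter using the classic (pre-2.x) descriptor API; newer clients would use ColumnFamilyDescriptorBuilder instead:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class RowColBloomExample {
    public static void main(String[] args) {
        // A column family with a ROWCOL bloom filter: per the description above,
        // a delete family marker for a row is reflected in the bloom, so a negative
        // lookup lets the scanner skip the extra seek to the top of the row.
        HColumnDescriptor cf = new HColumnDescriptor("cf1");
        cf.setBloomFilterType(BloomType.ROWCOL);

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
        table.addFamily(cf);
        // An Admin instance would then create the table from this descriptor.
    }
}
```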

- HBASE-4532: Avoid top row seek by dedicated bloom filter for delete family bloom filter

Description:

The previous jira, HBASE-4469, avoids the top row seek operation when the ROWCOL bloom filter is enabled.
This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

The only subtle use case is when we are interested in the top row with an empty column.

For example, suppose we are interested in row1/cf1:/1/put.
We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter says there is NO delete family.
The top row seek is therefore avoided and a fake kv is returned, which is the last kv for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization when we are trying to GET/SCAN a row with an empty column.
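
To make that corner case concrete, here is a hypothetical client-side request for exactly such a row1/cf1: (empty qualifier) cell; requests shaped like this are the ones for which the optimization has to be turned off:

```java
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyColumnGetExample {
    public static void main(String[] args) {
        // Ask explicitly for the empty-qualifier column of cf1 in row1, i.e. the
        // row1/cf1:/... KVs from the example above. Returning the fake
        // createLastOnRowCol KV here would skip the real KV we want.
        Get get = new Get(Bytes.toBytes("row1"));
        get.addColumn(Bytes.toBytes("cf1"), HConstants.EMPTY_BYTE_ARRAY);
    }
}
```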

Evaluation from TestSeekOptimization:

Previously (with only HBASE-4469):
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So HBASE-4469 gives about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.

After this change:
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we get about 10% more seek savings for ALL kinds of bloom filters.


- HBASE-1364: Distributed splitting of regionserver commit logs

Description:

In the Bigtable paper, log splitting is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.

1. As is, regions starting up expect to find only one reconstruction log. We need to make it so they can pick up a bunch of edit logs, and it should be fine that the logs live elsewhere in HDFS, in an output directory written by all split participants, whether the split is multithreaded or a MapReduce-like distributed process (let's write our distributed sort first as an MR job so we learn what's involved; the distributed sort should use MapReduce framework pieces as much as possible). On startup, regions go to this directory, pick up the files written by the split participants, and delete and clear the directory when all have been read in. Being able to take multiple logs as input also makes the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error.
2. Each column family rereads the reconstruction log to find its edits. We need to fix that. The split can sort the edits by column family so that each store only reads its own edits.

- HBASE-4768: Per-(table, columnFamily) metrics with configurable table name inclusion

As we kept adding more granular block read and block cache usage statistics, a combinatorial explosion of cases to monitor started to happen, especially when we wanted both per-table/column family/block type statistics and aggregate statistics over various subsets of these dimensions. Here, we un-clutter HFile readers, LruBlockCache, StoreFile, etc. by creating a centralized class that knows how to update all kinds of per-table/CF/block type counters.


Table name and column family configuration have been pushed into a base class, SchemaConfigured. This is convenient because many of the existing classes that have these properties (HFile readers/writers, HFile blocks, etc.) did not previously share a base class. Whether to collect per-(table, columnFamily) metrics or per-columnFamily metrics only can be configured with the hbase.metrics.showTableName configuration key. We don't expect this configuration to change at runtime, so we cache the setting statically and log a warning when an attempt is made to flip it once it has been set. This way we don't have to pass the configuration to a lot more places, e.g. everywhere an HFile reader is instantiated.
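
A minimal sketch of setting the key mentioned above (again via the Hadoop Configuration API; in practice this would live in hbase-site.xml and stay fixed for the lifetime of the process):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PerTableMetricsConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // true  -> metrics keyed by (table, column family, block type)
        // false -> table name omitted, per-column-family metrics only
        // The value is cached statically, so it should not be flipped at runtime.
        conf.setBoolean("hbase.metrics.showTableName", true);
    }
}
```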

Thanks to Liyin for his initial version of the per-table metrics patch and a lot of valuable feedback.

- HBASE-4117: Slow Query Log

Description:

Produce log messages for slow queries. The RPC server will decide what is slow based on a configurable "warn response time" parameter. Queries designated as slow will then output a "response too slow" message followed by a fingerprint of the query, and a summary limited in size by another configurable parameter (to limit log spamming).

Release Note:

Exposes a JSON-parseable fingerprint and details for queries that take longer than a configurable threshold time. The exposure is currently to the main regionserver log, along with a (queryTooSlow) tag which allows it to be grepped out and easily aggregated and/or monitored in administrator scripts.

The patch also provides a standard way to extract fingerprint and detail information of interest by requiring each "DatabaseCommand" to provide a fingerprint map and a details map, which will be a superset of the fingerprint.
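
A hedged sketch of wiring up the threshold; the key names below (hbase.ipc.warn.response.time / hbase.ipc.warn.response.size) are how this "warn response time" knob appears in the releases I have checked, so verify them against your hbase-default.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SlowQueryLogConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // RPCs slower than this many milliseconds get the slow-query log line.
        conf.setInt("hbase.ipc.warn.response.time", 5000);
        // Companion threshold for responses that are too large (bytes).
        conf.setInt("hbase.ipc.warn.response.size", 100 * 1024 * 1024);
    }
}
```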

- HBASE-57: Master should allocate regions to regionservers based upon data locality and rack awareness

Currently, regions are assigned to regionservers based on a basic load attribute. A factor to include in the assignment calculation is the location of the region in HDFS, i.e. the servers hosting the region's replicas. If the cluster is set up so that regionservers run on the same nodes as HDFS, then ideally the regionserver for a particular region should run on the same server that hosts a replica of that region.


- HBASE-4218: Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

The details of a patch in progress for prefix compression of row keys:
Keys are sorted in an HFile and are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms provide.
It is an additional step designed to be used in memory. It aims to save memory in the cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
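
For reference, a small sketch of turning on a delta encoding for a column family, again using the classic pre-2.x descriptor API (FAST_DIFF is just one of the available DataBlockEncoding values):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class DataBlockEncodingExample {
    public static void main(String[] args) {
        // A family whose keys are long relative to its values (e.g. counters)
        // benefits most from encoding the sorted, highly similar keys.
        HColumnDescriptor cf = new HColumnDescriptor("counters");
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    }
}
```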
