http://hbase.apache.org/book/regions.arch.html#compaction
http://hbase.apache.org/book.html
Excerpt:
9.7.5.5.1. Compaction File Selection
To understand the core algorithm for StoreFile selection, there is some ASCII-art in the Store source code that will serve as useful reference. It has been copied below:
/* normal skew:
 *
 *         older ----> newer
 *     _
 *    | |   _
 *    | |  | |   _
 *  --|-|- |-|- |-|---_-------_-------  minCompactSize
 *    | |  | |  | |  | |  _  | |
 *    | |  | |  | |  | | | | | |
 *    | |  | |  | |  | | | | | |
 */
Important knobs:
- hbase.hstore.compaction.ratio: Ratio used in the compaction file selection algorithm (default 1.2f).
- hbase.hstore.compaction.min (0.90: hbase.hstore.compactionThreshold) (files): Minimum number of StoreFiles per Store to be selected for a compaction to occur (default 2).
- hbase.hstore.compaction.max (files): Maximum number of StoreFiles to compact per minor compaction (default 10).
- hbase.hstore.compaction.min.size (bytes): Any StoreFile smaller than this setting will automatically be a candidate for compaction. Defaults to hbase.hregion.memstore.flush.size (128 mb).
- hbase.hstore.compaction.max.size (0.92) (bytes): Any StoreFile larger than this setting will automatically be excluded from compaction (default Long.MAX_VALUE).
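The minor compaction StoreFile selection logic is size based, and selects a file for compaction when the file <= sum(smaller_files) * hbase.hstore.compaction.ratio. Here smaller_files are the newer files that follow it; under normal skew, newer files are smaller than older ones.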
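The selection rule above can be sketched in a few lines of Python. This is a simplified illustration, not the HBase implementation: the function name and its parameters are hypothetical stand-ins for the knobs listed earlier, and the handling of max-size (skipping only the oldest oversized files) is an assumption made for this sketch.

```python
from typing import List


def select_minor_compaction(
    sizes: List[int],
    ratio: float,
    min_files: int,
    max_files: int,
    min_size: int,
    max_size: int,
) -> List[int]:
    """Return the sizes of the StoreFiles picked for a minor compaction.

    `sizes` is ordered oldest to newest. Simplified sketch, not HBase code.
    """
    # Assumption for this sketch: only the oldest oversized files are skipped.
    start = 0
    while start < len(sizes) and sizes[start] > max_size:
        start += 1

    # Walk from oldest to newest while enough files remain. Stop at the first
    # file that is under min_size or within `ratio` of the sum of newer files.
    while len(sizes) - start >= min_files:
        newer_sum = sum(sizes[start + 1:])
        if sizes[start] <= max(min_size, newer_sum * ratio):
            break  # every newer file is included as well
        start += 1

    # Cap the selection at max_files; with too few files, do nothing.
    selected = sizes[start:start + max_files]
    return selected if len(selected) >= min_files else []


# Sizes are oldest to newest, with ratio=1.0, min=3, max=5, min.size=10.
print(select_minor_compaction([100, 50, 23, 12, 12], 1.0, 3, 5, 10, 1000))
# [23, 12, 12]
```

Passing the per-file max as a slice bound keeps the cap separate from the ratio test, which makes it easy to see which knob excluded a file.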
9.7.5.5.2. Minor Compaction File Selection - Example #1 (Basic Example)
This example mirrors an example from the unit test TestCompactSelection.
- hbase.hstore.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.
Why?
- 100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
- 50 --> No, because sum(23, 12, 12) * 1.0 = 47.
- 23 --> Yes, because sum(12, 12) * 1.0 = 24.
- 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
- 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
9.7.5.5.3. Minor Compaction File Selection - Example #2 (Not Enough Files To Compact)
This example mirrors an example from the unit test TestCompactSelection.
- hbase.hstore.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest). With the above parameters, no files would be selected for minor compaction.
Why?
- 100 --> No, because sum(25, 12, 12) * 1.0 = 49.
- 25 --> No, because sum(12, 12) * 1.0 = 24.
- 12 --> No. It is a candidate because sum(12) * 1.0 = 12, but there are only 2 files to compact, which is less than the threshold of 3.
- 12 --> No. It is a candidate because the previous StoreFile was, but there are not enough files to compact.
9.7.5.5.4. Minor Compaction File Selection - Example #3 (Limiting Files To Compact)
This example mirrors an example from the unit test TestCompactSelection.
- hbase.hstore.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 7, 6, 5, 4, 3.
Why?
- 7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than the min-size.
- 6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than the min-size.
- 5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the min-size.
- 4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the min-size.
- 3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is less than the min-size.
- 2 --> No. It is a candidate because the previous file was selected and 2 is less than the min-size, but the max-number of files to compact has been reached.
- 1 --> No. It is a candidate because the previous file was selected and 1 is less than the min-size, but the max-number of files to compact has been reached.
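The per-file "Why?" reasoning in the examples can be traced with a small helper. This is a hedged sketch: explain_selection and its reason strings are hypothetical names invented for illustration, the max-size knob is left out for brevity, and the logic follows the simplified rule described in this section rather than the actual Store code.

```python
from typing import List, Tuple


def explain_selection(
    sizes: List[int], ratio: float, min_files: int, max_files: int, min_size: int
) -> List[Tuple[int, bool, str]]:
    """Return (size, selected, reason) for each file, oldest to newest."""
    # Find the oldest file that passes the ratio test or is under min_size.
    first = None
    for i, size in enumerate(sizes):
        if size <= sum(sizes[i + 1:]) * ratio or size <= min_size:
            first = i
            break

    # Every file from `first` onward is a candidate.
    candidates = 0 if first is None else len(sizes) - first
    rows = []
    for i, size in enumerate(sizes):
        if first is None or i < first:
            rows.append((size, False, "too large relative to newer files"))
        elif candidates < min_files:
            rows.append((size, False, "candidate, but too few files to compact"))
        elif i - first >= max_files:
            rows.append((size, False, "candidate, but max-file limit reached"))
        else:
            rows.append((size, True, "selected"))
    return rows


# Example #3: ratio=1.0, min=3, max=5, min.size=10.
for size, ok, reason in explain_selection([7, 6, 5, 4, 3, 2, 1], 1.0, 3, 5, 10):
    print(size, "Yes" if ok else "No", reason)
```

Running the same helper on 100, 25, 12, 12 marks both 12-byte files as candidates that are rejected for lack of files, which is the outcome described in Example #2.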
9.7.5.5.5. Impact of Key Configuration Options
hbase.hstore.compaction.ratio. A large ratio (e.g., 10) will produce a single giant file. Conversely, a value of .25 will produce behavior similar to the BigTable compaction algorithm, resulting in 4 StoreFiles.
hbase.hstore.compaction.min.size. This limit is the "automatic include" threshold: every StoreFile smaller than this value is a compaction candidate. It may therefore need to be adjusted downwards in write-heavy environments where many 1 or 2 mb StoreFiles are being flushed, because every file will be targeted for compaction, and the resulting files may still be under the min-size and require further compaction.
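The effect of min.size on small flushes can be illustrated with a toy simulation. It is only a sketch under stated assumptions: pick and simulate are hypothetical helpers, sizes are in whole MB, one compaction at most runs after each flush, max-size is ignored, and the selection rule is the simplified one from this section rather than the real HBase code path.

```python
from typing import List, Tuple


def pick(sizes: List[int], ratio: float, min_files: int, max_files: int,
         min_size: int) -> Tuple[int, int]:
    """Return the (start, end) slice to compact; start == end means none."""
    start = 0
    while len(sizes) - start >= min_files:
        if sizes[start] <= max(min_size, sum(sizes[start + 1:]) * ratio):
            break
        start += 1
    end = min(len(sizes), start + max_files)
    return (start, end) if end - start >= min_files else (start, start)


def simulate(flushes: int, flush_mb: int, min_size_mb: int,
             ratio: float = 1.2, min_files: int = 3,
             max_files: int = 10) -> Tuple[int, List[int]]:
    """Flush equal-sized files one by one; return (MB rewritten, final files)."""
    files: List[int] = []
    rewritten = 0
    for _ in range(flushes):
        files.append(flush_mb)  # the newest file goes last
        start, end = pick(files, ratio, min_files, max_files, min_size_mb)
        if end > start:
            merged = sum(files[start:end])
            rewritten += merged          # compaction rewrites every selected byte
            files[start:end] = [merged]  # replace the inputs with the output
    return rewritten, files


# Nine 2 MB flushes with a 128 MB min-size versus a 4 MB min-size.
print(simulate(9, 2, 128))  # (48, [18])
print(simulate(9, 2, 4))    # (24, [12, 6])
```

In this toy run the 128 MB setting recompacts the same data every second flush, because the merged file stays under the min-size and is included again, while the 4 MB setting rewrites half as many bytes.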