The Delete command doesn’t delete the value immediately. Instead, it marks the
record for deletion. That is, a new “tombstone” record is written for that value, marking
it as deleted. The tombstone is used to indicate that the deleted value should no
longer be included in Get or Scan results. Because HFiles are immutable, it’s not until
a major compaction runs that these tombstone records are reconciled and space is truly
recovered from deleted records.
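The tombstone behavior can be sketched with a toy, append-only store. This is an illustration only, not HBase code: the `ToyStore` class, its single-value rows, and its integer clock are all assumptions made for the example.

```python
class ToyStore:
    """Hypothetical append-only store; a value of None stands in for a tombstone."""

    def __init__(self):
        self.log = []    # records are only ever appended, never rewritten
        self.clock = 0   # stand-in for a cell timestamp

    def _append(self, row, value):
        self.clock += 1
        self.log.append((row, self.clock, value))

    def put(self, row, value):
        self._append(row, value)

    def delete(self, row):
        # Nothing is removed here: a tombstone record is written instead.
        self._append(row, None)

    def get(self, row):
        # Return the newest record for the row; a tombstone hides older values.
        newest = None
        for r, ts, value in self.log:
            if r == row and (newest is None or ts > newest[0]):
                newest = (ts, value)
        return None if newest is None else newest[1]


store = ToyStore()
store.put("row1", "a")
store.delete("row1")
visible = store.get("row1")   # None: the tombstone masks the value
on_disk = len(store.log)      # 2: the original value and the tombstone both remain
```

Note that the delete makes the store larger, not smaller. In this toy model, as in the text, space is recovered only when something later rewrites the records.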
Compactions come in two flavors: minor and major. Both types result in a consolidation
of the data persisted in HFiles. A minor compaction folds HFiles together,
creating a larger HFile from multiple smaller HFiles, as shown in figure 2.3. Restricting
the number of HFiles is important for read performance, because all of them
must be referenced to read a complete row. During the compaction, HBase reads the
content of the existing HFiles, writing records into a new one. Then, it swaps in the
new HFile as the active one and deletes the old HFiles that were merged into
it. HBase decides which HFiles to compact based on their number and relative
sizes. Minor compactions are designed to be minimally detrimental to HBase performance,
so there is an upper limit on the number of HFiles involved. All of these settings
are configurable.
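To make "number and relative sizes" concrete, here is a simplified, ratio-style selection sketch. It is not HBase's actual policy: the function name, the parameters `min_files`, `max_files`, and `ratio`, and their default values are assumptions chosen for illustration.

```python
def select_for_minor_compaction(sizes, min_files=3, max_files=10, ratio=1.2):
    """Pick HFiles for a minor compaction (illustrative policy only).

    sizes: HFile sizes ordered oldest to newest.
    Returns the indices of the selected files, or [] if too few qualify.
    """
    start = 0
    # Skip leading files that are too large relative to the files after them,
    # so one big HFile isn't rewritten just to absorb a few small ones.
    while start < len(sizes):
        window = sizes[start + 1 : start + max_files]
        if sizes[start] <= ratio * sum(window):
            break
        start += 1
    # Cap the number of files involved to bound the cost of the compaction.
    selected = list(range(start, min(start + max_files, len(sizes))))
    return selected if len(selected) >= min_files else []


# One large, old HFile followed by four small flushes:
chosen = select_for_minor_compaction([1000, 12, 10, 11, 9])   # [1, 2, 3, 4]
```

The large file is left alone and only the small files are folded together, which keeps the compaction cheap while still reducing the number of HFiles a read has to consult.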
When a compaction operates over all HFiles in a column family in a given region, it’s
called a major compaction. Upon completion of a major compaction, all HFiles in the
column family are merged into a single file. Major compactions can also be triggered for the entire table (or a particular region) manually from the shell.
A major compaction is a relatively expensive operation and isn’t run often. Minor compactions, on the other hand, are relatively lightweight and happen more frequently.
Major compactions are the only chance HBase has to clean up deleted records. Resolving a delete requires removing both the deleted record and the deletion marker. There’s no guarantee that both the
record and marker are in the same HFile. A major compaction is the only time when
HBase is guaranteed to have access to both of these entries at the same time.
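The difference between the two compaction types can be sketched with toy HFiles modeled as lists of cells. This is a simplified model under stated assumptions, not HBase's implementation: the `Cell` type, the `compact` function, and the rule that a tombstone masks every cell of the same row with an equal or older timestamp are all invented for the example.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    ts: int
    value: Optional[str]   # None marks a tombstone (assumption of this sketch)


def compact(hfiles, major):
    """Merge toy HFiles into one; drop deleted data only on a major compaction."""
    merged = sorted((c for f in hfiles for c in f), key=lambda c: (c.row, -c.ts))
    if not major:
        # A minor compaction sees only some HFiles, so the record a tombstone
        # refers to may live elsewhere. The tombstone has to be kept.
        return merged
    # A major compaction sees every HFile, so each tombstone and the records
    # it masks can be dropped together.
    deleted_at = {}
    for c in merged:
        if c.value is None:
            deleted_at[c.row] = max(deleted_at.get(c.row, -1), c.ts)
    return [c for c in merged
            if c.value is not None and c.ts > deleted_at.get(c.row, -1)]


old = [Cell("r1", 1, "a"), Cell("r2", 1, "b")]   # holds the deleted record
new = [Cell("r1", 2, None)]                      # holds its tombstone
newer = [Cell("r3", 3, "c")]

minor = compact([new, newer], major=False)   # tombstone for r1 survives
major = compact([old, minor], major=True)    # r1 is gone; r2 and r3 remain
```

In the minor compaction the record for `r1` is in a file that wasn't part of the merge, so discarding the tombstone there would make the deleted value reappear on the next read.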
The compaction process is described in greater detail, along with incremental
illustrations, in a post on the NGDATA blog.