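Before HBase Snapshots were available, backing up or cloning a table meant either using the CopyTable/Export tools or disabling the table and copying all of its HFiles. The former launches MapReduce jobs that put heavy load on the RegionServers; the latter requires disabling the table, which blocks all reads and writes.
Snapshots, by contrast, let an admin clone a table without copying any data and with minimal impact on the RegionServers (RS). Exporting a snapshot to another cluster does not affect RegionServer performance either.
The main use cases for HBase Snapshots are: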
- Recovery from user/application errors
  - Restore/recover from a known safe state.
  - View previous snapshots and selectively merge the difference into production.
  - Save a snapshot right before a major application upgrade or change.
- Auditing and/or reporting on views of data at a specific time
  - Capture monthly data for compliance purposes.
  - Run end-of-day/month/quarter reports.
- Application testing
  - Test schema or application changes on data similar to that in production from a snapshot and then throw it away. For example: take a snapshot, create a new table from the snapshot content (schema plus data), and manipulate the new table by changing the schema, adding and removing rows, and so on. (The original table, the snapshot, and the new table remain mutually independent.)
- Offloading of work
  - Take a snapshot, export it to another cluster, and run your MapReduce jobs. Since the snapshot export operates at the HDFS level, you don't slow down your main HBase cluster as much as CopyTable does.
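The offloading case works because exporting copies the snapshot's files at the HDFS level instead of reading rows through the RegionServers. The following is a toy illustration of that idea only, assuming a made-up directory layout (`data/` for files, `snapshots/<name>.json` for the manifest); `export_snapshot` is a hypothetical function and does not reflect ExportSnapshot's real implementation or HBase's on-disk format.

```python
# Toy sketch (hypothetical directory layout, NOT ExportSnapshot's real code or
# on-disk format): exporting means copying the snapshot manifest plus every
# file it lists, file by file, without reading rows through a "RegionServer".
import json
import shutil
import tempfile
from pathlib import Path


def export_snapshot(src_root: Path, dst_root: Path, snap: str) -> list[str]:
    """Copy snapshots/<snap>.json and the data files it references."""
    manifest_path = src_root / "snapshots" / f"{snap}.json"
    manifest = json.loads(manifest_path.read_text())
    (dst_root / "data").mkdir(parents=True, exist_ok=True)
    (dst_root / "snapshots").mkdir(parents=True, exist_ok=True)
    for fname in manifest["files"]:
        # Whole-file byte copy: contents are never parsed into rows.
        shutil.copyfile(src_root / "data" / fname, dst_root / "data" / fname)
    # In this toy, the manifest is copied last so the destination never
    # lists a file that has not arrived yet.
    shutil.copyfile(manifest_path, dst_root / "snapshots" / f"{snap}.json")
    return manifest["files"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "cluster1"), Path(tmp, "cluster2")
        (src / "data").mkdir(parents=True)
        (src / "snapshots").mkdir()
        (src / "data" / "hfile-0").write_bytes(b"row1")
        (src / "snapshots" / "snapshotName.json").write_text(
            json.dumps({"table": "tableName", "files": ["hfile-0"]})
        )
        print(export_snapshot(src, dst, "snapshotName"))  # ['hfile-0']
        print((dst / "data" / "hfile-0").read_bytes())     # b'row1'
```

The point of the design is that the copy is a plain file transfer, so the serving path of the source cluster is not involved.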
What is a Snapshot
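A snapshot is a set of metadata that lets an admin roll a table back to an earlier state. It is not a copy of the table: it is essentially a list of the table's file names, so taking it copies no data.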
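To make "a set of metadata" concrete, here is a toy model, assuming a simplified store in which data files are immutable once written (as HFiles are). `ToyTable`, `ToyStore` and the file-naming scheme are hypothetical illustrations, not HBase classes: taking a snapshot records only a list of file names, and later writes go to new files, so the snapshot's view does not change.

```python
# Toy model (hypothetical names, NOT HBase's implementation): data lives in
# immutable files, and a snapshot records only which file names a table used.
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    name: str
    files: list[str] = field(default_factory=list)  # current data-file names


class ToyStore:
    """Stand-in for the shared file system (e.g. HDFS) plus snapshot metadata."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # file name -> immutable contents
        self.snapshots: dict[str, tuple[str, list[str]]] = {}  # snapshot -> (table, files)
        self.bytes_written = 0

    def flush(self, table: ToyTable, data: bytes) -> None:
        """Write a brand-new immutable file and attach it to the table."""
        fname = f"{table.name}-hfile-{len(self.blobs)}"
        self.blobs[fname] = data
        self.bytes_written += len(data)
        table.files.append(fname)

    def snapshot(self, table: ToyTable, snap: str) -> None:
        """Record metadata only: a copy of the file-name list, zero data bytes."""
        self.snapshots[snap] = (table.name, list(table.files))


if __name__ == "__main__":
    store = ToyStore()
    t = ToyTable("tableName")
    store.flush(t, b"row1")
    before = store.bytes_written
    store.snapshot(t, "snapshotName")
    store.flush(t, b"row2")  # a later write adds a new file
    print(store.bytes_written - before)     # 4: only the later flush wrote bytes
    print(store.snapshots["snapshotName"])  # ('tableName', ['tableName-hfile-0'])
```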
Operations:
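- Take a snapshot: create a snapshot of the specified table. This may fail while the table is being balanced, split, or compacted.
- Clone a snapshot: create a new table from a snapshot. The new table has the same schema and data as the original table had when the snapshot was taken, and operations on the new table do not affect the original table.
- Restore a snapshot: roll a table back to the state captured in a snapshot. The table must be disabled before it can be restored.
- Delete a snapshot: remove a snapshot to free space. Tables cloned from it and other snapshots are not affected.
- Export a snapshot: copy a snapshot's metadata and data to another cluster. This is an HDFS-level operation and does not affect the RegionServers.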
The corresponding commands:
hbase> snapshot 'tableName', 'snapshotName'
hbase> clone_snapshot 'snapshotName', 'newTableName'
hbase> disable 'tableName'
hbase> restore_snapshot 'snapshotName'
hbase> enable 'tableName'
hbase> delete_snapshot 'snapshotName'
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotName -copy-to hdfs://srv2:8082/hbase
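The semantics of these operations can be sketched with a toy model in which both tables and snapshots are just lists of references to shared immutable files. `ToyCluster` and its methods are hypothetical and only mirror the shell commands by name; this is not the HBase Admin API. It shows why a clone is independent of the original, why restore requires the table to be disabled first, and why deleting a snapshot leaves clones intact.

```python
# Toy model (hypothetical, NOT the HBase Admin API): tables and snapshots are
# both lists of references to shared immutable files, so clone, restore and
# delete only copy or drop reference lists, never data.
class ToyCluster:
    def __init__(self) -> None:
        self.tables: dict[str, list[str]] = {}  # table -> file references
        self.disabled: set[str] = set()
        self.snapshots: dict[str, tuple[str, list[str]]] = {}  # snapshot -> (table, files)

    def snapshot(self, table: str, snap: str) -> None:
        self.snapshots[snap] = (table, list(self.tables[table]))

    def clone_snapshot(self, snap: str, new_table: str) -> None:
        if new_table in self.tables:
            raise ValueError(f"table {new_table} already exists")
        _, files = self.snapshots[snap]
        self.tables[new_table] = list(files)  # independent list for the new table

    def restore_snapshot(self, snap: str) -> None:
        table, files = self.snapshots[snap]
        if table not in self.disabled:
            raise RuntimeError(f"table {table} must be disabled before restore")
        self.tables[table] = list(files)  # changes made after the snapshot are discarded

    def delete_snapshot(self, snap: str) -> None:
        del self.snapshots[snap]  # cloned tables keep their own reference lists


if __name__ == "__main__":
    c = ToyCluster()
    c.tables["tableName"] = ["f1"]
    c.snapshot("tableName", "snapshotName")
    c.clone_snapshot("snapshotName", "newTableName")
    c.tables["newTableName"].append("f2")  # modify only the clone
    c.tables["tableName"].append("f3")     # keep writing to the original
    c.disabled.add("tableName")            # like: disable 'tableName'
    c.restore_snapshot("snapshotName")
    c.disabled.discard("tableName")        # like: enable 'tableName'
    c.delete_snapshot("snapshotName")
    print(c.tables)  # {'tableName': ['f1'], 'newTableName': ['f1', 'f2']}
```

Because everything is a reference list, none of these operations copies data. Real HBase must additionally keep files that a snapshot still references from being deleted (for example by compaction), which this toy ignores.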
Limitations
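- Merging regions that are referenced by a snapshot causes data loss in the snapshot and in tables cloned from it.
- When a table with replication enabled is restored to a snapshot, its replica on the other cluster is not restored.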
PS: link to the original article