A Complete Guide to HBase Shell Commands

Author: 小强签名设计

Last modified: 2024-04-15 16:10:30

First published: 2017-06-23 12:02:39

Original link: https://blog.csdn.net/m0_37739193/article/details/73618899

I. HBase web UI

Web UI address: http://h71:60010

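Note: the web UI settings below are configured in $HBASE_HOME/conf/hbase-site.xml. h71 is the master's host name, and it is mapped to an IP address in the hosts file (see below).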

hbase.master.info.port
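Port for the HBase Master web UI. Set it to -1 if you do not want the UI to run.
Default: 60010
Note: newer versions (HBase 1.0 and later) changed the default to 16010, so the address becomes http://h71:16010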

hbase.master.info.bindAddress
IP address the HBase Master web UI binds to
Default: 0.0.0.0

Mapping IP addresses to host names:
Linux: configure the mapping in /etc/hosts
Windows: configure the mapping in the hosts file under C:\Windows\System32\drivers\etc

192.168.8.71    h71
192.168.8.72    h72
192.168.8.73    h73

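II. Basic hbase shell operations

1. Entering the hbase shell console:

Note: if Kerberos authentication is enabled, first authenticate with the matching keytab using kinit. Once that succeeds, start hbase shell. Inside the shell, the whoami command shows the current user.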

$HBASE_HOME/bin/hbase shell
hbase(main):029:0> whoami
hadoop (auth:SIMPLE)
    groups: hadoop
    
hbase(main):008:0> version
1.0.0-cdh5.5.2, rUnknown, Mon Jan 25 16:33:02 PST 2016

list    # list all tables
status  # show the status of the running servers
exists 'table_name'  # check whether a table exists

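2. Namespaces:

A namespace is a logical grouping of tables, analogous to a database in a relational database system. Tables in the same group usually serve similar purposes. Namespaces lay the groundwork for upcoming multi-tenancy features:

Quota management (HBASE-8410): limits the resources a namespace can use, such as regions and tables;
Namespace security administration (HBASE-9206): adds another level of multi-tenant security management;
Region server groups (HBASE-6721): a namespace or a table can be pinned to a group of RegionServers, which guarantees data isolation.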

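(1) Namespace management:

Namespaces can be created, dropped and altered. A table's namespace is decided when the table is created, using the format <namespace>:<table>

Example: creating a namespace, creating a table in it, dropping the namespace and altering it in hbase shell: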

#Create a namespace
create_namespace 'my_ns'

#create my_table in my_ns namespace
create 'my_ns:my_table', 'fam'

#drop namespace
drop_namespace 'my_ns'
Note: a namespace can only be dropped when it contains no tables. If it still has tables, drop them first and then drop the namespace. To drop a table:
hbase(main):005:0> disable 'my_ns:my_table'
hbase(main):006:0> drop 'my_ns:my_table'

#alter namespace
alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

# list all namespaces
list_namespace

# describe a namespace
hbase(main):005:0> describe_namespace 'hbase'
DESCRIPTION                                                                                                                                                                                    
{NAME => 'hbase'}                                                                                                                                                                              
Took 0.0206 seconds                                                                                                                                                                            
=> 1

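(2) Predefined namespaces:

HBase has two predefined, built-in namespaces:

hbase: the system namespace, which holds HBase's internal tables
default: tables created without an explicit namespace are placed in this namespace automatically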

#specify a namespace: namespace=my_ns and table qualifier=bar
create 'my_ns:bar', 'fam'

#use the default namespace: namespace=default and table qualifier=bar
create 'bar', 'fam'
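The <namespace>:<table> rule can be shown with a minimal standard-library Java sketch that splits a table name the way the two create commands above are interpreted. A name without a colon falls into the default namespace. The NsTable class is hypothetical and only illustrates the naming rule. In real client code, org.apache.hadoop.hbase.TableName does this parsing.

```java
/** Hypothetical helper illustrating how "<namespace>:<table>" names are split. */
final class NsTable {
    static final String DEFAULT_NAMESPACE = "default";

    final String namespace;
    final String qualifier;

    private NsTable(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    /** Splits at the first ':'; a name without one belongs to the default namespace. */
    static NsTable parse(String name) {
        int idx = name.indexOf(':');
        if (idx < 0) {
            return new NsTable(DEFAULT_NAMESPACE, name);
        }
        return new NsTable(name.substring(0, idx), name.substring(idx + 1));
    }

    public static void main(String[] args) {
        for (String n : new String[] {"my_ns:bar", "bar", "hbase:meta"}) {
            NsTable t = parse(n);
            System.out.println(n + " -> namespace=" + t.namespace + ", qualifier=" + t.qualifier);
        }
    }
}
```

Running main prints one line per name. For example, the name bar maps to namespace=default, qualifier=bar.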

3. Creating tables:

Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}

Concrete commands:

create 't1', {NAME => 'f1', VERSIONS => 5}
create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}

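Shorthand for creating column families: create 't1', 'f1', 'f2', 'f3'

Specifying per-column-family parameters. The second command also pre-splits the table into regions at the given SPLITS keys: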

create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}

Setting different parameters to improve the table's read performance:

create 'lmj_test',
        {NAME => 'adn', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
        {NAME => 'fixeddim', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
        {NAME => 'social', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}

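Every attribute has performance implications, and choosing them sensibly improves table performance. Here is a leaner version that sets only the key attributes: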

create 'lmj_test',
        {NAME => 'adn', BLOOMFILTER => 'ROWCOL', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'false'},
        {NAME => 'fixeddim',BLOOMFILTER => 'ROWCOL', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'false'},
        {NAME => 'social',BLOOMFILTER => 'ROWCOL', VERSIONS => '1', TTL => '15768000', MIN_VERSIONS => '0',COMPRESSION => 'SNAPPY', BLOCKCACHE => 'false'}
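TTL in these commands is given in seconds. So 2592000 is 30 days, and 15768000 is 182.5 days (about half a year). Below is a small Java sketch for converting between days and TTL seconds. The class name is ours, and it uses only java.util.concurrent.TimeUnit.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for the TTL arithmetic (HBase TTL values are in seconds). */
final class TtlCalc {
    /** Converts a whole number of days into the seconds value used for TTL. */
    static long daysToTtlSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    /** Converts a TTL in seconds back to days (may be fractional). */
    static double ttlSecondsToDays(long seconds) {
        return seconds / 86400.0;
    }

    public static void main(String[] args) {
        System.out.println(daysToTtlSeconds(30));        // 2592000
        System.out.println(daysToTtlSeconds(180));       // 15552000
        System.out.println(ttlSecondsToDays(15768000L)); // 182.5
    }
}
```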

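4. Changing the number of stored versions and querying by version:

Reference: Changing the number of stored versions and querying by version in HBase

In HBase, the storage unit identified by a row and a column is called a cell. Each cell can hold multiple versions of the same data. HBase releases before 0.96 keep three versions by default; since 0.96 the default is one. In practice, you may want to store one version or some other number, for performance or business reasons. So how do you change this setting?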

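(1) Configuring it at table creation:

If the table does not exist yet, set VERSIONS when you create it. VERSIONS is the number of versions of the data to store:

create 'table_name', {NAME => 'family1', VERSIONS => n1}, {NAME => 'family2', VERSIONS => n2}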

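(2) Modifying the table configuration:

If no version count was specified when the table was created, modify the table configuration as follows.

If the table already exists, first take it offline: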

disable 'table'

Change the table attribute (you can target a specific column family):

alter 'table' , NAME => 'f', VERSIONS => 1

Bring the table back online (enable):

enable 'table'

(3) Querying by version:

A query can specify how many versions of the data to return. The table in this example currently has VERSIONS set to 10:

hbase(main):060:0> scan 'test_schema1:t2'
ROW                                               COLUMN+CELL                                                                                                                                      
0 row(s)
Took 0.1183 seconds                                                                                                                                                                                
hbase(main):061:0> put 'test_schema1:t2','101','F:b','huiqtest1'
Took 0.0050 seconds                                                                                                                                                                                
hbase(main):062:0> put 'test_schema1:t2','101','F:b','huiqtest2'
Took 0.0046 seconds                                                                                                                                                                                
hbase(main):063:0> put 'test_schema1:t2','101','F:b','huiqtest3'
Took 0.0157 seconds                                                                                                                                                                                
hbase(main):064:0> scan 'test_schema1:t2'
ROW                                               COLUMN+CELL                                                                                                                                      
 101                                              column=F:b, timestamp=1627353050875, value=huiqtest3                                                                                             
1 row(s)
Took 0.0048 seconds                                                                                                                                                                                
hbase(main):065:0> scan 'test_schema1:t2', {VERSIONS=>3}
ROW                                               COLUMN+CELL                                                                                                                                      
 101                                              column=F:b, timestamp=1627353050875, value=huiqtest3                                                                                             
 101                                              column=F:b, timestamp=1627353048782, value=huiqtest2                                                                                             
 101                                              column=F:b, timestamp=1627353045389, value=huiqtest1                                                                                             
1 row(s)
Took 0.0097 seconds                                                                                                                                                                                
hbase(main):066:0> scan 'test_schema1:t2', {COLUMNS => ['F:a', 'F:b'], VERSIONS=>3}
ROW                                               COLUMN+CELL                                                                                                                                      
 101                                              column=F:b, timestamp=1627353050875, value=huiqtest3                                                                                             
 101                                              column=F:b, timestamp=1627353048782, value=huiqtest2                                                                                             
 101                                              column=F:b, timestamp=1627353045389, value=huiqtest1                                                                                             
1 row(s)
Took 0.0088 seconds                                                                                                                                                                                                                       
hbase(main):068:0> get 'test_schema1:t2','101','F:b'
COLUMN                                            CELL                                                                                                                                             
 F:b                                              timestamp=1627353050875, value=huiqtest3                                                                                                         
1 row(s)
Took 0.0154 seconds                                                                                                                                                                                
hbase(main):069:0> get 'test_schema1:t2','101', {COLUMNS => ['F:b'], VERSIONS=>3}
COLUMN                                            CELL                                                                                                                                             
 F:b                                              timestamp=1627353050875, value=huiqtest3                                                                                                         
 F:b                                              timestamp=1627353048782, value=huiqtest2                                                                                                         
 F:b                                              timestamp=1627353045389, value=huiqtest1                                                                                                         
1 row(s)
Took 0.0163 seconds                                                                                                                                                                                
hbase(main):070:0> get 'test_schema1:t2','101', {COLUMNS => 'F:b', VERSIONS=>3}
COLUMN                                            CELL                                                                                                                                             
 F:b                                              timestamp=1627353050875, value=huiqtest3                                                                                                         
 F:b                                              timestamp=1627353048782, value=huiqtest2                                                                                                         
 F:b                                              timestamp=1627353045389, value=huiqtest1                                                                                                         
1 row(s)
Took 0.0044 seconds             
hbase(main):073:0> put 'test_schema1:t2','101','F:a','101'
Took 0.0660 seconds            
hbase(main):077:0> get 'test_schema1:t2','101', {COLUMNS => ['F:a', 'F:b'], VERSIONS=>2}
COLUMN                                            CELL                                                                                                                                             
 F:a                                              timestamp=1627353603902, value=101                                                                                                               
 F:b                                              timestamp=1627353050875, value=huiqtest3                                                                                                         
 F:b                                              timestamp=1627353048782, value=huiqtest2                                                                                                         
1 row(s)
Took 0.0053 seconds       

# delete the data of one specific version
hbase(main):078:0> delete 'test_schema1:t2','101','F:b',1627353048782
Took 0.0136 seconds                                                                                                                                                                                
hbase(main):079:0> get 'test_schema1:t2','101', {COLUMNS => ['F:a', 'F:b'], VERSIONS=>2}
COLUMN                                            CELL                                                                                                                                             
 F:a                                              timestamp=1627353603902, value=101                                                                                                               
 F:b                                              timestamp=1627353050875, value=huiqtest3                                                                                                         
 F:b                                              timestamp=1627353045389, value=huiqtest1                                                                                                         
1 row(s)
Took 0.0115 seconds
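The version behavior in this session can be summarized with a toy Java model of a single cell. This is a sketch of the semantics only, not of how HBase stores data. Real HBase hides excess versions on read and physically removes them later during compaction, while this model drops them immediately. In the model:

- reads return versions newest first;
- VERSIONS => n limits how many versions come back;
- deleting with a timestamp removes just that version, as it did in the session above.

The timestamps are copied from the session. The class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of one cell (row + column) with several timestamped versions. */
final class VersionedCell {
    // timestamp -> value, ordered newest first
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Like put: adds a version, then keeps only the newest maxVersions. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    /** Like get/scan with VERSIONS => n: at most n values, newest first. */
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    /** Like the delete with an explicit timestamp above: removes only that version. */
    void deleteVersion(long timestamp) {
        versions.remove(timestamp);
    }

    public static void main(String[] args) {
        VersionedCell fb = new VersionedCell(10); // VERSIONS => 10, as in the example table
        fb.put(1627353045389L, "huiqtest1");
        fb.put(1627353048782L, "huiqtest2");
        fb.put(1627353050875L, "huiqtest3");
        System.out.println(fb.get(1)); // [huiqtest3]
        System.out.println(fb.get(3)); // [huiqtest3, huiqtest2, huiqtest1]
        fb.deleteVersion(1627353048782L);
        System.out.println(fb.get(2)); // [huiqtest3, huiqtest1]
    }
}
```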

5. Inserting data into a table:

hbase(main):180:0> put 'scores','zhangsan01','course:math','99'
hbase(main):181:0> put 'scores','zhangsan01','course:art','90'
hbase(main):182:0> put 'scores','zhangsan01','grade:','101'
hbase(main):184:0> put 'scores','zhangsan02','course:math','66'
hbase(main):185:0> put 'scores','zhangsan02','course:art','60'
hbase(main):186:0> put 'scores','zhangsan02','grade:','102'
hbase(main):201:0> put 'scores','lisi01','course:math','89'
hbase(main):202:0> put 'scores','lisi01','course:art','89'
hbase(main):203:0> put 'scores','lisi01','grade:','201'

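6. Updating data:

Updating data works the same way as inserting it: both use the put command. Each put writes a new version of the cell, and reads return the newest version by default:

# Syntax:
put 'tablename','row','colfamily:colname','newvalue'

# in table emp, row 1: change the value of column personal data:city to bj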
put 'emp','1','personal data:city','bj'

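7. Copying a table:

How do you copy a table in HBase? Use a snapshot:

Step 1: take a snapshot of the table

hbase(main):204:0> snapshot 'scores' , 'snapshot_scores'

Step 2: clone a new table from the snapshot

hbase(main):205:0> clone_snapshot 'snapshot_scores','scores_2'

With a namespace: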

hbase(main):206:0> snapshot 'test_schema1:t1', 'snapshot_t1'
hbase(main):207:0> clone_snapshot 'snapshot_t1','test_schema1:t2'

8. Listing and deleting snapshots:

# list snapshots
hbase(main):208:0> list_snapshots
SNAPSHOT                                         TABLE + CREATION TIME                                                                                                                         
 snapshot_t1                                     test_schema1:t1 (2021-07-09 16:28:03 +0800)                                                                                                   
1 row(s)
Took 0.5443 seconds                                                                                                                                                                            
=> ["snapshot_t1"]

# delete a snapshot
hbase(main):209:0> delete_snapshot 'snapshot_t1'

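Note: the snapshot commands are not available in versions before 0.94.x (snapshots were introduced in 0.94.6).

9. Querying data:

See my other article: HBase data queries and filters in detail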

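10. Deleting data with delete:

(1) Deleting a given column in a given row:

Syntax: delete <table>, <rowkey>, <family:column>, <timestamp>. The column name is required, and the command removes all versions of that column's data. However, in the session in section 4, a delete with an explicit timestamp removed only that one version. The exact behavior therefore depends on your HBase version and on whether you pass a timestamp.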

delete 'scores','zhangsan01','course:math'

(2) Deleting a whole row (the column name can be omitted):

Syntax: deleteall <table>, <rowkey>, <family:column>, <timestamp> (the column and the timestamp are optional)

deleteall 'scores','zhangsan02'

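Note: Put, Delete, Get and Scan are classes in the org.apache.hadoop.hbase.client package; they are not subclasses of it. See the official API documentation for details.

(3) Batch-deleting multiple rows

I have not found a built-in command for this, so it has to be done with a shell script. Reference: [hbase] Batch-deleting HBase data by time range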

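# dump the rows in the time range (the time is encoded in the rowkey) to record.txt
echo "scan 'heheda',{STARTROW=>'haha_1649088000',STOPROW=>'haha_1649174399', COLUMNS => 'DATA:qie'}" | hbase shell > record.txt

echo '#!/bin/bash' > delete.sh
echo "exec hbase shell <<EOF" >> delete.sh
# keep only data lines (they start with a space), take the rowkey in field 1 and de-duplicate;
# without this filter, lines such as "1 row(s)" would become deleteall 'heheda', '1'
awk '/^ /{print $1}' record.txt | sort -u | awk '{print "deleteall '\''heheda'\'', '\''" $1 "'\''"}' >> delete.sh
echo "EOF" >> delete.sh
sh delete.sh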
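The script's filtering step can also be written as a small Java function. It keeps only the data lines of the scan output, takes the row key, removes duplicates, and emits one deleteall per key. Generating the commands this way makes it easy to review them before running them. This is a standard-library-only sketch: the class and method names are hypothetical, and the sample lines in main are made up. Like the awk version, it assumes row keys contain no spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: turns hbase shell `scan` output into deleteall commands. */
final class DeleteScriptGen {
    static List<String> toDeleteAll(String table, List<String> scanOutput) {
        // LinkedHashSet drops duplicate row keys (one line per cell) but keeps scan order
        Set<String> rowKeys = new LinkedHashSet<>();
        for (String line : scanOutput) {
            // data lines start with a space: " <rowkey>   column=..., timestamp=..., value=..."
            // header/footer lines ("ROW ...", "1 row(s)", "Took ...") do not
            if (line.startsWith(" ") && !line.trim().isEmpty()) {
                rowKeys.add(line.trim().split("\\s+")[0]);
            }
        }
        List<String> commands = new ArrayList<>();
        for (String key : rowKeys) {
            commands.add("deleteall '" + table + "', '" + key + "'");
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "ROW                 COLUMN+CELL",
                " haha_1649088001    column=DATA:qie, timestamp=1649088001000, value=a",
                " haha_1649088002    column=DATA:qie, timestamp=1649088002000, value=b",
                "2 row(s)",
                "Took 0.0100 seconds");
        toDeleteAll("heheda", sample).forEach(System.out::println);
    }
}
```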

Java API:

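import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

    /**
     * @Description: Batch-delete rows by rowKey
     * @param dataDataset rows containing the student_id and num columns
     * @param properties  HBase connection properties
     * @param tableName   name of the HBase table
     */
    public static void deleteDataBatch(Dataset<Row> dataDataset, Properties properties, String tableName) {
        JavaRDD<Row> dataRDD = dataDataset.toJavaRDD();
        // foreachPartition runs on the executors. HBase Connection/Table objects are not
        // serializable, so open them inside each partition rather than on the driver
        dataRDD.foreachPartition((VoidFunction<Iterator<Row>>) rowIterator -> {
            List<Delete> deletes = new ArrayList<>();
            while (rowIterator.hasNext()) {
                Row next = rowIterator.next();
                String studentId = next.getAs("student_id");
                String num = next.getAs("num");
                String rowKey = studentId + "_" + num;
                deletes.add(new Delete(Bytes.toBytes(rowKey)));
            }
            if (deletes.isEmpty()) {
                return;
            }
            // getHBaseConnect(properties) is the author's own helper that creates an HBase Connection;
            // try-with-resources closes the table and the connection after the batch delete
            try (Connection connection = getHBaseConnect(properties);
                 Table table = connection.getTable(TableName.valueOf(tableName))) {
                table.delete(deletes);
            }
        });
    }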

11. Counting rows with count:

# print progress every 100 rows, with a scanner cache of 500 rows
count 'scores', {INTERVAL => 100, CACHE => 500}

Implementing it yourself in Java:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.util.StopWatch;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class RowCountByScan {
    // connection is a static field of the class, created elsewhere
    // (for example with ConnectionFactory.createConnection(conf))
    private static Connection connection;

    public void rowCountByScanFilter(String tablename) {
        long rowCount = 0;
        // timing
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();

        Scan scan = new Scan();
        // FirstKeyOnlyFilter returns only the first KeyValue of each row, which speeds up counting
        scan.setFilter(new FirstKeyOnlyFilter());

        // try-with-resources closes the scanner and the table even if the scan fails
        try (Table table = connection.getTable(TableName.valueOf(tablename));
             ResultScanner rs = table.getScanner(scan)) {
            for (Result ignored : rs) {
                // each Result is one row
                rowCount++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        stopWatch.stop();
        System.out.println("RowCount: " + rowCount);
        System.out.println("Elapsed (s): " + stopWatch.now(TimeUnit.SECONDS));
    }
}

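Note: HBase already ships a ready-made tool for this: hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table_name'. For a deeper look, see my other article, RowCount statistics in HBase. Other people's articles, such as Four ways to check the size of an HBase table, are also worth reading.

12. Truncating a table (truncate disables, drops and recreates the table with the same schema):

truncate 'scores'

13. Altering the table schema:

Disable the table first, then enable it afterwards:

# Example: set the TTL of the column families of table scores to 180 days (15552000 seconds)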
hbase(main):017:0> disable 'scores'
hbase(main):018:0> alter 'scores',{NAME=>'grade',TTL=>'15552000'},{NAME=>'course', TTL=>'15552000'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.2200 seconds

#change the number of versions:
hbase(main):019:0> alter 'scores',{NAME=>'grade',VERSIONS=>3}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 2.4020 seconds
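Note: many sources online say you must disable a table before altering its schema and enable it afterwards. However, I ran alter directly without disabling the table, and it still succeeded. Newer HBase versions support altering an enabled table (online schema change), which is likely why. I have not yet checked whether this has side effects.

# add a column family:
hbase(main):020:0> alter 'scores', NAME=>'info'
# delete a column family:
alter 'scores', NAME=> 'info', METHOD => 'delete'
# or
alter 'scores', 'delete' => 'info'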
alter 'scores', 'delete' => 'info'

hbase(main):020:0> enable 'scores'

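14. Finding out when an HBase table was created:

The hbase:meta table records metadata, and each of its cells carries a timestamp from when it was written. Look up the row whose rowkey is the table name (format namespace:table). Read the timestamp of the returned cells and convert it into a date/time string.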
You can also look in ZooKeeper. Run get /hbase/table/<table name> (format namespace:table); the ctime it returns is the creation time.
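The timestamp stored with the hbase:meta cells is in milliseconds since the epoch. Here is a java.time sketch for turning it into a readable time. The sample value 1627353050875 is simply a timestamp taken from the version example in section 4, not a real table-creation time. The Asia/Shanghai (UTC+8) zone is an assumption, so pass your own zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: formats an HBase cell timestamp (epoch milliseconds). */
final class TsToDate {
    static String format(long epochMillis, String zoneId) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneId.of(zoneId));
        return fmt.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(1627353050875L, "Asia/Shanghai")); // 2021-07-27 10:30:50
    }
}
```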

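15. Getting region information:

scan 'hbase:meta',{FILTER=>"PrefixFilter('table_name')"}

info:regioninfo: this qualifier contains the STARTKEY and ENDKEY.
info:server: this qualifier contains the RegionServer information.

Source: hbase: getting region information in the shell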
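Each region serves the key range from its STARTKEY (inclusive) to its ENDKEY (exclusive). The first region has an empty STARTKEY and the last has an empty ENDKEY. Below is a minimal Java sketch of finding the region for a row key, using the split points from SPLITS => ['10', '20', '30', '40'] in section 3. Row keys are compared here as Java strings, which matches HBase's byte-wise ordering only for ASCII keys. The class and the region labels are hypothetical.

```java
import java.util.TreeMap;

/** Hypothetical model: regions keyed by STARTKEY; each covers [STARTKEY, next STARTKEY). */
final class RegionLookup {
    private final TreeMap<String, String> byStartKey = new TreeMap<>();

    RegionLookup(String... splitKeys) {
        byStartKey.put("", "region-0"); // first region: empty STARTKEY
        for (int i = 0; i < splitKeys.length; i++) {
            byStartKey.put(splitKeys[i], "region-" + (i + 1));
        }
    }

    /** The region whose STARTKEY is the greatest one <= rowKey. */
    String regionFor(String rowKey) {
        return byStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLookup t1 = new RegionLookup("10", "20", "30", "40");
        for (String key : new String[] {"05", "10", "25", "9"}) {
            System.out.println(key + " -> " + t1.regionFor(key));
        }
    }
}
```

The key "9" lands in the last region because keys sort lexicographically, not numerically. Zero-pad numeric prefixes (e.g. "09") if you want numeric ordering.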

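16. Dropping a table:

hbase(main):044:0> disable 't2'
hbase(main):045:0> drop 't2'

17. Table permissions:

(1) Granting permissions:

grant 'hadoop','RW','scores'    # grant user hadoop read/write permission on table scores

Note: at first, granting permissions always failed with: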

hbase(main):038:0> grant 'hadoop','RW','scores'

ERROR: DISABLED: Security features are not available

Fix:

[hadoop@h71 ~]$ vi hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml
Add:
<property>
  <name>hbase.superuser</name>
  <value>root,hadoop</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>

Sync the HBase configuration to the other nodes. My HBase cluster is h71 (master), h72 (slave) and h73 (slave):

[hadoop@h71 ~]$ cat /home/hadoop/hbase-1.0.0-cdh5.5.2/conf/regionservers|xargs -i -t scp /home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml hadoop@{}:/home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml
scp /home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml hadoop@h72:/home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml 
hbase-site.xml                                                                                                                                                                                             100% 2038     2.0KB/s   00:00    
scp /home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml hadoop@h73:/home/hadoop/hbase-1.0.0-cdh5.5.2/conf/hbase-site.xml 
hbase-site.xml                                                                                                                                                                                             100% 2038     2.0KB/s   00:00

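Restart the HBase cluster.

Notes:
HBase provides five permission letters, RWXCA: READ ('R'), WRITE ('W'), EXEC ('X'), CREATE ('C') and ADMIN ('A').

HBase offers these security scopes:

Superuser: an administrator with all permissions, configured with the hbase.superuser parameter.
Global: global permissions apply to all tables in the cluster.
Namespace: namespace level.
Table: table level.
ColumnFamily: column family level.
Cell: cell level.

As in relational databases, permissions are granted and revoked with grant and revoke, although the syntax differs. The grant syntax is: grant user permissions table column_family column_qualifier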
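The permissions argument is a string built from those five letters, such as 'RW' in the grant above. user_permission then reports it as actions=READ,WRITE. Below is a standard-library Java sketch that decodes such a string. The enum mirrors the letter-to-name mapping listed above, but it is our own illustration, not HBase's class.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical decoder for permission strings such as "RW" (not HBase's own class). */
final class HBasePerms {
    enum Action {
        READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A');

        final char code;

        Action(char code) {
            this.code = code;
        }
    }

    /** Decodes letters like "RW" into actions; rejects unknown letters. */
    static Set<Action> parse(String letters) {
        Set<Action> result = EnumSet.noneOf(Action.class);
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            Action match = null;
            for (Action a : Action.values()) {
                if (a.code == c) {
                    match = a;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("unknown permission letter: " + c);
            }
            result.add(match);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("RW")); // [READ, WRITE]
    }
}
```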

(2) Viewing permissions:

hbase(main):010:0> user_permission 'scores'
User                                                         Namespace,Table,Family,Qualifier:Permission                                                                                                                                     
 hadoop                                                      default,scores,,: [Permission: actions=READ,WRITE]                                                                                                                              
1 row(s) in 0.2530 seconds

(3) Revoking permissions:

hbase(main):006:0> revoke 'hadoop','scores'

18. hbase shell scripts:

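Since these are shell commands, you can also put a series of hbase shell commands into a file and run them in order, like a Linux shell script. Write the commands into a file, then run the command shown at the end. End the file with exit; otherwise hbase shell stays at its interactive prompt after running the commands:

[hadoop@h71 hbase-1.0.0-cdh5.5.2]$ vi hehe.txt (any file name works; a more conventional one would be test.hbaseshell)
create 'hui','cf'
list
disable 'hui'
drop 'hui'
list
exit
[hadoop@h71 hbase-1.0.0-cdh5.5.2]$ bin/hbase shell hehe.txt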

19. Migrating data between clusters:

Reference: Four HBase data migration approaches

(1) Export/Import:

[Figure: the table to be migrated on the source cluster]
Step 1: create the corresponding table on the target cluster.


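Step 2, Export: scan the source table and write its data to HDFS as SequenceFiles. Export runs as a MapReduce job. If you use a separate MR cluster, make sure it has the same HBase configuration as the source cluster and that network and security policies allow it to reach that cluster; you can then run the Export command directly. To copy multiple versions, pass the versions argument; by default only the latest version is exported. You can also give a start and end time:

# output_hdfs_path can be an HDFS path on the target cluster or on the source cluster; versions, starttime and endtime are optional
hbase org.apache.hadoop.hbase.mapreduce.Export <tableName> <output_hdfs_path> <versions> <starttime> <endtime> 

# In practice:
[root@node01 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export test_schema1:t2 /huiq 99999
# Note: /huiq must not exist before running this command

Optional parameter (selects the MapReduce queue):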

-Dmapred.job.queue.name=root.default

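Step 3, Import: load the SequenceFiles exported from the source cluster into the corresponding table on the target cluster:

# if the exported data is on the source cluster's HDFS, input_hdfs_path can be a source-cluster HDFS path; if it is on the target cluster's HDFS, use the target cluster's path
hbase org.apache.hadoop.hbase.mapreduce.Import <tableName> <input_hdfs_path>

# In practice: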
[hdfs@bigdatanode01 ~]$ hbase org.apache.hadoop.hbase.mapreduce.Import test_schema1:t2 hdfs://192.110.110.110:8020/huiq

Note: step 3 may fail with an error.


Fix: switch to the hdfs user (su - hdfs) and run the Import command again.

Conditional export:
Source: Examples of the HBase export and import tools + time range + key prefix

First, the usage text of hbase export:

[root@heheda ~]# hbase org.apache.hadoop.hbase.mapreduce.Export -help
ERROR: Wrong number of arguments: 1
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

  Note: -D properties will be applied to the conf used. 
  For example: 
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
   -D hbase.client.scanner.caching=100
   -D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
   -D hbase.export.scanner.batch=10
   -D hbase.export.scanner.caching=100
   -D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
   -D mapreduce.map.speculative=false
   -D mapreduce.reduce.speculative=false

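The following command exports from the users table with versions=1, starttime=0, endtime=999999999999999 and the row filter '^^(?!row222)'. Note what that filter does. An argument beginning with ^ is treated as a regular expression, so the pattern here is ^(?!row222), a negative lookahead. As a result, the job exports the rows whose key does NOT start with row222. To export only the rows whose key has the prefix row222, pass the plain prefix 'row222' instead.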

[hdfs@test-hadoop-slave ~]$ hbase org.apache.hadoop.hbase.mapreduce.Export 'users'  /test/source/fromhbasetohdfs/users 1 0 999999999999999 '^^(?!row222)'
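According to the usage text above, the last Export argument is a regex when it starts with ^ and a row-key prefix otherwise. The standard-library Java sketch below reproduces that rule, so you can check a filter offline before launching the job. It is our own helper, not part of HBase. It assumes find-style (substring) regex matching, which is how HBase's RegexStringComparator behaves as far as I know.

```java
import java.util.regex.Pattern;

/** Hypothetical offline check of how Export's last argument filters row keys. */
final class ExportFilterCheck {
    static boolean keeps(String filterArg, String rowKey) {
        if (filterArg.startsWith("^")) {
            // a leading '^' marks a regular expression; the rest of the argument is the pattern
            return Pattern.compile(filterArg.substring(1)).matcher(rowKey).find();
        }
        // anything else is treated as a row-key prefix
        return rowKey.startsWith(filterArg);
    }

    public static void main(String[] args) {
        String arg = "^^(?!row222)";
        for (String key : new String[] {"row222_001", "row111_001"}) {
            System.out.println(key + " kept: " + keeps(arg, key));
        }
    }
}
```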

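(5) HBase's ImportTsv tool (bulk load):

Source: Big data basics 2.6: importing and exporting HBase data

The Import and Export tools above only work within HBase. The files they produce and consume use a special format that only HBase tables can use. Yet we often need to import ordinary CSV files into HBase. For that, HBase has a built-in tool, ImportTsv (https://hbase.apache.org/book.html#importtsv.options). By default it writes rows through normal Puts. Adding -Dimporttsv.bulk.output=<output_dir> makes it generate HFiles for a bulk load instead.

Usage:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,C1:code,C2:money,C3:xxx,C4:yyy table_name <hdfs_path>

-Dimporttsv.separator specifies the field separator
-Dimporttsv.columns maps the CSV columns in order. If a CSV column should be used as the rowkey, put the HBASE_ROW_KEY placeholder in its position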

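For example:
-Dimporttsv.columns=HBASE_ROW_KEY,C1:code,C2:money,C3:xxx,C4:yyy
means:
- column 1 of the CSV is the rowkey
- column 2 of the CSV is the code qualifier in column family C1
- column 3 of the CSV is the money qualifier in column family C2
......

If, say, the second column is the rowkey: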
-Dimporttsv.columns=C1:code,HBASE_ROW_KEY,C2:money,C3:xxx,C4:yyy
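The positional rule can be shown with a short Java function that applies an importtsv.columns spec to one CSV line. The result shows which value becomes the row key and which family:qualifier each other value goes to. This is only a sketch of the mapping described above. The names are hypothetical, it does no quote or escape handling, and it simply rejects lines whose field count differs from the spec.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration of the positional importtsv.columns mapping. */
final class TsvMapping {
    static final String ROW_KEY = "HBASE_ROW_KEY";

    /** Returns family:qualifier -> value, with the row key stored under HBASE_ROW_KEY. */
    static Map<String, String> map(String columnsSpec, String separator, String line) {
        String[] columns = columnsSpec.split(",");
        // limit -1 keeps trailing empty fields so positions stay aligned with the spec
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields but got " + fields.length);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            out.put(columns[i], fields[i]);
        }
        if (!out.containsKey(ROW_KEY)) {
            throw new IllegalArgumentException("columns spec has no " + ROW_KEY);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("HBASE_ROW_KEY,C1:code,C2:money", ",", "r001,A01,9.5"));
        System.out.println(map("C1:code,HBASE_ROW_KEY,C2:money", ",", "A01,r001,9.5"));
    }
}
```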

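Note: since CSV files can be imported, you would expect a CSV export as well, but I have not found one, even after searching online for a long time (if you know of one, please let me know). Once, my company needed to export more than ten days of HBase data to CSV, and there was no quick, clean way to do it. The options were to query the data and write it into a CSV file by hand, or to write code for it.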

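20. hbase hbck:

Reference: hbase hbck

hbase hbck is a very useful tool that ships with HBase, and many HBase problems can be repaired, or at least attempted, with it.
hbck checks and repairs the consistency and integrity of tables and regions. The newer hbck collects table and region information from three places: the HDFS directories, META, and the RegionServers. It uses this information to decide what is inconsistent and attempts repairs.
Note: this describes the hbck shipped with HBase 1.x. On HBase 2.x clusters, such as the HDP 3.1.4 cluster whose output appears below, the bundled hbck can still report problems but does not support these repair options. Repairs there are done with the separate HBCK2 tool.
The newer hbck can repair many kinds of errors. Its repair options are listed below (check whether an option must be followed by a table name):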

(1) -fix
    Kept for backward compatibility; superseded by -fixAssignments
(2) -fixAssignments
    Repairs region assignment errors
(3) -fixMeta
    Repairs problems in the META table, provided the region info on HDFS exists and is correct.
(4) -fixHdfsHoles
    Repairs region holes (a key range not covered by any region)
(5) -fixHdfsOrphans
    Repairs orphan regions (regions on HDFS without a .regioninfo file)
(6) -fixHdfsOverlaps
    Repairs region overlaps (overlapping key ranges)
(7) -fixVersionFile
    Repairs a missing hbase.version file
(8) -maxMerge <n> (n defaults to 5)
    When regions overlap and have to be merged, at most this many regions are merged at once.
(9) -sidelineBigOverlaps
    When repairing region overlaps, lets the regions that overlap most with others be set aside instead of merged. After the repair, their data can be loaded into the appropriate regions with bulk load.
(10) -maxOverlapsToSideline <n> (n defaults to 2)
    When repairing region overlaps, the maximum number of regions in one group that may be set aside.
Because there are so many options, there are also two shorthand options:
(11) -repair
    Turns on all the repair options, i.e. equivalent to -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps
(12) -repairHoles
    Equivalent to -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans
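Since -repair and -repairHoles are bundles of individual flags, it can help to expand them before running hbck. That shows exactly which repairs will be attempted. The standard-library Java sketch below copies its table from the list above. The class is our own, not part of HBase. The usage check also shows that -repairHoles is a subset of -repair.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: expands hbck shorthand options into the individual fix flags. */
final class HbckOptions {
    private static final Map<String, List<String>> SHORTHANDS = new HashMap<>();

    static {
        SHORTHANDS.put("-repair", Arrays.asList("-fixAssignments", "-fixMeta", "-fixHdfsHoles",
                "-fixHdfsOrphans", "-fixHdfsOverlaps", "-fixVersionFile", "-sidelineBigOverlaps"));
        SHORTHANDS.put("-repairHoles", Arrays.asList("-fixAssignments", "-fixMeta",
                "-fixHdfsHoles", "-fixHdfsOrphans"));
    }

    /** Expands shorthands; any other option is passed through unchanged. */
    static Set<String> expand(String... options) {
        Set<String> flags = new LinkedHashSet<>();
        for (String opt : options) {
            flags.addAll(SHORTHANDS.getOrDefault(opt, Collections.singletonList(opt)));
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(expand("-repairHoles", "-fixHdfsOverlaps"));
        System.out.println(expand("-repair").containsAll(expand("-repairHoles"))); // true
    }
}
```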
 
Example scenarios:

Q: The hbase.version file is missing
A: Fix it with -fixVersionFile

Q: A region is neither in META nor on HDFS, but it is in a RegionServer's set of online regions
A: Fix it with -fixAssignments

Q: A region is in META and in a RegionServer's online regions, but not on HDFS
A: Fix it with -fixAssignments -fixMeta. -fixAssignments tells the RegionServer to close the region, and -fixMeta removes the region's record from META.

Q: A region has no record in META and is not served by any RegionServer, but it exists on HDFS
A: Fix it with -fixMeta -fixAssignments. -fixAssignments assigns the region, and -fixMeta adds the region's record to META.

Q: A region has no record in META, exists on HDFS, and is served by a RegionServer
A: Fix it with -fixMeta. This adds the region's record to META; the region is undeployed first and then assigned. In general, -fixMeta removes the META record if the region is not on HDFS, and adds the record if it is.

Q: A region has a record in META, but it is not on HDFS and not served by any RegionServer
A: Fix it with -fixMeta, which deletes the META record

Q: A region has a record in META and exists on HDFS, and the table is not disabled, but the region is not being served
A: Fix it with -fixAssignments, which assigns the region. -fixAssignments repairs regions that are unassigned, assigned when they should not be, or assigned more than once.

Q: A region has a record in META and exists on HDFS, and the table is disabled, but the region is served by some RegionServer
A: Fix it with -fixAssignments, which undeploys the region

Q: A region has a record in META and exists on HDFS, and the table is not disabled, but the region is served by several RegionServers
A: Fix it with -fixAssignments. It tells all RegionServers to close the region and then assigns it.

Q: A region is in META and on HDFS and should be served, but the RegionServer recorded in META does not match the one actually serving it
A: Fix it with -fixAssignments

Q: Region holes
A: Add -fixHdfsHoles. It creates a new empty region to fill the hole, but it neither assigns the region nor adds it to META. So you also need -fixAssignments to assign the region and -fixMeta to add its record to META. That is why the combined option -repairHoles exists for repairing region holes; it is equivalent to -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans.

Q: A region on HDFS has no .regioninfo file
A: Fix it with -fixHdfsOrphans

Q: Region overlaps
A: Add -fixHdfsOverlaps

The command prints output like this:

[root@node01 spark2]# hbase hbck
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/phoenix/phoenix-5.0.0.3.1.4.0-315-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2021-07-23 09:25:41,838 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=node01:2181,node02:2181,node03:2181
2021-07-23 09:25:41,850 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-315--1, built on 08/23/2019 05:02 GMT
2021-07-23 09:25:41,850 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=node01
2021-07-23 09:25:41,850 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_231
2021-07-23 09:25:41,850 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2021-07-23 09:25:41,850 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/opt/tools/jdk1.8.0_231/jre
2021-07-23 09:25:41,851 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=/etc/hbase/conf:/opt/tools/jdk1.8.0_231/lib/tools.jar:/usr/hdp/3.1.4.0-315/hbase:/usr/hdp/3.1.4.0-315/hbase/lib/animal-sniffer-annotations-1.17.jar:/usr/hdp/3.1.4.0-315/hbase/lib/aopalliance-1.0.jar:/usr/hdp/3.1.4.0-315/hbase/lib/aopalliance-repackaged-2.5.0-b32.jar:/usr/hdp/3.1.4.0-315/hbase/lib/atlas-plugin-classloader-1.1.0.3.1.4.0-315.jar:/usr/hdp/3.1.4.0-315/hbase/lib/audience-annotations-0.5.0.jar:/usr/hdp/3.1.4.0-315/hbase/lib/avro-1.7.7.jar:/usr/hdp/3.1.4.0-315/hbase/lib/aws-java-sdk-bundle-1.11.375.jar:/usr/hdp/3.1.4.0-315/hbase/lib/checker-qual-2.8.1.jar:/usr/hdp/3.1.4....... (too many dependencies, so the rest is omitted here)
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0-315/hadoop/lib/native
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-1160.11.1.el7.x86_64
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=root
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/root
2021-07-23 09:25:41,882 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/hdp/3.1.4.0-315/spark2
2021-07-23 09:25:41,885 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=node01:2181,node02:2181,node03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@604c5de8
HBaseFsck command line options: 
2021-07-23 09:25:41,913 INFO  [main] util.HBaseFsck: Launching hbck
2021-07-23 09:25:41,916 INFO  [main-SendThread(node01:2181)] zookeeper.ClientCnxn: Opening socket connection to server node01/10.3.2.24:2181. Will not attempt to authenticate using SASL (unknown error)
2021-07-23 09:25:41,925 INFO  [main-SendThread(node01:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.3.2.24:58562, server: node01/10.3.2.24:2181
2021-07-23 09:25:41,966 INFO  [main-SendThread(node01:2181)] zookeeper.ClientCnxn: Session establishment complete on server node01/10.3.2.24:2181, sessionid = 0x17aadd367a40a27, negotiated timeout = 60000
2021-07-23 09:25:41,995 INFO  [main] zookeeper.ReadOnlyZKClient: Connect 0x4a11eb84 to node01:2181,node02:2181,node03:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
2021-07-23 09:25:42,000 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84] zookeeper.ZooKeeper: Initiating client connection, connectString=node01:2181,node02:2181,node03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$14/781735981@6b9ac39a
2021-07-23 09:25:42,002 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84-SendThread(node03:2181)] zookeeper.ClientCnxn: Opening socket connection to server node03/10.3.2.26:2181. Will not attempt to authenticate using SASL (unknown error)
2021-07-23 09:25:42,004 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84-SendThread(node03:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.3.2.24:49356, server: node03/10.3.2.26:2181
2021-07-23 09:25:42,051 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84-SendThread(node03:2181)] zookeeper.ClientCnxn: Session establishment complete on server node03/10.3.2.26:2181, sessionid = 0x37aad46c7b71234, negotiated timeout = 60000
Version: 2.0.2.3.1.4.0-315
2021-07-23 09:25:42,797 INFO  [main] util.HBaseFsck: Computing mapping of all store files
.
2021-07-23 09:25:43,501 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
2021-07-23 09:25:43,502 INFO  [main] util.HBaseFsck: Computing mapping of all link files
.
2021-07-23 09:25:43,691 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
Number of live region servers: 1
Number of dead region servers: 1
Master: node01,16000,1626086603004
Number of backup masters: 2
Average load: 50.0
Number of requests: 207161
Number of regions: 50
Number of regions in transition: 0
2021-07-23 09:25:44,100 INFO  [main] util.HBaseFsck: Loading regionsinfo from the hbase:meta table

Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2021-07-23 09:25:44,226 INFO  [main] util.HBaseFsck: getTableDescriptors == tableNames => [SYSTEM.FUNCTION, hbase_test, suntest:t2, USER, suntest:t1, SYSTEM.LOG, atlas_janus, kylin_metadata, test_schema1:t2, SYSTEM.STATS, SUNTEST.USER, hbase:namespace, test_schema1:t1, KYLIN_6OMP0DMLFQ, SYSTEM.CATALOG, ATLAS_ENTITY_AUDIT_EVENTS, SYSTEM.MUTEX, SYSTEM.SEQUENCE]
2021-07-23 09:25:44,228 INFO  [main] zookeeper.ReadOnlyZKClient: Connect 0x1ddd3478 to node01:2181,node02:2181,node03:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
2021-07-23 09:25:44,229 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478] zookeeper.ZooKeeper: Initiating client connection, connectString=node01:2181,node02:2181,node03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$14/781735981@6b9ac39a
2021-07-23 09:25:44,230 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478-SendThread(node02:2181)] zookeeper.ClientCnxn: Opening socket connection to server node02/10.3.2.25:2181. Will not attempt to authenticate using SASL (unknown error)
2021-07-23 09:25:44,232 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478-SendThread(node02:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.3.2.24:43240, server: node02/10.3.2.25:2181
2021-07-23 09:25:44,273 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478-SendThread(node02:2181)] zookeeper.ClientCnxn: Session establishment complete on server node02/10.3.2.25:2181, sessionid = 0x27aa2b0324412c8, negotiated timeout = 60000
2021-07-23 09:25:44,444 INFO  [main] client.ConnectionImplementation: Closing master protocol: MasterService
2021-07-23 09:25:44,445 INFO  [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x1ddd3478 to node01:2181,node02:2181,node03:2181
Number of Tables: 18
2021-07-23 09:25:44,455 INFO  [main] util.HBaseFsck: Loading region directories from HDFS
2021-07-23 09:25:44,472 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478] zookeeper.ZooKeeper: Session: 0x27aa2b0324412c8 closed
2021-07-23 09:25:44,473 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x1ddd3478-EventThread] zookeeper.ClientCnxn: EventThread shut down
..
2021-07-23 09:25:44,696 INFO  [main] util.HBaseFsck: Loading region information from HDFS
.
2021-07-23 09:25:47,212 INFO  [main] util.HBaseFsck: Checking and fixing region consistency
2021-07-23 09:25:47,286 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
Summary:
Table test_schema1:t1 is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table test_schema1:t2 is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SUNTEST.USER is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table ATLAS_ENTITY_AUDIT_EVENTS is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.CATALOG is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table USER is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.SEQUENCE is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.LOG is okay.
    Number of regions: 32
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.FUNCTION is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.MUTEX is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
Table SYSTEM.STATS is okay.
    Number of regions: 1
    Deployed on:  node03,16020,1625705330661
0 inconsistencies detected.
Status: OK
2021-07-23 09:25:47,557 INFO  [main] zookeeper.ZooKeeper: Session: 0x17aadd367a40a27 closed
2021-07-23 09:25:47,557 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-07-23 09:25:47,557 INFO  [main] client.ConnectionImplementation: Closing master protocol: MasterService
2021-07-23 09:25:47,558 INFO  [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x4a11eb84 to node01:2181,node02:2181,node03:2181
2021-07-23 09:25:47,597 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84] zookeeper.ZooKeeper: Session: 0x37aad46c7b71234 closed
2021-07-23 09:25:47,597 INFO  [ReadOnlyZKClient-node01:2181,node02:2181,node03:2181@0x4a11eb84-EventThread] zookeeper.ClientCnxn: EventThread shut down
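
The two summary lines near the end of the output above ("0 inconsistencies detected." and "Status: OK") are what a script would look at after running hbck. Below is a minimal sketch in plain Java that pulls them out of captured output. The line formats are taken from this sample output, not from a documented contract, and HbckSummary is a hypothetical name; a missing summary line is treated as unhealthy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: decide from captured `hbase hbck` output whether the check passed.
 * Only the two summary lines seen in the sample output are inspected.
 */
final class HbckSummary {
    // "(?m)" lets ^ match at the start of every line of the captured text
    private static final Pattern COUNT = Pattern.compile("(?m)^(\\d+) inconsistencies detected\\.");
    private static final Pattern STATUS = Pattern.compile("(?m)^Status: (\\S+)");

    /** Returns the inconsistency count, or -1 if the summary line is missing. */
    static int inconsistencies(String output) {
        Matcher m = COUNT.matcher(output);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the word after "Status:", or null if that line is missing. */
    static String status(String output) {
        Matcher m = STATUS.matcher(output);
        return m.find() ? m.group(1) : null;
    }

    /** Healthy only if hbck reported zero inconsistencies and Status: OK. */
    static boolean isHealthy(String output) {
        return inconsistencies(output) == 0 && "OK".equals(status(output));
    }
}
```

A wrapper script could capture the hbck output to a file, read it into a string, and alert when isHealthy returns false.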

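Note: this tool has since been superseded; on HBase 2.x, repairs are done with the separate HBCK2 tool. See the article "Technical notes: HBCK2, the repair tool for HBase 2.0, an operations guide".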

21. Fixing garbled Chinese text in HBase:

Method 1: use toString in the shell

Reference: "How to view hex-encoded Chinese text properly in the HBase Shell"

hbase(main):050:0> scan 'test'
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=\xE7\xA6\x85\xE5\x85\x8B
 row-2                                           column=f:c2, timestamp=1587984555307, value=HBase\xE8\x80\x81\xE5\xBA\x97
 row-3                                           column=f:c3, timestamp=1587984555307, value=HBase\xE5\xB7\xA5\xE4\xBD\x9C\xE7\xAC\x94\xE8\xAE\xB0
 row-4                                           column=f:c4, timestamp=1587984555307, value=\xE6\x88\x91\xE7\x88\xB1\xE4\xBD\xA0\xE4\xB8\xAD\xE5\x9B\xBD\xEF\xBC\x81
4 row(s) in 0.0190 seconds
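
The \xHH sequences above are the raw UTF-8 bytes of the Chinese text: by default the shell prints every byte outside a small printable-ASCII set as a hex escape, so each Chinese character (three bytes in UTF-8) shows up as three escapes. The sketch below approximates that rendering in plain Java. It is modeled on HBase's Bytes.toStringBinary but is not its code, the exact set of characters kept as-is is an assumption, and ShellEscapeSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of how shell output such as value=\xE7\xA6\x85... comes about. */
final class ShellEscapeSketch {
    // Punctuation kept as-is (assumed whitelist, modeled on Bytes.toStringBinary)
    private static final String KEEP = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";

    static String toStringBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF; // treat the byte as unsigned 0..255
            boolean printable = (ch >= '0' && ch <= '9')
                    || (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || KEEP.indexOf(ch) >= 0;
            if (printable) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch)); // e.g. byte 0xE7 -> \xE7
            }
        }
        return sb.toString();
    }

    /** Encodes text as UTF-8, then renders it the way the shell would. */
    static String escapeUtf8(String text) {
        return toStringBinary(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, escapeUtf8("禅克") yields \xE7\xA6\x85\xE5\x85\x8B, the same text shown for row-1 above, while an all-ASCII value such as row-1 comes back unchanged.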

hbase(main):051:0> scan 'test', {FORMATTER => 'toString'}
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=禅克
 row-2                                           column=f:c2, timestamp=1587984555307, value=HBase老店
 row-3                                           column=f:c3, timestamp=1587984555307, value=HBase工作笔记
 row-4                                           column=f:c4, timestamp=1587984555307, value=我爱你中国！
4 row(s) in 0.0170 seconds

hbase(main):052:0> scan 'test', {FORMATTER => 'toString',LIMIT=>1,COLUMN=>'f:c4'}
ROW                                              COLUMN+CELL
 row-4                                           column=f:c4, timestamp=1587984555307, value=我爱你中国！
1 row(s) in 0.0180 seconds

hbase(main):053:0> scan 'test', {FORMATTER_CLASS => 'org.apache.hadoop.hbase.util.Bytes', FORMATTER => 'toString'}
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=禅克
 row-2                                           column=f:c2, timestamp=1587984555307, value=HBase老店
 row-3                                           column=f:c3, timestamp=1587984555307, value=HBase工作笔记
 row-4                                           column=f:c4, timestamp=1587984555307, value=我爱你中国！
4 row(s) in 0.0220 seconds

hbase(main):054:0> scan 'test', {FORMATTER_CLASS => 'org.apache.hadoop.hbase.util.Bytes', FORMATTER => 'toString', COLUMN=>'f:c4'}
ROW                                              COLUMN+CELL
 row-4                                           column=f:c4, timestamp=1587984555307, value=我爱你中国！
1 row(s) in 0.0220 seconds

hbase(main):004:0> scan 'test', {COLUMNS => ['f:c1:toString','f:c2:toString'] }
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=禅克
 row-2                                           column=f:c2, timestamp=1587984555307, value=HBase老店
2 row(s) in 0.0180 seconds

hbase(main):003:0> scan 'test', {COLUMNS => ['f:c1:c(org.apache.hadoop.hbase.util.Bytes).toString','f:c3:c(org.apache.hadoop.hbase.util.Bytes).toString'] }
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=禅克
 row-3                                           column=f:c3, timestamp=1587984555307, value=HBase工作笔记
2 row(s) in 0.0160 seconds

hbase(main):055:0> scan 'test', {COLUMNS => ['f:c1:toString','f:c4:c(org.apache.hadoop.hbase.util.Bytes).toString'] }
ROW                                              COLUMN+CELL
 row-1                                           column=f:c1, timestamp=1587984555307, value=禅克
 row-4                                           column=f:c4, timestamp=1587984555307, value=我爱你中国！
2 row(s) in 0.0290 seconds

hbase(main):058:0> get 'test','row-2','f:c2:toString'
COLUMN                                           CELL
 f:c2                                            timestamp=1587984555307, value=Get到了吗？好意思不帮我分享嘛~哈哈~
1 row(s) in 0.0070 seconds

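Method 2: create a mapping table with a third-party tool such as Hive or Phoenix.

Method 3: convert with Java code

Reference: "Encoding conversion of Chinese content in HBase"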

import java.nio.charset.StandardCharsets;

import org.apache.commons.codec.binary.Hex;
import org.junit.Test;

public class HbaseTest {
    /**
     * Convert HBase-escaped UTF-8 (as printed by the shell) back to Chinese text.
     * Only works when the value consists entirely of \xHH escapes.
     */
    @Test
    public void testHbaseStr() throws Exception {
        // UTF-8 bytes as printed by the HBase shell (the character 烦)
        String content = "\\xE7\\x83\\xA6";
        char[] chars = content.toCharArray();
        StringBuilder sb = new StringBuilder();
        // Each escape is 4 characters ("\xHH"); keep the two hex digits at offsets 2 and 3
        for (int i = 2; i < chars.length; i = i + 4) {
            sb.append(chars[i]);
            sb.append(chars[i + 1]);
        }
        System.out.println(sb); // prints E783A6
        String outputStr = new String(Hex.decodeHex(sb.toString().toCharArray()), StandardCharsets.UTF_8);
        System.out.println(outputStr); // prints 烦
    }
}
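
The loop above keeps only the characters at offsets 2 and 3 of every 4-character group, so it assumes the value is made up purely of \xHH escapes; a mixed value such as row-2's HBase\xE8\x80\x81\xE5\xBA\x97 would be mangled. A more general sketch, using only the JDK (no commons-codec), walks the string, turns each \xHH into one byte, keeps every other character as a single byte, and decodes the result as UTF-8. HbaseEscapeDecoder is a hypothetical name, and it assumes the input is the plain ASCII text the shell prints.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Decodes shell-style escaped values (ASCII mixed with \xHH) back to text. */
final class HbaseEscapeDecoder {
    static String decode(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\x", i) && i + 4 <= escaped.length()) {
                // "\xHH" -> one raw byte
                out.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Any other character is assumed to be ASCII and copied as one byte
                out.write(escaped.charAt(i));
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Applied to the text HBase\xE8\x80\x81\xE5\xBA\x97 (backslashes doubled inside a Java string literal), it returns HBase老店, matching the toString output for row-2 above.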


22. Checking table size:

A table's size can be obtained indirectly from the size of its data on HDFS: hdfs dfs -du -h /hbase/data/default/
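
With many tables, the du output is easier to read when sorted. Below is a minimal sketch in plain Java that parses the output of hdfs dfs -du /hbase/data/default/ run without -h (so sizes are plain byte counts) and lists tables largest first. Depending on the Hadoop version, each line has one or two size columns before the path; the sketch takes the first number as the table's size and the last path segment as the table name. With the default root directory, tables in another namespace live under /hbase/data/<namespace>/. TableSizes is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: per-table sizes from `hdfs dfs -du` output, largest first. */
final class TableSizes {
    static final class Entry {
        final String table;
        final long bytes;
        Entry(String table, long bytes) { this.table = table; this.bytes = bytes; }
    }

    static List<Entry> parse(String duOutput) {
        List<Entry> result = new ArrayList<>();
        for (String line : duOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue; // skip blank or unexpected lines
            String path = parts[parts.length - 1];
            String table = path.substring(path.lastIndexOf('/') + 1);
            result.add(new Entry(table, Long.parseLong(parts[0])));
        }
        result.sort((a, b) -> Long.compare(b.bytes, a.bytes)); // largest first
        return result;
    }

    /** Formats a byte count with binary units, e.g. 1536 -> "1.5 KB". */
    static String human(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB", "PB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }
}
```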

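Open question: some articles online (for example "hbase: checking the space a table occupies", "hbase size method", "how to see table size in hbase") say that the hbase shell has commands such as size and get_region_info for this, but running them fails for me with the errors below, and they do not appear in help either: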

hbase(main):080:0> size 'heheda'
NoMethodError: undefined method `size' for main:Object

hbase(main):084:0> get_region_info 'heheda','haha_1705282403161'
NoMethodError: undefined method `get_region_info' for main:Object

23. Time to live (TTL):

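HDFS usage on our company's HBase cluster had passed 80%. Checking showed one table with an enormous amount of data: it records one activity record per user per day, so 400 million users × 197 days gives about 80 billion rows, stored in a table 4 TB in size, far too large for a single table. There were two problems: (1) compression was not enabled; (2) no TTL was set. After discussing it with the business team, we decided to keep only the most recent 93 days (3 months) of data and to enable LZO compression.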
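
The figures in the paragraph above can be sanity-checked with a few lines of arithmetic. A minimal sketch, assuming rows are roughly equal in size and spread evenly over the days, and reading "4 TB" as 4 × 10^12 bytes: 400 million × 197 = 78.8 billion rows (the "about 80 billion"), roughly 51 bytes per row, and keeping 93 days would leave roughly 1.89 TB before compression. CapacityEstimate is a hypothetical name.

```java
/** Back-of-the-envelope estimate for the table described above. */
final class CapacityEstimate {
    static final long USERS = 400_000_000L; // 400 million users
    static final long DAYS_STORED = 197;    // days of data currently in the table
    static final long DAYS_KEPT = 93;       // days retained after the TTL change
    static final double TABLE_BYTES = 4e12; // "4 TB", assumed to mean 10^12-byte terabytes

    static long rowsStored() { return USERS * DAYS_STORED; }             // 78.8 billion
    static long rowsKept() { return USERS * DAYS_KEPT; }                 // 37.2 billion
    static double bytesPerRow() { return TABLE_BYTES / rowsStored(); }   // about 51 bytes
    static double bytesAfterTtl() { return bytesPerRow() * rowsKept(); } // about 1.89e12 bytes
}
```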

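In principle every table should have compression enabled, but early on no rules were imposed on the business teams, so some tables still have no compression while holding a great deal of data. We therefore looked at enabling compression and TTL online.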

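Given HBase's write model, compression is applied whenever an HFile is written to HDFS, which happens in three situations: flush, split, and compaction. Data that already exists can therefore only be compressed during compaction. A compaction reads the region's existing HFiles and writes a new HFile, so in principle the new HFile is written with compression.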

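Compactions come in two kinds: minor and major. A minor compaction merges N files into one (N is configurable); a major compaction merges all files of a store (one column family of a region) into one. In both cases the existing files are read and a new HFile is written. Source: "Setting HBase TTL to a given number of days"; for hands-on test results see the article "HBase TTL".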

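There are two notable differences between how a cell TTL and a column family TTL are handled (source: "HBase: column family TTL and cell TTL"):

Cell TTL is expressed in milliseconds rather than seconds.
A cell TTL cannot extend a cell's lifetime beyond the column family TTL.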

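A TTL gives table data an expiry time. Expired data is no longer returned by reads and is deleted automatically during compaction. Source: "Setting a table's TTL value in HBase".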
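
The compaction behaviour described in this section (read the existing HFiles, write one new file, drop expired data along the way) is essentially a k-way merge of sorted inputs with a TTL filter. Below is a toy sketch in plain Java, assuming each "file" is a list of cells already sorted by row key. Real compactions also handle multiple versions, delete markers and block encoding, none of which is modeled; CompactionSketch and Cell are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model: merge sorted "files" into one, dropping cells older than the TTL. */
final class CompactionSketch {
    static final class Cell {
        final String rowKey;
        final long timestamp; // milliseconds since epoch
        Cell(String rowKey, long timestamp) { this.rowKey = rowKey; this.timestamp = timestamp; }
    }

    static List<Cell> compact(List<List<Cell>> files, long ttlMillis, long nowMillis) {
        // Heap entries are {fileIndex, positionInFile}, ordered by that cell's row key
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> files.get(a[0]).get(a[1]).rowKey.compareTo(files.get(b[0]).get(b[1]).rowKey));
        for (int f = 0; f < files.size(); f++) {
            if (!files.get(f).isEmpty()) heap.add(new int[] {f, 0});
        }
        long oldestLive = nowMillis - ttlMillis; // cells older than this have expired
        List<Cell> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            Cell cell = files.get(top[0]).get(top[1]);
            if (cell.timestamp >= oldestLive) {
                out.add(cell); // still live: written to the new file
            }
            if (top[1] + 1 < files.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within the same file
            }
        }
        return out;
    }
}
```

The output is sorted across all inputs, which is why a single new HFile can replace the old ones.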

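By default an HBase table's TTL is FOREVER, i.e. data never expires; alternatively you can set a TTL, in seconds, for a column family.
There are two commands for changing a table's schema: alter and alter_async. With the asynchronous one, progress can be checked with alter_status. The asynchronous form is usually preferred, and the examples below use alter_async.
Be careful when altering production tables: a schema change is disruptive, because regions have to be closed and reopened, so clients may see NotServingRegionException while the change is in progress.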

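# Set, increase or decrease the TTL
# Column family level
alter_async 'TABLE_NAME',{NAME => 'f',TTL => 'SECONDS'}
# Column (qualifier) level: alter cannot do this, because NAME must be a column
# family name and ':' is not allowed in it; per-cell TTLs are set on writes, in milliseconds

# Restore the TTL to forever; FOREVER or -1 cannot be used as the value here
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}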
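
The unit rules above are easy to get wrong when converting "keep N days" into a TTL. Below is a minimal sketch in plain Java: family TTL in seconds, cell TTL in milliseconds, a cell's effective lifetime capped by the family TTL, and 2147483647 (Integer.MAX_VALUE) as the "forever" value used in the command above. TtlHelper and its methods are hypothetical names; alterCommand only builds a command string to paste into the shell and does not talk to HBase.

```java
import java.util.concurrent.TimeUnit;

/** Sketch of TTL unit conversions for HBase schema changes. */
final class TtlHelper {
    static final int FOREVER_SECONDS = Integer.MAX_VALUE; // 2147483647

    /** Column family TTL is given in seconds. */
    static long familyTtlSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    /** Cell TTL is given in milliseconds. */
    static long cellTtlMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    /** A cell cannot outlive its family TTL, so the effective TTL is the smaller one. */
    static long effectiveTtlMillis(long cellTtlMillis, long familyTtlSeconds) {
        return Math.min(cellTtlMillis, TimeUnit.SECONDS.toMillis(familyTtlSeconds));
    }

    /** Builds the alter_async command for a family-level TTL (table/family are placeholders). */
    static String alterCommand(String table, String family, long days) {
        return String.format("alter_async '%s',{NAME => '%s',TTL => '%d'}",
                table, family, familyTtlSeconds(days));
    }
}
```

For the 93-day retention in the example, alterCommand("TABLE_NAME", "f", 93) gives a TTL of 8035200 seconds.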

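Note: most articles online say to disable the table first (disable 'tableName') and run major_compact at the end, but in my tests the schema change also succeeded without either step. Data that has already expired is hidden from reads right away, but it is only physically removed by a later compaction.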

Other references:
How HBase deletes data
An introduction to HBase TTL
HBase key points summary

24. Caching and compression:

(1) Caching

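In-memory caching: HBase uses in-memory caches to improve read performance. The source describes two levels: the block cache and a row cache.

Block cache: the block cache is enabled by default. It keeps blocks of HFiles (64 KB by default) in memory. On a read, HBase checks the block cache first; if the data is there, it is returned directly without disk I/O, which greatly improves read performance. Its capacity is limited and is set by hfile.block.cache.size as a fraction of the RegionServer heap; the default is 0.4, i.e. 40% of the heap.
Row cache: described as an optional level that keeps whole table rows in memory and is checked before the block cache, avoiding both block cache lookups and disk I/O. Note that stock Apache HBase does not ship a built-in row cache; the per-column-family settings that control caching are BLOCKCACHE (whether reads of the family populate the block cache) and CACHE_DATA_ON_WRITE (whether data blocks are cached as they are written).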

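Local cache: a local cache can also be kept on the client side to reduce network transfer. It is maintained inside the client application and holds recently accessed data in local memory; the application checks it first and, on a hit, returns the result without a network round trip. Its capacity is limited, so its size should be chosen to balance memory use against the performance gain.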

Source: "HBase caching and compression techniques"
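
When the block cache described above is full, it evicts the least recently used blocks (by default HBase uses an on-heap LruBlockCache). The eviction policy itself can be illustrated with a minimal LRU cache built on LinkedHashMap in access order. This is a generic sketch, not HBase code: LruBlockCache additionally sizes entries in bytes, keeps separate single-access, multi-access and in-memory priority areas, and evicts from a background thread. LruCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache: the least recently used entry is dropped when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: get() moves an entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the head, i.e. the least recently used entry
    }
}
```

With a capacity of 2, putting blocks b1 and b2, reading b1, then putting b3 evicts b2, because b1 was touched more recently.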

(2) Compression

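In HBase, data compression is implemented mainly at the storage layer. HBase supports several algorithms, such as Gzip, LZO and Snappy. A compression algorithm encodes data into a smaller representation, which reduces storage space and can improve I/O performance. HBase can apply compression at both the storage layer and the transport layer, with different effects.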

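To check which native compression libraries the current HBase installation can use: hbase org.apache.hadoop.util.NativeLibraryChecker. Reference: "Part 15: HBase compression and HBase-Hive integration explained"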

The CompressionTest tool can be used to verify that the snappy compressor works:

bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://hadoop01.com:8020/test.txt snappy

My output on CDH-6.3.2-1 differed from the one in the referenced article, and I am not sure whether that is normal.


# Create a table with a specific compression format
create 'ods:tablename',{NAME=>'info',COMPRESSION=>'Snappy'},{NAME=>'f2'}

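GZ: for cold data. Compared with Snappy and LZO, GZIP has a higher compression ratio but uses more CPU and compresses and decompresses more slowly.
Snappy and LZO: for hot data. They use less CPU and compress and decompress faster than GZ, but their compression ratio is lower.
Snappy vs. LZO: Snappy performs better overall; its compression ratio is lower than LZO's, but it compresses and decompresses faster.
LZ4 vs. LZO: LZ4's compression ratio is about the same as LZO's, but LZ4 compresses and decompresses faster.

In most cases Snappy or LZO is a good choice, because their compression overhead is low and they still save space. Source: "hbase data compression"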
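
The JDK ships no Snappy, LZO or LZ4 codec, so the speed-versus-ratio trade-off above can only be illustrated here with DEFLATE (the algorithm inside GZ) at its fastest and strongest levels. A minimal sketch: compress with Deflater.BEST_SPEED or Deflater.BEST_COMPRESSION, plus a decompression round trip in the spirit of CompressionTest. CompressionTradeoff is a hypothetical name, and absolute sizes and speeds will differ from HBase's codecs.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** DEFLATE at different levels as a stand-in for codec speed/ratio trade-offs. */
final class CompressionTradeoff {
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native memory
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int n = 0;
        try {
            while (n < originalLength && !inflater.finished()) {
                int read = inflater.inflate(result, n, originalLength - n);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                n += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return result;
    }
}
```

Timing both levels on a sample of real cell values (for instance with System.nanoTime()) gives a rough feel for the CPU cost difference, but it is no substitute for testing the actual codecs with CompressionTest.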

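References:
Data compression: techniques and methods for HBase data compression
Two good articles on optimization:
HBase optimization: using encoding and compression sensibly
hbase compression: hbase compression methods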

25. Region splitting and pre-splitting

See my other article: "HBase region partitioning, data compression and Sqoop integration"

26. Viewing help

hbase(main):092:0> help
HBase Shell, version 2.1.0-cdh6.3.2, rUnknown, Fri Nov  8 05:44:07 PST 2019
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: processlist, status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: list_locks, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html

Help for a single command:

hbase(main):093:0> help "get_peer_config"
          Outputs the cluster key, replication endpoint class (if present), and any replication configuration parameters

hbase(main):094:0> help "list_peer_configs"
          No-argument method that outputs the replication peer configuration for each peer defined on this cluster.

hbase(main):098:0> help "list"
List all user tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

  hbase> list
  hbase> list 'abc.*'
  hbase> list 'ns:abc.*'
  hbase> list 'ns:.*'

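References:
hbase operations (shell commands for creating and truncating tables, create/read/update/delete) and hbase table storage structure and principles
[Gandalf] A detailed guide to basic HBase data operations [complete edition]
Common HBase shell commands
HBase shell in detail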
