HBase Notes

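1. The origin of HBase

Question: what is HDFS mainly suited for? It offers high throughput and fits batch processing of data.
    
    Question: can we read one specific line of one file directly on HDFS?
          Or can we modify one specific line of one file directly on HDFS?
         
    HDFS does not support random reads and writes inside a file; data can only be written by appending.
    
    
    Now suppose we have a large volume of data to store, and later we need random reads and writes against it. What should we do?
        HDFS is no longer a good fit. We need software that can store massive amounts of data and also supports efficient random reads and writes. HBase was created against this background.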

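HBase is an Apache open-source NoSQL database written in Java and built on top of HDFS. It does not support SQL, multi-row transactions, or joins, and it has no relationships between tables.

Because it is built on HDFS, HBase data ultimately lives on HDFS, so HDFS must be started before the HBase cluster is started.

HBase supports only three ways of reading data:

1- Read by rowkey (row key, the equivalent of a primary key)
2- Read by a range of rowkeys
3- Scan the whole table

Transactions are supported only at the single-row level.

It mainly stores structured and semi-structured data.

All data in HBase is stored as bytes.

HBase is easy to scale out.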

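HBase tables have three main characteristics:

1- Large: a single table can hold billions of rows and millions of columns
2- Column-oriented: data is managed by column family and stored column family by column family
3- Sparse: NULL values take up no disk space and have no effect on performance, so tables can be designed to be very sparse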
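
To make the sparsity point concrete, here is a minimal Python sketch. It is a toy model and not how HBase is implemented: the `cells` dictionary and the `put` / `get_row` helpers are hypothetical names chosen for illustration. The idea it shows is that only cells that were actually written exist, so a row that lacks a column stores nothing for it.

```python
# Toy model of a sparse table: a cell exists only if it was written.
# Keys are (rowkey, "family:qualifier"); values are stored as bytes.
cells = {}

def put(rowkey, column, value):
    """Store one cell; nothing is stored for columns that are never written."""
    cells[(rowkey, column)] = value.encode("utf-8")

def get_row(rowkey):
    """Collect the cells that exist for one rowkey."""
    return {col: val for (rk, col), val in cells.items() if rk == rowkey}

put("rk0001", "f1:name", "zhangsan")
put("rk0001", "f1:age", "20")
put("rk0002", "f1:name", "lisi")  # rk0002 never gets an f1:age cell

row1 = get_row("rk0001")
row2 = get_row("rk0002")
stored_cells = len(cells)
print(row1)
print(row2)
print(stored_cells)
```

Two rows and two columns would be four slots in a fixed-schema table; here only three cells are stored, and the "missing" value has no representation at all.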

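Typical use cases for HBase:

1- The data volume is very large
2- The data needs random reads and writes
3- The data is sparse

When the data you work with shows two or more of these characteristics, HBase is worth trying.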

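2. Differences between HBase and other software

2.1 HBase vs. RDBMS

HBase: has tables and rowkeys, distributed storage, no SQL, no joins, no relationships between tables, no transactions beyond a single row

MySQL (RDBMS): has tables and primary keys, single-machine storage, supports SQL and joins, has relationships between tables, supports transactions

2.2 HBase vs. HDFS

HBase: built on Hadoop and strongly dependent on HDFS; its throughput is not especially high, but it supports efficient random reads and writes

HDFS: high throughput, suited to batch processing, mainly used for offline OLAP, no random reads or writes

HBase is built on HDFS, yet HDFS does not support random reads and writes while HBase does so efficiently. The two seem to contradict each other, which means HBase must be doing some special processing of its own.

2.3 HBase vs. Hive

HBase: built on Hadoop; a NoSQL database for storing data; low latency; suited to online (real-time) workloads

Hive: built on Hadoop; a data warehouse tool; higher latency; suited to offline data processing and analysis

HBase and Hive are different pieces of software built on Hadoop, and they can be used together: Hive can be integrated with HBase so that Hive reads the data in HBase and runs statistical analysis on it.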

3. Installing HBase

3.1 Extract the archive

[pxj@pxj62 /opt/app]$tar -zxvf hbase-2.1.0.tar.gz -C ../app/

3.2 Create a symbolic link

[pxj@pxj62 /opt/app]$ln -s hbase-2.1.0 hbase

3.3 Edit the HBase configuration files

3.31 hbase-site.xml

<configuration>
        <!-- Path in HDFS where HBase stores its data -->
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://pxj62:8020/hbase</value>
        </property>
        <!-- HBase run mode: false is standalone mode, true is distributed mode. When false, HBase and ZooKeeper run in the same JVM -->
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <!-- ZooKeeper addresses -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>pxj62,pxj63,pxj64</value>
        </property>
        <!-- Where ZooKeeper stores its snapshots -->
        <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/opt/app/zookeeper/zkdatas</value>
        </property>
        <!-- For V2.1 in distributed mode, set this to false -->
        <property>
            <name>hbase.unsafe.stream.capability.enforce</name>
            <value>false</value>
        </property>

</configuration>
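
Before distributing the configuration, it can help to sanity-check it. The following Python sketch parses a trimmed copy of the file above with the standard library and runs two simple checks. The `load_properties` and `check` helpers are hypothetical names, and the checks themselves are only assumptions about what is worth verifying; they are not rules enforced by HBase.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the hbase-site.xml shown above (comments omitted).
HBASE_SITE = """
<configuration>
    <property><name>hbase.rootdir</name><value>hdfs://pxj62:8020/hbase</value></property>
    <property><name>hbase.cluster.distributed</name><value>true</value></property>
    <property><name>hbase.zookeeper.quorum</name><value>pxj62,pxj63,pxj64</value></property>
    <property><name>hbase.zookeeper.property.dataDir</name><value>/opt/app/zookeeper/zkdatas</value></property>
    <property><name>hbase.unsafe.stream.capability.enforce</name><value>false</value></property>
</configuration>
"""

def load_properties(xml_text):
    """Return {name: value} for every <property> element in the file."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

def check(props):
    """Hypothetical sanity checks; returns a list of human-readable problems."""
    problems = []
    distributed = props.get("hbase.cluster.distributed") == "true"
    if distributed and not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("distributed mode expects hbase.rootdir on HDFS")
    if distributed and not props.get("hbase.zookeeper.quorum"):
        problems.append("distributed mode expects a ZooKeeper quorum")
    return problems

props = load_properties(HBASE_SITE)
quorum = props["hbase.zookeeper.quorum"].split(",")
problems = check(props)
print(quorum)
print(problems)
```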

3.32 hbase-env.sh

# Line 28
export JAVA_HOME=/export/server/jdk1.8.0_241/

# Line 125
export HBASE_MANAGES_ZK=false

3.33 Configure environment variables

[pxj@pxj62 /home/pxj]$vim .bashrc 
export HBASE_HOME=/opt/app/hbase
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${KAFKA_HOME}/bin:${KE_HOME}/bin:${HBASE_HOME}/bin:$PATH

[pxj@pxj62 /home/pxj]$source .bashrc 

3.34 Copy the jar

[pxj@pxj62 /opt/app/hbase/lib/client-facing-thirdparty]$cp htrace-core-3.1.0-incubating.jar /opt/app/hbase/lib/

3.35 Edit the regionservers file

[pxj@pxj62 /opt/app/hbase/conf]$vim regionservers 
pxj62
pxj63
pxj64

3.36 Distribute the files

[pxj@pxj62 /opt/app]$xsync hbase-2.1.0/

3.37 Start HBase

Start Hadoop:
start-all.sh
Start ZooKeeper on every node, then start HBase:
[pxj@pxj62 /home/pxj]$start-hbase.sh

3.38 Verify the installation

http://pxj62:16010/master-status

[pxj@pxj62 /home/pxj]$hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/hbase-2.1.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Took 0.0029 seconds                                                                                                                                                        
hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
Took 1.7279 seconds                                                                                                                                                        
hbase(main):002:0> 

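4. The HBase data model

4.1 rowkey: row key

rowkey: the row key. Think of it as the primary key in MySQL; only the name differs.
1) The maximum rowkey length is commonly quoted as 64 KB (HBase 2.x itself enforces `HConstants.MAX_ROW_LENGTH`, which is 32,767 bytes). In practice rowkeys are usually under 100 bytes, most often 10 to 30 bytes.
2) Rows in a table are sorted by rowkey, regardless of insertion order. The order is ascending lexicographic (dictionary) order.
        Sort the following in ascending lexicographic order:
            1 2 10 245 3 58 11 41 269 3478 154 
        Result:
            1 10 11 154 2 245 269 3 3478 41 58 
        Lexicographic rule:
            Compare the first character; if it is equal, compare the second, and so on. A key that has run out of characters sorts before a key that still has more.
3) There are three main ways to query data:
    By rowkey
    By rowkey range
    Full table scan
4) A rowkey must be unique and non-empty.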
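
The ordering rule and the range query can be sketched in a few lines of Python. This is an illustration only: `range_scan` is a hypothetical helper built on the standard `bisect` module, and it assumes rowkeys are compared as raw bytes with the start key included and the stop key excluded.

```python
from bisect import bisect_left

rowkeys = ["1", "2", "10", "245", "3", "58", "11", "41", "269", "3478", "154"]

# Rowkeys are bytes, so encode first; Python compares bytes lexicographically.
ordered = sorted(k.encode("utf-8") for k in rowkeys)
ordered_text = [k.decode("utf-8") for k in ordered]

def range_scan(sorted_keys, start, stop):
    """Return keys k with start <= k < stop using binary search on sorted keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

in_range = range_scan(ordered, b"2", b"3")
print(ordered_text)
print(in_range)
```

Because the keys are kept sorted, a range query only has to locate its two boundaries instead of examining every row, which is why rowkey design matters so much.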

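4.2 column family

1) A table may have several column families, but fewer is better: if one will do, do not use more.
2) HBase manages and stores data by column family (a column-oriented storage scheme).
3) A column family can contain many column qualifiers, up to millions.
4) When creating a table, you must specify the table name and the column family names.

4.3 column qualifier: column name

1) Every column qualifier belongs to exactly one column family, and a column family can hold many qualifiers.
2) Column qualifiers are not declared when the table is created; they are specified dynamically when data is inserted.

4.4 timestamp

Every cell carries a timestamp. By default it is the time the data was inserted, but you can also set it yourself.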

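4.5 versions: number of versions

1) HBase can keep the change history of every cell. The VERSIONS setting says how many historical versions to keep; the default is 1.

2) When several versions are kept, the version closest to the current time (the newest one) is shown by default.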
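
The interaction between timestamps and the version count can be modelled with a small Python class. This is a simplified sketch under stated assumptions: `VersionedCell` is a hypothetical name, timestamps are plain integers, and surplus versions are dropped immediately on write, whereas a real HBase cluster cleans them up later in the background.

```python
class VersionedCell:
    """Toy cell that keeps at most max_versions values, newest first."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value)

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        # Newest timestamp first, then trim to the configured version count.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, n=1):
        """Return up to n versions; by default only the newest one."""
        return self.versions[:n]

address = VersionedCell(max_versions=3)
for ts, city in [(1, "beijing"), (2, "guangzhou"), (3, "shanghai"), (4, "tianjin")]:
    address.put(ts, city)

newest = address.get()
history = address.get(3)

default_cell = VersionedCell()  # VERSIONS defaults to 1
default_cell.put(1, "beijing")
default_cell.put(2, "guangzhou")
print(newest, history, default_cell.get(3))
```

With three versions kept, the oldest value ("beijing") is gone after the fourth write, and a plain read still returns only the newest value.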

4.6 cell

What uniquely identifies a cell? rowkey + column family + column qualifier + timestamp (version)

5. HBase operations: shell commands

5.1 Basic HBase shell usage

On any of the three nodes, from any directory, run:
hbase  shell
[pxj@pxj62 /opt/app/zookeeper]$hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/hbase-2.1.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Took 0.0029 seconds             

5.2 View the status of the whole cluster

hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
Took 0.5121 seconds          

5.3 View the help documentation

hbase(main):002:0> help
HBase Shell, version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: processlist, status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: abort_procedure, list_locks, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):003:0> help 'scan'
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP,
MAXLENGTH or COLUMNS, CACHE or RAW, VERSIONS, ALL_METRICS or METRICS

If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family'.

The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.

If you wish to see metrics regarding the execution of the scan, the
ALL_METRICS boolean should be set to true. Alternatively, if you would
prefer to see only a subset of the metrics, the METRICS array can be
defined to include the names of only the metrics you care about.

Some examples:

  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {ALL_METRICS => true}
  hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
  hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => "
    (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER =>
    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
For setting the Operation Attributes
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column.  A user can define a FORMATTER by adding it to the column name in
the scan specification.  The FORMATTER can be stipulated:

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }

Note that you can specify a FORMATTER by column only (cf:qualifier). You can set a
formatter for all columns (including, all key parts) using the "FORMATTER"
and "FORMATTER_CLASS" options. The default "FORMATTER_CLASS" is
"org.apache.hadoop.hbase.util.Bytes".

  hbase> scan 't1', {FORMATTER => 'toString'}
  hbase> scan 't1', {FORMATTER_CLASS => 'org.apache.hadoop.hbase.util.Bytes', FORMATTER => 'toString'}

Scan can also be used directly from a table, by first getting a reference to a
table, like such:

  hbase> t = get_table 't'
  hbase> t.scan

Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.

4.5 List the tables that currently exist in HBase

hbase(main):005:0> list
TABLE                                                                                                                                                                      
0 row(s)
Took 0.0522 seconds                                                                                                                                                        
=> []

4.6 Create a table

Format:
    create 'table name','column family 1','column family 2' ....
    or
    create 'table name',{NAME=>'column family 1'},{NAME=>'column family 2'} ....
hbase(main):006:0> create 'test01','f1','f2'
Created table test01
Took 0.8959 seconds                                                                                                                                                        
=> Hbase::Table - test01
hbase(main):007:0> list
TABLE                                                                                                                                                                      
test01                                                                                                                                                                     
1 row(s)
Took 0.0266 seconds                                                                                                                                                        
=> ["test01"]
hbase(main):008:0> create 'test02',{NAME=>'f1'},{NAME=>'f2'}
Created table test02
Took 0.7848 seconds                                                                                                                                                        
=> Hbase::Table - test02

4.7 Insert data into a table

hbase(main):009:0> put 'test01','rk0001','f1:name','zhangsan'
Took 0.2737 seconds                                                                                                                                                        
hbase(main):010:0> put 'test01','rk0001','f1:age','20'
Took 0.0141 seconds                                                                                                                                                        
hbase(main):011:0> put 'test01','rk0001','f1:birthday','2020-10-10'
Took 0.0077 seconds                                                                                                                                                        
hbase(main):012:0> put 'test01','rk0001','f2:sex','nan'
Took 0.0136 seconds                                                                                                                                                        
hbase(main):013:0> put 'test01','rk0001','f2:address','beijing'
Took 0.0127 seconds                                                                                                                                                        
hbase(main):014:0> scan 'test01'
ROW                                         COLUMN+CELL                                                                                                                    
 rk0001                                     column=f1:age, timestamp=1682920000246, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682920029538, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682919262141, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682920573062, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682920550965, value=nan                                                                              
1 row(s)
Took 0.0353 seconds         

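4.8 Modify data

 Modifying data uses the same put command as inserting it: writing to the same rowkey and the same column (family:qualifier) overwrites the existing value.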
hbase(main):015:0> put 'test01','rk0001','f2:address','guangzhou'
Took 0.0094 seconds                                                                                                                                                        
hbase(main):016:0> scan 'test01'
ROW                                         COLUMN+CELL                                                                                                                    
 rk0001                                     column=f1:age, timestamp=1682920000246, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682920029538, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682919262141, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682921131272, value=guangzhou                                                                    
 rk0001                                     column=f2:sex, timestamp=1682920550965, value=nan                                                                              
1 row(s)
Took 0.0161 seconds                                                                                                                                                        
hbase(main):017:0> 

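4.9 Delete data

Format: 
    delete 'table name','rowkey','column family:column qualifier'
        
    deleteall 'table name','rowkey','column family:column qualifier'
    
    truncate 'table name'   (empties the table)
Notes:
    1) delete only removes data under a single column. It removes just the current (newest) version, so the previous version becomes visible again.
    2) deleteall, when given a column, removes all historical versions of that column.
    3) deleteall, when given only a rowkey with no column family or qualifier, removes the entire row.

Note:
    The deleteall behavior described here is that of the HBase 2.x shell used in these notes.

Caution:
    truncate is generally avoided. It recreates the table, and the recreated table may not match the original; for example, some settings may be reset by the truncate.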
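
The difference between delete and deleteall is easier to see on a toy model. The Python sketch below is an illustration of the semantics described above, not of the HBase implementation: `make_table`, `read`, `delete`, and `deleteall` are hypothetical helpers, and the model assumes the column family is configured to keep more than one version.

```python
def make_table():
    """(rowkey, column) -> list of (timestamp, value), newest first."""
    return {
        ("rk0001", "f1:name"): [(1, "zhangsan")],
        ("rk0001", "f2:address"): [(2, "guangzhou"), (1, "beijing")],
    }

def read(table, rowkey, column):
    """Return the newest visible value, or None if the cell does not exist."""
    versions = table.get((rowkey, column))
    return versions[0][1] if versions else None

def delete(table, rowkey, column):
    """Remove only the newest version of one column."""
    versions = table.get((rowkey, column), [])
    if versions:
        versions.pop(0)
    if not versions:
        table.pop((rowkey, column), None)

def deleteall(table, rowkey, column=None):
    """Remove every version of one column, or the whole row if column is None."""
    for key in [k for k in table if k[0] == rowkey and column in (None, k[1])]:
        del table[key]

t1 = make_table()
delete(t1, "rk0001", "f2:address")
after_delete = read(t1, "rk0001", "f2:address")      # previous version shows again

t2 = make_table()
deleteall(t2, "rk0001", "f2:address")
after_deleteall = read(t2, "rk0001", "f2:address")   # all versions gone
name_kept = read(t2, "rk0001", "f1:name")            # other columns untouched

t3 = make_table()
deleteall(t3, "rk0001")                              # whole row removed
print(after_delete, after_deleteall, name_kept, len(t3))
```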

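4.10 Delete a table

Format (to inspect the table first):
    describe  'table name'
    desc 'table name'
Format (to drop it):
    drop 'table name'

Note: a table must be disabled before it can be dropped.

Disable a table:  disable  'table name'
Enable a table: enable 'table name'
Check whether a table is enabled: is_enabled 'table name'
Check whether a table is disabled: is_disabled 'table name'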

4.11 View a table's structure

hbase(main):017:0> desc 'test01'
Table test01 is ENABLED                                                                                                                                                    
test01                                                                                                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                                                                                                
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOC
K_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_B
LOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                          
{NAME => 'f2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOC
K_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_B
LOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                          
2 row(s)
Took 0.0938 seconds                                                                                                                                                        
hbase(main):018:0> describe 'test01'
Table test01 is ENABLED                                                                                                                                                    
test01                                                                                                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                                                                                                
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOC
K_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_B
LOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                          
{NAME => 'f2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOC
K_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_B
LOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                          
2 row(s)
Took 0.0346 seconds      

4.12 Count the rows in a table

count 'table name'
hbase(main):019:0> count 'test01'
1 row(s)
Took 0.0936 seconds                                                                                                                                                        
=> 1

4.13 Query data by scanning, including range queries

Preparation: insert some data
put 'test01','rk0001','f1:name','zhangsan'
put 'test01','rk0001','f1:age','20'
put 'test01','rk0001','f1:birthday','2020-10-10'
put 'test01','rk0001','f2:sex','nan'
put 'test01','rk0001','f2:address','beijing'

put 'test01','rk0002','f1:name','lisi'
put 'test01','rk0002','f1:age','25'
put 'test01','rk0002','f1:birthday','2005-10-10'
put 'test01','rk0002','f2:sex','nv'
put 'test01','rk0002','f2:address','shanghai'

put 'test01','rk0003','f1:name','王五'
put 'test01','rk0003','f1:age','28'
put 'test01','rk0003','f1:birthday','1993-10-25'
put 'test01','rk0003','f2:sex','nan'
put 'test01','rk0003','f2:address','tianjin'

put 'test01','0001','f1:name','zhaoliu'
put 'test01','0001','f1:age','25'
put 'test01','0001','f1:birthday','1995-05-05'
put 'test01','0001','f2:sex','nan'
put 'test01','0001','f2:address','guangzhou'

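Format:
    scan 'table name' , {COLUMNS=>['column family' | 'column family:column qualifier' ....], STARTROW=>'start rowkey' ,ENDROW=>'end rowkey', FORMATTER=>'toString',LIMIT=>N}

Notes:
    The [] here are literal syntax (a list of columns), not a marker for optional parts. A single column may also be given as a plain string, as in the help output.
    The shell's help output names the end-of-range option STOPROW.
    A range scan includes the start rowkey and excludes the end rowkey.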
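
The effect of these scan options can be previewed offline with a small Python model of the sample data. This is a sketch under stated assumptions: `scan` here is a hypothetical Python function that mimics COLUMNS, STARTROW, the end-of-range option, and LIMIT on an in-memory dictionary; it does not talk to HBase, and it only holds a subset of the sample columns.

```python
ROWS = {
    "rk0001": {"f1:name": "zhangsan", "f1:age": "20", "f2:address": "beijing"},
    "rk0002": {"f1:name": "lisi", "f1:age": "25", "f2:address": "shanghai"},
    "rk0003": {"f1:name": "wangwu", "f1:age": "28", "f2:address": "tianjin"},
    "0001": {"f1:name": "zhaoliu", "f1:age": "25", "f2:address": "guangzhou"},
}

def scan(rows, columns=None, startrow=None, stoprow=None, limit=None):
    """Walk rowkeys in sorted order; start is inclusive, stop is exclusive."""
    result = []
    for rowkey in sorted(rows):  # ASCII keys sort the same as their bytes
        if startrow is not None and rowkey < startrow:
            continue
        if stoprow is not None and rowkey >= stoprow:
            break
        # A selector is either a whole family ("f2") or one column ("f1:name").
        cells = {
            col: val
            for col, val in rows[rowkey].items()
            if columns is None
            or any(col == sel or col.startswith(sel + ":") for sel in columns)
        }
        if cells:
            result.append((rowkey, cells))
        if limit is not None and len(result) >= limit:
            break
    return result

ranged = scan(ROWS, columns=["f1:name", "f2"], startrow="rk0001", stoprow="rk0003")
first_only = scan(ROWS, limit=1)
print(ranged)
print(first_only)
```

The range returns rk0001 and rk0002 but not rk0003, and an unrestricted scan starts at 0001 because that key sorts before every key beginning with "rk".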
hbase(main):020:0> put 'test01','rk0001','f1:name','zhangsan'
Took 0.0116 seconds                                                                                                                                                        
hbase(main):021:0> put 'test01','rk0001','f1:age','20'
Took 0.0070 seconds                                                                                                                                                        
hbase(main):022:0> put 'test01','rk0001','f1:birthday','2020-10-10'
Took 0.0111 seconds                                                                                                                                                        
hbase(main):023:0> put 'test01','rk0001','f2:sex','nan'
Took 0.0250 seconds                                                                                                                                                        
hbase(main):024:0> put 'test01','rk0001','f2:address','beijing'
Took 0.0089 seconds                                                                                                                                                        
hbase(main):025:0> put 'test01','rk0002','f1:name','lisi'
Took 0.0061 seconds
hbase(main):026:0> put 'test01','rk0002','f1:age','25'
Took 0.0128 seconds                                                                                                                                                        
hbase(main):027:0> put 'test01','rk0002','f1:birthday','2005-10-10'
Took 0.0095 seconds                                                                                                                                                        
hbase(main):028:0> put 'test01','rk0002','f2:sex','nv'
Took 0.0067 seconds                                                                                                                                                        
hbase(main):029:0> put 'test01','rk0002','f2:address','shanghai'
Took 0.0128 seconds                                                                                                                                                        
hbase(main):030:0> put 'test01','rk0003','f1:name','王五'
Took 0.0048 seconds                                                                                                                                                        
hbase(main):031:0> put 'test01','rk0003','f1:age','28'
Took 0.0076 seconds                                                                                                                                                        
hbase(main):032:0> put 'test01','rk0003','f1:birthday','1993-10-25'
Took 0.0054 seconds                                                                                                                                                        
hbase(main):033:0> put 'test01','rk0003','f2:sex','nan'
Took 0.0067 seconds                                                                                                                                                        
hbase(main):034:0> put 'test01','rk0003','f2:address','tianjin'
Took 0.0045 seconds                                                                                                                                                        
hbase(main):035:0> put 'test01','0001','f1:name','zhaoliu'
Took 0.0056 seconds                                                                                                                                                        
hbase(main):036:0> put 'test01','0001','f1:age','25'
Took 0.0064 seconds                                                                                                                                                        
hbase(main):037:0> put 'test01','0001','f1:birthday','1995-05-05'
Took 0.0058 seconds                                                                                                                                                        
hbase(main):038:0> put 'test01','0001','f2:sex','nan'
Took 0.0093 seconds                                                                                                                                                        
hbase(main):039:0> put 'test01','0001','f2:address','guangzhou'
Took 0.0066 seconds                                                                                                                                                        
hbase(main):040:0> count 'test01'
4 row(s)
Took 0.0202 seconds                                                                                                                                                        
=> 4
hbase(main):041:0> 

       Query:
hbase(main):041:0> scan 'test01'
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
4 row(s)
Took 0.0398 seconds                                                                                                                                                        
hbase(main):042:0> scan 'test01',{FORMATTER=>'toString'}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=王五                                                                              
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
4 row(s)
Took 0.0356 seconds    
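
Without FORMATTER, the shell prints every non-ASCII byte as a \xNN escape; with FORMATTER=>'toString' the same bytes are shown as text. The sketch below is plain Java (it does not use the HBase client) and shows what that conversion amounts to, assuming the value was written as UTF-8. The class name EscapedBytes and its methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EscapedBytes {
    // Turns shell-style text such as "\xE7\x8E\x8B" back into raw bytes.
    // Characters that are not part of a \xNN escape are kept as single ASCII bytes.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\x", i) && i + 4 <= s.length()) {
                // Two hex digits follow the "\x" marker
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write((byte) s.charAt(i));
                i++;
            }
        }
        return out.toByteArray();
    }

    // Assumption: the stored value is UTF-8 encoded text
    static String decodeUtf8(String escaped) {
        return new String(unescape(escaped), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The f1:name value of rk0003 as printed by the plain scan
        System.out.println(decodeUtf8("\\xE7\\x8E\\x8B\\xE4\\xBA\\x94"));
    }
}
```

The six escaped bytes decode to the two characters that the toString scan shows for rk0003.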
hbase(main):043:0> scan 'test01',{FORMATTER=>'toString',LIMIT=>2}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
2 row(s)
Took 0.0436 seconds           
hbase(main):044:0> scan 'test01',{COLUMN=>'f1'}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
4 row(s)
Took 0.0157 seconds
hbase(main):045:0> scan 'test01',{COLUMN=>['f1','f2:address']}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
4 row(s)
Took 0.0353 seconds                                                                                                                                                        
hbase(main):046:0> 

hbase(main):046:0> scan 'test01',{STARTROW=>'rk0001',ENDROW=>'rk0003'}
ROW                                         COLUMN+CELL                                                                                                                    
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
2 row(s)
Took 0.0163 seconds   
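
Note that rk0003 is missing from the result above: STARTROW is inclusive and ENDROW is exclusive, and rows are visited in sorted rowkey order. A rough plain-Java model of that behavior, using a TreeMap in place of the table (RowRangeSketch and its methods are hypothetical names; for ASCII keys such as these, Java string order matches the byte order):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RowRangeSketch {
    // Returns the row keys in [startRow, endRow): start inclusive, end exclusive.
    static List<String> rangeScan(TreeMap<String, String> rows, String startRow, String endRow) {
        return new ArrayList<>(rows.subMap(startRow, endRow).keySet());
    }

    // Stand-in for the test01 table: rowkey -> f2:address
    static TreeMap<String, String> sampleRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("rk0003", "tianjin");
        rows.put("0001", "guangzhou");
        rows.put("rk0002", "shanghai");
        rows.put("rk0001", "beijing");
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = sampleRows();
        // Keys come back sorted regardless of insertion order
        System.out.println(rows.keySet());                        // [0001, rk0001, rk0002, rk0003]
        System.out.println(rangeScan(rows, "rk0001", "rk0003"));  // [rk0001, rk0002]
    }
}
```

To include rk0003 as well, the end key has to sort after it, for example 'rk0004'.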

6. Advanced HBase shell commands

whoami: show the currently logged-in user

hbase(main):002:0> whoami
pxj (auth:SIMPLE)
    groups: pxj
Took 0.0098 seconds

exists: check whether a table exists

hbase(main):003:0> exists 'test01'
Table test01 does exist                                                                                                                                                    
Took 0.5810 seconds                                                                                                                                                        
=> true
alter: modify a table
Add a column family:
    alter 'table_name', NAME=>'new_column_family'
Delete a column family:
    alter 'table_name', 'delete'=>'old_column_family'
Filter operations in HBase:
    Purpose: filters extend the ways data can be queried in HBase
Format:
    scan 'table_name',{FILTER=>"FilterName(comparison operator,'comparator expression')"}

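Commonly used filters in HBase:
    Rowkey filters:
        RowFilter: filters rows by comparing the rowkey
        PrefixFilter: matches rowkeys that start with a given prefix
    Column family filter:
        FamilyFilter: filters by column family name
    Column name filter:
        QualifierFilter: filters by column name and returns the matching columns
    Value filters:
        ValueFilter: returns the cells whose value meets the condition
        SingleColumnValueFilter: tests the value of the given column family and column, and returns the whole row when it matches
        SingleColumnValueExcludeFilter: same test as SingleColumnValueFilter, but the returned row omits the tested column
    Other filters:
        PageFilter: used for paging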

Comparison operators:  >  <  >= <= != =

Comparators:
    BinaryComparator: compares against the complete value
    BinaryPrefixComparator: matches a given prefix
    NullComparator: matches null values
    SubstringComparator: substring (fuzzy) match

Comparator expressions:
    BinaryComparator         binary:value
    BinaryPrefixComparator   binaryprefix:value
    NullComparator           null
    SubstringComparator      substring:value
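
To make the comparator expressions concrete, here is a small plain-Java model of how three of them behave with the = operator. It is only a sketch of the matching rules, not HBase code: ComparatorSketch and matchesEquals are hypothetical names, the inputs are treated as plain strings rather than byte arrays, and null is left out.

```java
import java.util.Locale;

public class ComparatorSketch {
    // Evaluates "<type>:<value>" against a candidate (rowkey, column name or cell value)
    // for the "=" operator only.
    static boolean matchesEquals(String comparatorExpr, String candidate) {
        int sep = comparatorExpr.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("expected <type>:<value>, got " + comparatorExpr);
        }
        String type = comparatorExpr.substring(0, sep);
        String value = comparatorExpr.substring(sep + 1);
        switch (type) {
            case "binary":
                // Whole value must be identical
                return candidate.equals(value);
            case "binaryprefix":
                // Only the leading part is compared
                return candidate.startsWith(value);
            case "substring":
                // HBase's SubstringComparator ignores case
                return candidate.toLowerCase(Locale.ROOT).contains(value.toLowerCase(Locale.ROOT));
            default:
                throw new IllegalArgumentException("comparator not modelled here: " + type);
        }
    }

    public static void main(String[] args) {
        // Column names of test01 checked against substring:e
        for (String qualifier : new String[] {"age", "birthday", "name", "address", "sex"}) {
            System.out.println(qualifier + " -> " + matchesEquals("substring:e", qualifier));
        }
        // Rowkeys checked against binaryprefix:rk
        System.out.println(matchesEquals("binaryprefix:rk", "rk0001")); // true
        System.out.println(matchesEquals("binaryprefix:rk", "0001"));   // false
    }
}
```

Of the five column names in test01, only birthday contains no e, so it is the one that substring:e rejects.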

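Reference:
    http://hbase.apache.org/2.2/devapidocs/index.html
    Look up the filter you need at this address, check its constructor, and write the FILTER expression to match the constructor arguments.

Requirement 1: find the columns whose name contains the letter e

hbase(main):004:0> scan 'test01',{FILTER=>"QualifierFilter(=,'substring:e')"}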
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
4 row(s)
Took 0.1787 seconds          

Requirement 2: list the rows whose rowkey starts with rk

hbase(main):005:0> scan 'test01',{FILTER=>"PrefixFilter('rk')"}
ROW                                         COLUMN+CELL                                                                                                                    
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
3 row(s)
Took 0.0328 seconds  

hbase(main):006:0> scan 'test01',{FILTER=>"RowFilter(=,'binaryprefix:rk')"}
ROW                                         COLUMN+CELL                                                                                                                    
 rk0001                                     column=f1:age, timestamp=1682927955560, value=20                                                                               
 rk0001                                     column=f1:birthday, timestamp=1682927955584, value=2020-10-10                                                                  
 rk0001                                     column=f1:name, timestamp=1682927955535, value=zhangsan                                                                        
 rk0001                                     column=f2:address, timestamp=1682927957076, value=beijing                                                                      
 rk0001                                     column=f2:sex, timestamp=1682927955609, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
3 row(s)
Took 0.0502 seconds   

Requirement 3: find the rows whose age is greater than or equal to 25

hbase(main):007:0> scan 'test01',{FILTER=>"SingleColumnValueFilter('f1','age',>=,'binary:25')"}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:age, timestamp=1682927992232, value=25                                                                               
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0002                                     column=f1:age, timestamp=1682927991975, value=25                                                                               
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:age, timestamp=1682927992124, value=28                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
3 row(s)
Took 0.0422 seconds  

hbase(main):008:0> scan 'test01',{FILTER=>"SingleColumnValueExcludeFilter('f1','age',>=,'binary:25')"}
ROW                                         COLUMN+CELL                                                                                                                    
 0001                                       column=f1:birthday, timestamp=1682927992253, value=1995-05-05                                                                  
 0001                                       column=f1:name, timestamp=1682927992216, value=zhaoliu                                                                         
 0001                                       column=f2:address, timestamp=1682927993912, value=guangzhou                                                                    
 0001                                       column=f2:sex, timestamp=1682927992273, value=nan                                                                              
 rk0002                                     column=f1:birthday, timestamp=1682927992008, value=2005-10-10                                                                  
 rk0002                                     column=f1:name, timestamp=1682927991952, value=lisi                                                                            
 rk0002                                     column=f2:address, timestamp=1682927992080, value=shanghai                                                                     
 rk0002                                     column=f2:sex, timestamp=1682927992059, value=nv                                                                               
 rk0003                                     column=f1:birthday, timestamp=1682927992148, value=1993-10-25                                                                  
 rk0003                                     column=f1:name, timestamp=1682927992104, value=\xE7\x8E\x8B\xE4\xBA\x94                                                        
 rk0003                                     column=f2:address, timestamp=1682927992197, value=tianjin                                                                      
 rk0003                                     column=f2:sex, timestamp=1682927992178, value=nan                                                                              
3 row(s)
Took 0.0273 seconds
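
One caveat with 'binary:25': BinaryComparator compares raw bytes lexicographically, not numbers. It gives the expected answer here only because every age is stored as a two-digit string. A plain-Java sketch of the comparison, assuming the values are stored as strings (BinaryCompareSketch and its methods are hypothetical names):

```java
import java.nio.charset.StandardCharsets;

public class BinaryCompareSketch {
    // Byte-wise lexicographic comparison, treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter array sorts first
        return a.length - b.length;
    }

    // Models the ">=" test against 'binary:<threshold>'
    static boolean greaterOrEqual(String stored, String threshold) {
        return compareBytes(stored.getBytes(StandardCharsets.UTF_8),
                            threshold.getBytes(StandardCharsets.UTF_8)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(greaterOrEqual("20", "25"));   // false, same as numeric
        System.out.println(greaterOrEqual("28", "25"));   // true, same as numeric
        System.out.println(greaterOrEqual("9", "25"));    // true, although 9 < 25
        System.out.println(greaterOrEqual("100", "25"));  // false, although 100 > 25
        System.out.println(greaterOrEqual("100", "025")); // true once values are zero-padded
    }
}
```

Storing numbers with a fixed width (for example 009, 025, 100) keeps byte order and numeric order in agreement.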

7. Java API operations

Preparation (Maven pom.xml):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.ccj.pxj</groupId>
    <artifactId>Hbase_Ky</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <repositories><!-- repositories -->
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <releases><enabled>true</enabled></releases>
            <snapshots>
                <enabled>false</enabled>
                <updatePolicy>never</updatePolicy>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <version>6.14.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <target>1.8</target>
                    <source>1.8</source>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

7.1 Create a table

    @Test
    public void test01() throws Exception {
        // 1) Create the HBase connection from the connection factory
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum","pxj62:2181,pxj63:2181,pxj64:2181");
        Connection hbaseConn = ConnectionFactory.createConnection(conf);
        // 2) From the connection, get a management object: Admin (operations on tables) or Table (operations on table data)
        Admin admin = hbaseConn.getAdmin();
        // 3) Perform the operation
        // 3.1) Check whether the table exists: true means it exists, false means it does not
        boolean flag = admin.tableExists(TableName.valueOf("WATER_BILL"));
        if(!flag){
            // The table does not exist, so create it
            // 3.2) Create the table
            // 3.2.1) Create the table descriptor builder
            TableDescriptorBuilder tableDescriptorBuilder = TableDescriptorBuilder.newBuilder(TableName.valueOf("WATER_BILL"));
            // 3.2.2) Add the column family to the builder
            ColumnFamilyDescriptor familyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder("C1".getBytes()).build();
            tableDescriptorBuilder.setColumnFamily(familyDescriptor);
            // 3.2.3) Build the table descriptor and create the table
            TableDescriptor tableDescriptor = tableDescriptorBuilder.build();
            admin.createTable(tableDescriptor);
        }
        // 4) Process the result set (only queries return one)
        // 5) Release resources
        admin.close();
        hbaseConn.close();

    }

7.2 Insert data

    @Test
    public void test02() throws Exception {
        // 1) Create the HBase connection from the connection factory
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum","pxj62:2181,pxj63:2181,pxj64:2181");
        Connection hbaseConn = ConnectionFactory.createConnection(conf);
        // 2) From the connection, get a management object: Admin or Table
        Table table = hbaseConn.getTable(TableName.valueOf("WATER_BILL"));
        // 3) Perform the operation: insert one row
        Put put = new Put("4944191".getBytes());
        put.addColumn("C1".getBytes(),"NAME".getBytes(),"登卫红".getBytes());
        put.addColumn("C1".getBytes(),"ADDRESS".getBytes(),"贵州省铜仁市德江县7单元267室".getBytes());
        put.addColumn("C1".getBytes(),"SEX".getBytes(),"男".getBytes());

        table.put(put);
        // 4) Process the result set (only queries return one)

        // 5) Release resources
        table.close();
        hbaseConn.close();

    }
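
The put above passes the result of `String.getBytes()`, which uses the JVM's default charset, so the same code can store different bytes on differently configured machines. The sketch below shows the effect with plain JDK classes. `CharsetDemo` is a name invented for this illustration. HBase's own `Bytes.toBytes(String)` always encodes as UTF-8, which is one reason to prefer it (or an explicit charset) for non-ASCII values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative class, not from the original project.
class CharsetDemo {
    // Encode with an explicit charset so the result does not depend on the machine.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sex = "\u7537"; // the SEX value written in test02
        byte[] asUtf8 = utf8(sex);
        // ISO-8859-1 cannot represent this character, so it is replaced by '?'.
        byte[] asLatin1 = sex.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length);                    // 3
        System.out.println(asLatin1.length);                  // 1
        System.out.println(Arrays.equals(asUtf8, asLatin1));  // false
        // Decoding with the charset used for encoding restores the text.
        System.out.println(new String(asUtf8, StandardCharsets.UTF_8).equals(sex)); // true
    }
}
```

The reader side has the same constraint: `Bytes.toString` decodes as UTF-8, so values written with another charset would come back garbled.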

7.3 Extract common setup

    private Connection hbaseConn;
    private Admin admin;
    private Table table;

    @Before
    public void before() throws Exception{
        // 1) Create the HBase connection from the connection factory
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum","pxj62:2181,pxj63:2181,pxj64:2181");
        hbaseConn = ConnectionFactory.createConnection(conf);
        // 2) From the connection, get the management objects: Admin and Table
        admin = hbaseConn.getAdmin();
        table = hbaseConn.getTable(TableName.valueOf("WATER_BILL"));
    }

    @After
    public void after() throws Exception{
        // 5) Release resources after each test
        table.close();
        admin.close();
        hbaseConn.close();
    }

7.4 Query a single row

    @Test
    public void test03() throws Exception {
        // 3) Perform the operation: read one row by rowkey
        Get get = new Get("4944191".getBytes());
        Result result = table.get(get);
        // listCells() returns null when the row does not exist, so stop here in that case
        if (result.isEmpty()) {
            return;
        }
        List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
            byte[] rowKeyBytes = CellUtil.cloneRow(cell);
            byte[] familyBytes = CellUtil.cloneFamily(cell);
            byte[] columnNameBytes = CellUtil.cloneQualifier(cell);
            byte[] valueBytes = CellUtil.cloneValue(cell);

            String rowKey = Bytes.toString(rowKeyBytes);
            String family = Bytes.toString(familyBytes);
            String columnName = Bytes.toString(columnNameBytes);
            String value = Bytes.toString(valueBytes);

            System.out.println("rowkey: "+rowKey +", column family: "+family +"; column: "+columnName+"; value: "+value);
        }
    }

7.5 Delete data

    // Requirement 5: delete the row whose rowkey is 4944191
    @Test
    public void test05() throws Exception{
        Delete delete = new Delete("4944191".getBytes());
        table.delete(delete);
    }

hbase(main):004:0> scan 'WATER_BILL'
ROW                                         COLUMN+CELL
0 row(s)
Took 0.0183 seconds

7.6 Delete a table

    @Test
    public void test06() throws Exception{
        // 3) Perform the operation

        // 3.1) If the table is still enabled, disable it first
        if( admin.isTableEnabled(TableName.valueOf("WATER_BILL")) ){
            admin.disableTable(TableName.valueOf("WATER_BILL"));
        }
        // 3.2) Delete the table
        admin.deleteTable(TableName.valueOf("WATER_BILL"));

        // 4) Process the result set (nothing to process for a delete)
    }
=> ["test01", "test02"]
hbase(main):006:0> scan 'WATER_BILL'
ROW                                         COLUMN+CELL                                                                                                                    
org.apache.hadoop.hbase.TableNotFoundException: WATER_BILL
    at org.apache.hadoop.hbase.client.ConnectionImplementation.getTableState(ConnectionImplementation.java:1954)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.isTableDisabled(ConnectionImplementation.java:583)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:713)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
    at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

ERROR: Unknown table WATER_BILL!

For usage try 'help "scan"'

Took 1.0289 seconds            

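7.7 Import data

How to import data:
hbase org.apache.hadoop.hbase.mapreduce.Import <table name> <HDFS data file path>

The target table must already exist, so recreate WATER_BILL (see 7.1) if it was deleted in 7.6.

Steps:
1) Upload the 100,000-row meter-reading data file from the course materials to HDFS:

hdfs dfs -mkdir -p /hbase/water_bill/input
Upload the data file into this directory:
hdfs dfs -put part-m-00000_10w  /hbase/water_bill/input

2) Run the import:
hbase org.apache.hadoop.hbase.mapreduce.Import WATER_BILL /hbase/water_bill/input/part-m-00000_10w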


[pxj@pxj63 /opt/sofe]$rz -E
rz waiting to receive.
[pxj@pxj63 /opt/sofe]$ll
总用量 712392
-rw-r--r--. 1 pxj pxj 678001736 3月  21 22:53 mysql-5.7.40-linux-glibc2.12-x86_64.tar.gz
-rw-r--r--. 1 pxj pxj  51483241 4月  13 23:32 part-m-00000_10w
[pxj@pxj63 /opt/sofe]$hdfs dfs -mkdir -p /hbase/water_bill/input



hbase(main):007:0> count 'WATER_BILL'
Current count: 1000, row: 0100876                                                                                                                                          
Current count: 2000, row: 0198911                                                                                                                                          
Current count: 3000, row: 0297202                                                                                                                                          
Current count: 4000, row: 0396260                                                                                                                                          
Current count: 5000, row: 0496133                                                                                                                                          
Current count: 6000, row: 0600497                                                                                                                                          
Current count: 7000, row: 0703223                                                                                                                                          
Current count: 8000, row: 0800139                                                                                                                                          
Current count: 9000, row: 0894996                                                                                                                                          
Current count: 10000, row: 0989166                                                                                                                                         
Current count: 11000, row: 1083304                                                                                                                                         
Current count: 12000, row: 1176972                                                                                                                                         
Current count: 13000, row: 1282285                                                                                                                                         
Current count: 14000, row: 1384119                                                                                                                                         
Current count: 15000, row: 1486440                                                                                                                                         
Current count: 16000, row: 1585872                                                                                                                                         
Current count: 17000, row: 1683376                                                                                                                                         
Current count: 18000, row: 1784217                                                                                                                                         
Current count: 19000, row: 1883173                                                                                                                                         
Current count: 20000, row: 1981216                                                                                                                                         
Current count: 21000, row: 2080089                                                                                                                                         
Current count: 22000, row: 2177073                                                                                                                                         
Current count: 23000, row: 2281290                                                                                                                                         
Current count: 24000, row: 2387611                                                                                                                                         
Current count: 25000, row: 2485928                                                                                                                                         
Current count: 26000, row: 2586855                                                                                                                                         
Current count: 27000, row: 2692853                                                                                                                                         
Current count: 28000, row: 2790279                                                                                                                                         
Current count: 29000, row: 2891564                                                                                                                                         
Current count: 30000, row: 2992772                                                                                                                                         
Current count: 31000, row: 3092745                                                                                                                                         
Current count: 32000, row: 3192473                                                                                                                                         
Current count: 33000, row: 3292718                                                                                                                                         
Current count: 34000, row: 3392517                                                                                                                                         
Current count: 35000, row: 3492498                                                                                                                                         
Current count: 36000, row: 3597604                                                                                                                                         
Current count: 37000, row: 3699894                                                                                                                                         
Current count: 38000, row: 3803168                                                                                                                                         
Current count: 39000, row: 3907990                                                                                                                                         
Current count: 40000, row: 4010517                                                                                                                                         
Current count: 41000, row: 4110878                                                                                                                                         
Current count: 42000, row: 4207162                                                                                                                                         
Current count: 43000, row: 4306768                                                                                                                                         
Current count: 44000, row: 4413198                                                                                                                                         
Current count: 45000, row: 4512536                                                                                                                                         
Current count: 46000, row: 4612263                                                                                                                                         
Current count: 47000, row: 4713620                                                                                                                                         
Current count: 48000, row: 4815897                                                                                                                                         
Current count: 49000, row: 4916970                                                                                                                                         
Current count: 50000, row: 5011658                                                                                                                                         
Current count: 51000, row: 5118661                                                                                                                                         
Current count: 52000, row: 5214746                                                                                                                                         
Current count: 53000, row: 5312632                                                                                                                                         
Current count: 54000, row: 5409128                                                                                                                                         
Current count: 55000, row: 5502543                                                                                                                                         
Current count: 56000, row: 5601945                                                                                                                                         
Current count: 57000, row: 5707443                                                                                                                                         
Current count: 58000, row: 5815118                                                                                                                                         
Current count: 59000, row: 5913868                                                                                                                                         
Current count: 60000, row: 6014358                                                                                                                                         
Current count: 61000, row: 6111505                                                                                                                                         
Current count: 62000, row: 6208207                                                                                                                                         
Current count: 63000, row: 6309356                                                                                                                                         
Current count: 64000, row: 6414059                                                                                                                                         
Current count: 65000, row: 6516637                                                                                                                                         
Current count: 66000, row: 6612872                                                                                                                                         
Current count: 67000, row: 6718005                                                                                                                                         
Current count: 68000, row: 6814867                                                                                                                                         
Current count: 69000, row: 6919232                                                                                                                                         
Current count: 70000, row: 7014585                                                                                                                                         
Current count: 71000, row: 7115052                                                                                                                                         
Current count: 72000, row: 7215747                                                                                                                                         
Current count: 73000, row: 7316079                                                                                                                                         
Current count: 74000, row: 7419978                                                                                                                                         
Current count: 75000, row: 7524553                                                                                                                                         
Current count: 76000, row: 7628323                                                                                                                                         
Current count: 77000, row: 7729588                                                                                                                                         
Current count: 78000, row: 7833969                                                                                                                                         
Current count: 79000, row: 7935328                                                                                                                                         
Current count: 80000, row: 8035829                                                                                                                                         
Current count: 81000, row: 8133527                                                                                                                                         
Current count: 82000, row: 8236834                                                                                                                                         
Current count: 83000, row: 8341968                                                                                                                                         
Current count: 84000, row: 8442569                                                                                                                                         
Current count: 85000, row: 8542044                                                                                                                                         
Current count: 86000, row: 8648227                                                                                                                                         
Current count: 87000, row: 8746478                                                                                                                                         
Current count: 88000, row: 8848619                                                                                                                                         
Current count: 89000, row: 8948384                                                                                                                                         
Current count: 90000, row: 9048613                                                                                                                                         
Current count: 91000, row: 9151751                                                                                                                                         
Current count: 92000, row: 9250679                                                                                                                                         
Current count: 93000, row: 9349696                                                                                                                                         
Current count: 94000, row: 9450573                                                                                                                                         
Current count: 95000, row: 9550716                                                                                                                                         
Current count: 96000, row: 9651741                                                                                                                                         
Current count: 97000, row: 9747953                                                                                                                                         
Current count: 98000, row: 9848779                                                                                                                                         
Current count: 99000, row: 9951726                                                                                                                                         
99505 row(s)
Took 8.1651 seconds                                                                                                                                                        
=> 99505
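
The progress lines of `count` list rowkeys in ascending order (0100876, 0198911, ...) because HBase keeps rows sorted by the raw bytes of the rowkey, not by numeric value. The sketch below imitates that ordering with a `TreeMap` and an unsigned byte-wise comparator. `RowkeyOrder` and the sample keys are invented for illustration. It also shows why fixed-width, zero-padded keys such as the ones in this table sort the way numbers would.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative class, not part of HBase.
class RowkeyOrder {
    // Compare two keys byte by byte, treating each byte as unsigned (0..255).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a key that is a prefix of another sorts first
    }

    // Return the keys in the order a byte-sorted store would keep them.
    static List<String> sorted(String... keys) {
        TreeMap<byte[], String> rows = new TreeMap<>(RowkeyOrder::compareBytes);
        for (String k : keys) {
            rows.put(k.getBytes(StandardCharsets.UTF_8), k);
        }
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        // Without padding, byte order differs from numeric order.
        System.out.println(sorted("9", "10", "100876"));            // [10, 100876, 9]
        // With zero padding to a fixed width, both orders agree.
        System.out.println(sorted("0000009", "0000010", "0100876")); // [0000009, 0000010, 0100876]
    }
}
```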

7.8 Example

Requirement: query the water usage of all users for June 2020:

Date field: RECORD_DATE

Water usage: NUM_USAGE

User: NAME
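
The Java code for this example decodes NUM_USAGE with `Bytes.toDouble`, which to my understanding reads the value as an 8-byte big-endian IEEE 754 double. That only works if the data was written in that binary form rather than as text. The sketch below reproduces the layout with `ByteBuffer` from the JDK. `UsageEncoding` is an illustrative name, and the value 12.5 is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative class, not part of HBase.
class UsageEncoding {
    // 8 bytes, big-endian IEEE 754: the layout of a binary-encoded double.
    static byte[] encode(double v) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(v).array();
    }

    static double decode(byte[] bytes) {
        if (bytes.length != Double.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        byte[] binary = encode(12.5);
        System.out.println(binary.length + " bytes -> " + decode(binary)); // 8 bytes -> 12.5

        // The same number stored as text has a different length and layout,
        // so it has to be read as a string and parsed instead.
        byte[] text = "12.5".getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length + " bytes"); // 4 bytes
    }
}
```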

    // Equivalent SQL: select NAME, NUM_USAGE from WATER_BILL where RECORD_DATE between '2020-06-01' and '2020-06-30';
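
The range condition is evaluated on bytes, not on dates: `BinaryComparator` compares values lexicographically. This matches calendar order only under the assumption that RECORD_DATE is stored as zero-padded `yyyy-MM-dd` text. The sketch below checks the same range with a plain byte-wise comparison. `DateRange` and `inJune2020` are hypothetical names used only here.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class, not part of HBase.
class DateRange {
    // Unsigned, byte-wise lexicographic comparison of two strings.
    static int compare(String left, String right) {
        byte[] a = left.getBytes(StandardCharsets.UTF_8);
        byte[] b = right.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Same bounds as the two filters: >= "2020-06-01" and <= "2020-06-30".
    static boolean inJune2020(String recordDate) {
        return compare(recordDate, "2020-06-01") >= 0
                && compare(recordDate, "2020-06-30") <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inJune2020("2020-06-15")); // true
        System.out.println(inJune2020("2020-07-01")); // false
        // A June date without zero padding is rejected by the byte comparison.
        System.out.println(inJune2020("2020-6-5"));   // false
    }
}
```

For the same reason, a value carrying a time part, such as `2020-06-30 12:00:00`, would compare greater than `2020-06-30` and fall outside the range.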

```java
    @Test
    public void test07()throws  Exception{
        // 3) Perform the operation
        Scan scan = new Scan();
        // 3.1) Build the filter conditions
        SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
                "C1".getBytes(),
                "RECORD_DATE".getBytes(),
                CompareOperator.GREATER_OR_EQUAL,
                new BinaryComparator("2020-06-01".getBytes())
        );
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
                "C1".getBytes(),
                "RECORD_DATE".getBytes(),
                CompareOperator.LESS_OR_EQUAL,
                new BinaryComparator("2020-06-30".getBytes())
        );
        // 3.1.2) Combine the filters in a FilterList (by default a row must pass all of them)
        FilterList filterList = new FilterList();
        filterList.addFilter(filter1);
        filterList.addFilter(filter2);
        // 3.1.3) Attach the filter list to the scan; without this call the filters are never applied
        scan.setFilter(filterList);
        // Limit the number of rows returned
        scan.setLimit(10);
        // Restrict which columns are returned
        scan.addColumn("C1".getBytes(),"NAME".getBytes());
        scan.addColumn("C1".getBytes(),"NUM_USAGE".getBytes());
        scan.addColumn("C1".getBytes(),"RECORD_DATE".getBytes());

        ResultScanner results = table.getScanner(scan); // may return multiple rows
        // 4) Process the result set
        // 4.1) Iterate over the rows
        for (Result result : results) {
            // 4.2) Get every cell of the row
            List<Cell> cells = result.listCells();

            // 4.3) Iterate over the cells: each cell carries the rowkey, column family, column name and value
            for (Cell cell : cells) {
                byte[] columnNameBytes = CellUtil.cloneQualifier(cell);
                String columnName = Bytes.toString(columnNameBytes);

                byte[] rowKeyBytes = CellUtil.cloneRow(cell);
                byte[] familyBytes = CellUtil.cloneFamily(cell);
                byte[] valueBytes = CellUtil.cloneValue(cell);

                String rowKey = Bytes.toString(rowKeyBytes);
                String family = Bytes.toString(familyBytes);

                // NUM_USAGE is decoded as a binary double; every other column is decoded as a string
                Object value;
                if("NUM_USAGE".equals(columnName)){
                    value = Bytes.toDouble(valueBytes);
                }else{
                    value = Bytes.toString(valueBytes);
                }

                System.out.println("rowkey: "+rowKey +", column family: "+family +"; column: "+columnName+"; value: "+value);
            }
            System.out.println("---------------------------------------");
        }
        // Release the scanner
        results.close();
    }
```

Author: 潘陈 (pxj)
Date: 2023-05-03
