day01 (HBase)

https://hbase.apache.org/

I. Overview
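1. HBase is an open-source, non-relational database provided by Apache.
2. HBase stores its data on top of Hadoop (HDFS). It is a distributed, scalable database that can hold very large amounts of data.
3. HBase supports real-time reads and writes over large data sets.
4. HBase is a NoSQL (Not Only SQL) database.
5. HBase started out as a subproject of Hadoop, the project founded by Doug Cutting, and is modeled on Google's paper "Bigtable: A Distributed Storage System for Structured Data". Because it follows Bigtable's design, it works on the same principles; the main difference is that Bigtable is implemented in C++ while HBase is implemented in Java.
6. To delete a table in HBase, you must first disable it; only a disabled table can be dropped.
7. HBase has no rich type system: values are stored as raw byte arrays, and in practice they are usually written as strings or integers.
8. Data stored in HBase is sparse.
9. HBase attaches a timestamp to every value that is written. A read that does not specify a timestamp returns the newest value. This means an "update" does not modify the original data: a new value is appended to the end of the file, and because reads return the newest value, it looks like a modification. This timestamp is called the version (VERSION) of the data.
10. Unless told otherwise, each column family keeps only 1 version of the data, and a read returns only one version. To read several versions, specify the number of versions to keep when creating the table.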
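
The timestamp and version behavior described above can be pictured with a small in-memory model. This is only a sketch in plain Java, not HBase code: the class `VersionedCell` and its methods are hypothetical names, and it prunes old versions immediately on write, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one cell that keeps up to maxVersions timestamped values. */
public class VersionedCell {
    // Newest timestamp first, so the default read is simply the first entry.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions;

    public VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** An "update" never overwrites: it adds a value under a newer timestamp. */
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions beyond the retention limit.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    /** Default read: the newest version only, or null if nothing was written. */
    public String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** Read up to n versions, newest first. */
    public List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell age = new VersionedCell(3);
        age.put(1L, "14");
        age.put(2L, "15");
        age.put(3L, "16");
        age.put(4L, "17");
        System.out.println(age.get());   // 17
        System.out.println(age.get(3));  // [17, 16, 15]
    }
}
```

With `maxVersions` set to 1 the model behaves like a column family created with the default setting: only the newest value can be read back.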

Comparison of row-oriented and column-oriented databases
[Two figures comparing row-oriented and column-oriented storage]

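II. Basic concepts
1. Row key (rowkey): comparable to the primary key in an RDBMS. Every piece of data in HBase must belong to a row key. Row keys must be unique when writing: data written under the same row key is treated as the same row. HBase sorts rows by row key, in dictionary (lexicographic) order by default.

2. Column family: the basic unit of data storage in HBase. Every table contains at least 1 column family. In theory the number of column families is unlimited, but in practice it is recommended not to exceed 3. Queries that span column families are generally discouraged.
3. Column: HBase usually does not emphasize columns. A column family can contain zero or more columns, and columns can be added or removed dynamically while the table is in use. The number of columns is not fixed.
4. Namespace: comparable to a database in MySQL. Its main purpose is to keep tables apart. If no namespace is specified, the namespace named default is used.
5. Cell: the single value pinned down by rowkey + column + timestamp, where the timestamp acts as the version.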
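
To make the row key ordering and the sparse, dynamic columns concrete, here is a minimal sketch that models a table as nested sorted maps. It is plain Java rather than HBase code; `SparseTableModel` and its methods are hypothetical names, and versions are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a table: rowkey -> "family:qualifier" -> value, both sorted. */
public class SparseTableModel {
    // TreeMap keeps keys in String order, which equals byte-wise dictionary order for ASCII keys.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Store one value; the column comes into existence for this row only. */
    public void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    /** Row keys in the order a full scan would visit them. */
    public List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    /** Columns actually present in a row; absent columns are simply not stored. */
    public List<String> columnsOf(String rowKey) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        SparseTableModel person = new SparseTableModel();
        person.put("p10", "basic", "name", "Tom");
        person.put("p2", "basic", "name", "Sam");
        person.put("p2", "basic", "gender", "male");
        person.put("p1", "basic", "name", "Amy");
        person.put("p1", "info", "addr", "beijing");
        System.out.println(person.rowKeys());       // [p1, p10, p2]
        System.out.println(person.columnsOf("p2")); // [basic:gender, basic:name]
    }
}
```

Note that p10 sorts before p2 because the comparison is character by character, not numeric. Row keys that should scan in numeric order are therefore often zero-padded to a fixed width.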

Downloading HBase

https://archive.apache.org/dist/hbase/0.98.17/
hbase-0.98.17-hadoop2-bin.tar.gz

Unpack and install

Configuration

[root@hadoop01 conf]# vim hbase-env.sh
export JAVA_HOME=/home/presoftware/jdk1.8.0_181
# Disable the ZooKeeper instance bundled with HBase
export HBASE_MANAGES_ZK=false
[root@hadoop01 conf]# source hbase-env.sh
[root@hadoop01 conf]# vim hbase-site.xml 	
<configuration>
<!-- Storage path for HBase data on HDFS -->
<property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop01:9000/hbase</value>
</property> 
<!-- Run HBase in distributed mode -->
<property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
</property>
<!-- ZooKeeper connection addresses and ports -->
<property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
</configuration>

[root@hadoop01 conf]# vim regionservers 

hadoop01
hadoop02
hadoop03
    

Start ZooKeeper on all three nodes

Copy HBase from the first machine to the other two machines

[root@hadoop01 presoftware]# scp -r  hbase-0.98.17-hadoop2 root@hadoop02:/home/presoftware/

[root@hadoop01 presoftware]# scp -r  hbase-0.98.17-hadoop2 root@hadoop03:/home/presoftware/

Start HBase (on the first machine, which is the ZooKeeper leader)

[root@hadoop01 bin]# sh start-hbase.sh

starting master, logging to /home/presoftware/hbase-0.98.17-hadoop2/bin/../logs/hbase-root-master-hadoop01.out
hadoop02: starting regionserver, logging to /home/presoftware/hbase-0.98.17-hadoop2/bin/../logs/hbase-root-regionserver-hadoop02.out
hadoop03: starting regionserver, logging to /home/presoftware/hbase-0.98.17-hadoop2/bin/../logs/hbase-root-regionserver-hadoop03.out
hadoop01: starting regionserver, logging to /home/presoftware/hbase-0.98.17-hadoop2/bin/../logs/hbase-root-regionserver-hadoop01.out


[root@hadoop01 bin]# jps
4368 QuorumPeerMain
4770 HRegionServer
4643 HMaster

On the other two machines

[root@hadoop02 presoftware]# jps
1577 QuorumPeerMain
1774 Jps

Open the HBase web UI at
192.168.253.129:60010

Result
[Screenshot of the HBase web UI]
Enter the command-line shell

[root@hadoop01 bin]#sh hbase shell

Deleting characters in the HBase shell: Ctrl + Backspace
After changing the terminal settings, Backspace deletes directly
[Screenshot]

Basic commands

hbase(main):011:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):012:0> version
0.98.17-hadoop2, rd5f8300c082a75ce8edbbe08b66f077e7d663a4a, Fri Jan 15 22:46:43 PST 2016

hbase(main):014:0> whoiam
NameError: undefined local variable or method `whoiam' for #<Object:0x2a2ef072>


Create a table

hbase(main):015:0> create 'person',{NAME=>'basic'},{NAME=>'info'}
0 row(s) in 4.1870 seconds

=> Hbase::Table - person

List tables

hbase(main):027:0> list
TABLE                                                                            
person                                                                           
1 row(s) in 0.1170 seconds

=> ["person"]

Describe the table structure

hbase(main):029:0> desc 'person'
Table person is ENABLED                                                          
person                                                                           
COLUMN FAMILIES DESCRIPTION                                                      
{NAME => 'basic', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COM
PRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6', REPLICATION_SCOPE => '0'}                                                    
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEE
P_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMP
RESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536
', REPLICATION_SCOPE => '0'}                                                     
2 row(s) in 0.2910 seconds

Shorthand for creating a table

hbase(main):030:0> create 'student','basic','info'
0 row(s) in 1.5010 seconds

=> Hbase::Table - student

Drop a table (fails: the table must be disabled first)

hbase(main):004:0> drop 'student'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/presoftware/hbase-0.98.17-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/presoftware/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

ERROR: Table student is enabled. Disable it first.

Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'

The correct way to drop a table (disable it first)

hbase(main):005:0> 
hbase(main):005:0> disable 'student'
0 row(s) in 2.4740 seconds

hbase(main):006:0> drop 'student'
0 row(s) in 2.2510 seconds

hbase(main):007:0> 

hbase(main):009:0> exists 'student'
Table student does not exist                                                     
0 row(s) in 0.1620 seconds

hbase(main):011:0> is_enabled 'person'
true                                                                             
0 row(s) in 0.0790 seconds

Create a namespace

hbase(main):012:0> create_namespace 'hbasedemo'
0 row(s) in 0.6230 seconds

hbase(main):013:0> list_namespace
NAMESPACE                                                                        
default                                                                          
hbase                                                                            
hbasedemo                                                                        
3 row(s) in 0.1390 seconds
hbase(main):014:0> list_namespace_tables 'default'
TABLE                                                                            
person                                                                           
1 row(s) in 0.0960 seconds

Create a table in a custom namespace

hbase(main):016:0> create 'hbasedemo:person','basic','info'
0 row(s) in 4.7610 seconds

=> Hbase::Table - hbasedemo:person

Drop a namespace (only possible once the namespace contains no tables)

hbase(main):017:0> disable 'hbasedemo:person'
0 row(s) in 1.5760 seconds

hbase(main):018:0> drop 'hbasedemo:person'
0 row(s) in 0.5080 seconds

hbase(main):019:0> drop_namespace 'hbasedemo'
0 row(s) in 0.4460 seconds

Add data (row key p1, column family basic)

hbase(main):031:0> put 'person','p1','basic:name','Amy'
0 row(s) in 0.8850 seconds

hbase(main):032:0> put 'person','p1','basic:age',15
0 row(s) in 0.0920 seconds

hbase(main):033:0> put 'person','p1','info:addr','beijing'
0 row(s) in 0.2320 seconds

hbase(main):054:0> get 'person', 'p1'
COLUMN                CELL                                                       
 basic:age            timestamp=1658618923507, value=15                          
 basic:name           timestamp=1658618824327, value=Amy                         
 info:addr            timestamp=1658618960873, value=beijing                     
3 row(s) in 0.1570 seconds

hbase(main):055:0> get 'person', 'p1', {COLUMN => 'basic'}
COLUMN                CELL                                                       
 basic:age            timestamp=1658618923507, value=15                          
 basic:name           timestamp=1658618824327, value=Amy                         
2 row(s) in 0.0440 seconds

hbase(main):056:0> get 'person', 'p1', {COLUMN => 'basic:age'}
COLUMN                CELL                                                       
 basic:age            timestamp=1658618923507, value=15                          
1 row(s) in 0.0350 seconds

hbase(main):057:0> get 'person', 'p1', 'basic:age'
COLUMN                CELL                                                       
 basic:age            timestamp=1658618923507, value=15                          
1 row(s) in 0.1030 seconds

hbase(main):058:0> get 'person', 'p1', 'basic:name','basic:age','info:addr'
COLUMN                CELL                                                       
 basic:age            timestamp=1658618923507, value=15                          
 basic:name           timestamp=1658618824327, value=Amy                         
 info:addr            timestamp=1658618960873, value=beijing                     
3 row(s) in 0.0790 seconds

hbase(main):059:0> put 'person','p2','basic:name','Sam'
0 row(s) in 0.4240 seconds

hbase(main):060:0> get 'person','p2','basic:name'
COLUMN                CELL                                                       
 basic:name           timestamp=1658619944413, value=Sam                         
1 row(s) in 0.0470 seconds

hbase(main):061:0> put 'person','p2','basic:gender','male'
0 row(s) in 0.0490 seconds
hbase(main):062:0> put 'person','p2','info:phone',182336698
0 row(s) in 0.0560 seconds
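
HBase stores every value as a byte array, so the writer and the reader have to agree on an encoding. The `get` output above shows the age as `value=15`, which is consistent with the number being stored as its text form; treat that reading as an assumption rather than a documented guarantee. The following sketch, in plain Java with hypothetical names (`ValueEncoding`, `asText`, `asBinary`), contrasts the text encoding with a fixed-width binary one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValueEncoding {
    /** Encode a number as its decimal text, like "15".getBytes(). */
    public static byte[] asText(int n) {
        return String.valueOf(n).getBytes(StandardCharsets.UTF_8);
    }

    /** Encode a number as a fixed 4-byte big-endian integer. */
    public static byte[] asBinary(int n) {
        return ByteBuffer.allocate(4).putInt(n).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asText(15)));   // [49, 53]
        System.out.println(Arrays.toString(asBinary(15))); // [0, 0, 0, 15]
        // Reading a value back requires the decoder that matches the encoder.
        System.out.println(new String(asText(15), StandardCharsets.UTF_8)); // 15
        System.out.println(ByteBuffer.wrap(asBinary(15)).getInt());         // 15
    }
}
```

Decoding binary bytes as text, or the reverse, yields garbage, so it helps to pick one encoding per column and stick to it.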

Delete commands (delete one column in a column family, then the whole column family)

hbase(main):063:0> delete 'person','p2','info:phone'
0 row(s) in 0.2670 seconds

hbase(main):064:0> delete 'person','p2','info'
0 row(s) in 0.0360 seconds

Delete the entire row with row key p2

hbase(main):068:0> deleteall 'person','p2'
0 row(s) in 0.2760 seconds

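Modify the value of a column in a column family (a put to an existing cell writes a new version; the command below uses the new row key pa, so it adds a row)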

hbase(main):069:0> put 'person','pa','basic:name','Tom'
0 row(s) in 0.2550 seconds

Scan the whole person table

hbase(main):071:0> scan 'person'
ROW                   COLUMN+CELL                                                
 p1                   column=basic:age, timestamp=1658618923507, value=15        
 p1                   column=basic:name, timestamp=1658618824327, value=Amy      
 p1                   column=info:addr, timestamp=1658618960873, value=beijing   
 pa                   column=basic:name, timestamp=1658620504909, value=Tom      
2 row(s) in 0.3560 seconds

hbase(main):075:0> scan 'person',{COLUMNS=>['basic']}
ROW                   COLUMN+CELL                                                
 p1                   column=basic:age, timestamp=1658618923507, value=15        
 p1                   column=basic:name, timestamp=1658618824327, value=Amy      
 pa                   column=basic:name, timestamp=1658620504909, value=Tom      
2 row(s) in 0.2150 seconds

hbase(main):076:0> scan 'person',{COLUMNS=>['basic:name']}
ROW                   COLUMN+CELL                                                
 p1                   column=basic:name, timestamp=1658618824327, value=Amy      
 pa                   column=basic:name, timestamp=1658620504909, value=Tom      
2 row(s) in 0.1090 seconds

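View the most recent versions (only one is returned, because the column family was created with VERSIONS => '1')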

hbase(main):079:0> get 'person', 'p1', {COLUMN => 'basic:name' , VERSIONS => 3}
COLUMN                CELL                                                       
 basic:name           timestamp=1658618824327, value=Amy                         
1 row(s) in 0.1860 seconds

Drop the person table

hbase(main):002:0> disable 'person'
0 row(s) in 3.9030 seconds

hbase(main):003:0> drop 'person'
0 row(s) in 2.6050 seconds

Create a table with version settings

hbase(main):004:0> create 'persion',{NAME=>'basic',VERSIONS=>3},{NAME=>'info',VERSIONS=>4}
0 row(s) in 1.6960 seconds

hbase(main):005:0> desc 'persion'
Table persion is ENABLED                                                         
persion                                                                          
COLUMN FAMILIES DESCRIPTION                                                      
{NAME => 'basic', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KE
EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COM
PRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6', REPLICATION_SCOPE => '0'}                                                    
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '4', IN_MEMORY => 'false', KEE
P_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMP
RESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536
', REPLICATION_SCOPE => '0'}                                                     
2 row(s) in 0.2860 seconds

hbase(main):006:0> put 'persion','p1','basic:age',14
0 row(s) in 0.5000 seconds

hbase(main):007:0> put 'persion','p1','basic:age',15
0 row(s) in 0.0100 seconds

hbase(main):008:0> put 'persion','p1','basic:age',16
0 row(s) in 0.0490 seconds

hbase(main):009:0> put 'persion','p1','basic:age',17

hbase(main):011:0> get 'persion','p1',{COLUMN=>'basic:age',VERSIONS=>3}
COLUMN                CELL                                                       
 basic:age            timestamp=1658654249093, value=17                          
 basic:age            timestamp=1658654245954, value=16                          
 basic:age            timestamp=1658654243277, value=15                          
3 row(s) in 0.0230 seconds

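Working with HBase from Java
First, configure the local hosts file on Windows (otherwise the logs report errors because host names such as hadoop01 cannot be resolved).

# Host name mappings for the ZooKeeper nodes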
192.168.253.129 hadoop01
192.168.253.130 hadoop02
192.168.253.131 hadoop03
package cn.hbase;


import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.junit.Before;
import org.junit.Test;

public class HBaseDemo {
	
	Configuration conf;
	@Before
	public void connect(){
		// Prepare the connection to HBase
		// Load the HBase configuration
		conf=HBaseConfiguration.create();
		// Set the ZooKeeper connection addresses
		conf.set("hbase.zookeeper.quorum","192.168.253.129:2181,"
				+ "192.168.253.130:2181,192.168.253.131:2181");
	}
	// Create a table
	@Test
	public void createTable() throws Exception{
		// Connect to HBase and obtain the admin handle
		HBaseAdmin admin=new HBaseAdmin(conf);
		// Build the table descriptor
		HTableDescriptor desc=new HTableDescriptor(TableName.valueOf("student"));
		// Build the column family descriptors
		HColumnDescriptor f1=new HColumnDescriptor("basic");
		HColumnDescriptor f2=new HColumnDescriptor("info");
		
		// Add the column families to the table
		desc.addFamily(f1);
		desc.addFamily(f2);
		// Create the table
		admin.createTable(desc);
		// Release the admin connection
		admin.close();
	}
	
	
	// Add data
	@Test
	public void pubData() throws Exception{
		
		// Open the table
		HTable table=new HTable(conf, TableName.valueOf("student"));
		// Create a Put object; the argument is the row key
		Put put=new Put("s1".getBytes());
		// Arguments: family (column family), qualifier (column name), value
		put.add("basic".getBytes(), "id".getBytes(), "s202202".getBytes());
		put.add("basic".getBytes(), "name".getBytes(), "Bob".getBytes());
		// Write the data
		table.put(put);
		// Release the table
		table.close();
	}
	
	// Time the insertion of one million rows into HBase (took about 25 s)
	@Test
	public void putMillionData() throws IOException{
		long begin=System.currentTimeMillis();
		HTable table=new HTable(conf, TableName.valueOf("student"));
		List<Put> puts=new ArrayList<>();
		for (int i = 0; i < 1000000; i++) {
			Put put=new Put(("s"+i).getBytes());
			put.add("basic".getBytes(), "id".getBytes(), ("no"+i).getBytes());
			puts.add(put);
			// Send the rows in batches of 1000
			if(puts.size()>=1000){
				table.put(puts);
				puts.clear();
			}
			
		}
		// Send any rows left over when the total is not a multiple of 1000
		if(!puts.isEmpty()){
			table.put(puts);
		}
		long end=System.currentTimeMillis();
		table.close();
		// Elapsed time in milliseconds
		System.err.println(end-begin);
	}
	
	// Delete data
	@Test
	public void deleteData() throws IOException{
		
		// Open the table
		HTable table=new HTable(conf, TableName.valueOf("student"));
		// Create a Delete object for row key s1 (deletes the whole row)
		Delete del=new Delete("s1".getBytes());
//		del.deleteColumn(family, qualifier, timestamp)
		// Perform the delete
		table.delete(del);
		// Release the table
		table.close();
	}
	
	// Query: fetch a single row
	@Test
	public void getData() throws IOException{
		
		HTable table=new HTable(conf, TableName.valueOf("student"));
		// Build a Get for row key s2
		Get get=new Get("s2".getBytes());
//		get.addColumn(family, qualifier)
//		get.addFamily(family)
		// Fetch the row
		Result result = table.get(get);
		byte[] data = result.getValue("basic".getBytes(), "id".getBytes());
		// getValue returns null when the row or column does not exist
		if(data!=null){
			System.err.println(new String(data));
		}
		table.close();
	}
	
	// Query many rows: scan
	@Test
	public void scanData() throws IOException{
		
		HTable table=new HTable(conf, TableName.valueOf("student"));
		// No arguments: scan the whole table
//		Scan scan=new Scan();
		// From the given row key to the end of the table
//		Scan scan=new Scan(startRow);
		// Row key range [s999990,s999999): start row inclusive, stop row exclusive
		Scan scan=new Scan("s999990".getBytes(), "s999999".getBytes());
		// Obtain the scanner
		ResultScanner rs = table.getScanner(scan);
		// Walk the ResultScanner through an iterator
		Iterator<Result> it=rs.iterator();
		while(it.hasNext()){
			Result result = it.next();
			byte[] data = result.getValue("basic".getBytes(), "id".getBytes());
			System.err.println(new String(data));
		}
		// Release the scanner and the table
		rs.close();
		table.close();
	}
	
	// Drop a table
	@Test
	public void dropTable() throws Exception{
		// Obtain the admin handle
		HBaseAdmin admin=new HBaseAdmin(conf);
		// Disable the table
		admin.disableTable(TableName.valueOf("student"));
		// Drop the table
		admin.deleteTable(TableName.valueOf("student"));
		// Release the admin connection
		admin.close();
	}
	
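	// Keep only rows whose row key contains 222, such as 132223
	@Test
	public void regex() throws IOException{
		HTable table=new HTable(conf, TableName.valueOf("student"));
		// Scan the whole table and let the filter pick the rows
		Scan scan=new Scan();
		// RowFilter arguments: rowCompareOp (comparison mode), rowComparator (matching rule)
		Filter filter=new RowFilter(CompareOp.EQUAL, new RegexStringComparator(".*222.*"));
		// Attach the filter to the scan
		scan.setFilter(filter);
		ResultScanner rs = table.getScanner(scan);
		Iterator<Result> it = rs.iterator();
		while(it.hasNext()){
			Result result = it.next();
			// Print the row key of each matching row
			byte[] data = result.getRow();
			System.err.println(new String(data));
		}
		// Release the scanner and the table
		rs.close();
		table.close();
	}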
}
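
The range scan and the regex row filter in the class above select rows purely by row key, so their behavior can be tried out without a cluster. The sketch below is a client-side model in plain Java, not HBase code: `ScanSelectionModel`, `range` and `matching` are hypothetical names, and a sorted set of strings stands in for the table. It assumes a scan range includes the start row and excludes the stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class ScanSelectionModel {
    /** Row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    public static List<String> range(TreeSet<String> rowKeys, String startRow, String stopRow) {
        return new ArrayList<>(rowKeys.subSet(startRow, true, stopRow, false));
    }

    /** Row keys that match the regular expression as a whole. */
    public static List<String> matching(TreeSet<String> rowKeys, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String key : rowKeys) {
            if (p.matcher(key).matches()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (String k : new String[] {"s132223", "s999989", "s999990", "s999995", "s999999"}) {
            keys.add(k);
        }
        System.out.println(range(keys, "s999990", "s999999")); // [s999990, s999995]
        System.out.println(matching(keys, ".*222.*"));          // [s132223]
    }
}
```

Because the stop row is excluded, s999999 is missing from the range result. One way to include it is to pass a stop row that sorts just after it.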
