http://hbase.apache.org/
Getting to know HBase
Positioning:
- Hadoop database
- a distributed, scalable, big data store: distributed, horizontally scalable storage
- random realtime read/write: if you need random or real-time reads and writes, HBase is the better fit
- very large tables, billions of rows X millions of columns: the target is big, wide tables with billions of rows and millions of columns, although in production we usually only end up using a few hundred columns
So HBase is a database, but it has no SQL; the HBase Shell is rarely used because it is inconvenient, so we go through Phoenix instead, and the underlying storage lands on HDFS.
That raises a deployment question: Hadoop has a NameNode and DataNodes, so is HBase deployed on the same node as the NameNode, and do we deploy as many HBase nodes as there are DataNodes? Since HBase ultimately persists its data to HDFS, we deploy one HBase (RegionServer) node per DataNode.
Features
- Linear and modular scalability: scales out linearly
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables: tables are split into regions and distributed automatically
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access. In production we generally do not use the Java API directly, because we handle things through Phoenix; from there Spark / Spark SQL is enough.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters: like predicate pushdown in Hive, the filter is evaluated on the server side during the scan (a sketch follows this list)
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible jruby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
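As a rough sketch of how predicate push down looks with the 1.x Java client: a Scan with a SingleColumnValueFilter, so the comparison runs inside the RegionServer and non-matching rows never cross the network. The table and column names (t4, cf1:q1, val1) are just placeholders borrowed from the demo further below.
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterScanDemo {
    // Scan t4 and keep only rows whose cf1:q1 equals "val1"; the filter is
    // evaluated server side (predicate push down), the client only sees matches.
    public static void scanWithFilter(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("t4"))) {
            Scan scan = new Scan();
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf1"), Bytes.toBytes("q1"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("val1")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}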
Comparison with relational databases
- 1. Dynamic columns
- 2. Form of the stored data (unstructured)
- 3. Transactions (not a big concern for big data); operations are single-table, and once Spark has loaded the wide table you can still do aggregations and joins there
- 4. Distributed: scalable
- 5. Supports very many columns
- 6. Key-value storage
- 7. Low coupling between tables
- 8. Data redundancy exists (HBase keeps multiple versions of a cell; to keep three versions, just set VERSIONS to 3; see the sketch after this list)
- 9. High query performance on large data volumes (only the columns you need are loaded)
- 10. Storage is also more efficient than in a relational database: nulls are simply not stored, thanks to HBase's dynamic columns
- 11. No built-in secondary indexes (access paths: rowkey get, full-table scan, range scan); you can add them yourself later, typically with Phoenix
- 12. Triggers are not supported
- 13. Fields support only one format (byte arrays)
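A minimal sketch of point 8 (cell versions), assuming the same HBase 1.x Java client and reusing the admin / connection objects from HBaseApiDemo further below; the table name t_ver is made up for illustration.
// Column family that keeps up to three versions per cell (VERSIONS => '3').
HColumnDescriptor cf = new HColumnDescriptor("cf1");
cf.setMaxVersions(3);
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t_ver"));
desc.addFamily(cf);
admin.createTable(desc);

try (Table table = connection.getTable(TableName.valueOf("t_ver"))) {
    for (int i = 1; i <= 3; i++) {            // write the same cell three times
        Put put = new Put(Bytes.toBytes("rw1"));
        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("v" + i));
        table.put(put);
    }
    Get get = new Get(Bytes.toBytes("rw1"));
    get.setMaxVersions(3);                    // ask for all kept versions, newest first
    for (Cell cell : table.get(get).getColumnCells(Bytes.toBytes("cf1"), Bytes.toBytes("q1"))) {
        System.out.println(cell.getTimestamp() + " => " + Bytes.toString(CellUtil.cloneValue(cell)));
    }
}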
HBase Shell
help
create
scan/get (e.g. hbase> get 'ns1:t1', 'r1') ...
put 'ns1:t123123', 'r1', 'f1:c1', '123'
// table name (namespace:table), rowkey, column (column family:column), value
describe 'ns1:t123123'
// a table is either enabled or disabled; while it is enabled it cannot be dropped
TTL controls how long data lives, so tables can also hold temporary data; in production we compress with snappy (see the sketch after these shell commands).
You must first disable 'ns1:t123123' before you can drop 'ns1:t123123'.
scan 'ns1:t123123' (scan the table)
// the table must be enabled before you can read it
get 't1', 'r1' (get one row of a table)
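The same ideas via the Java API, as a rough sketch: it reuses the admin object from HBaseApiDemo further below, assumes the ns1 namespace already exists and that the snappy native codec is available on the RegionServers, and Compression comes from org.apache.hadoop.hbase.io.compress.
HColumnDescriptor f1 = new HColumnDescriptor("f1");
f1.setTimeToLive(7 * 24 * 3600);                       // TTL: cells expire after 7 days
f1.setCompressionType(Compression.Algorithm.SNAPPY);   // COMPRESSION => 'SNAPPY'
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("ns1:t123123"));
desc.addFamily(f1);
admin.createTable(desc);

// A table must be disabled before it can be dropped.
admin.disableTable(TableName.valueOf("ns1:t123123"));
admin.deleteTable(TableName.valueOf("ns1:t123123"));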
HBase Conf
1. Files present after installation:
- regionservers
- hbase-site.xml…
2. Configuration that goes into those files after installation
Static vs. dynamic configuration (changes to static configuration only take effect after a restart)
**** Parameters we have tuned in production ****
hbase.client.operation.timeout
phoenix.query.timeoutMs
phoenix.query.keepAliveMs
hbase.client.ipc.pool.size
dfs.socket.timeout
dfs.client.socket-timeout
hbase.master.namespace.init.timeout
hbase.regionserver.executor.openregion.threads
index.builder.threads.max..
zk retry / timeout, ulimit, flush, grant, compact, split ...
WAL files (write-ahead log)
Dynamic configuration: update_config / update_all_config (reload supported settings on one RegionServer / on all servers without a restart)
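Static settings normally live in hbase-site.xml, but client-side ones can also be overridden in code. A minimal sketch (the values are placeholders, not tuning advice):
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "hadoop00");
conf.setInt("hbase.client.operation.timeout", 60000);   // ms budget for one client operation
conf.setInt("hbase.rpc.timeout", 60000);                 // ms budget for a single RPC
Connection conn = ConnectionFactory.createConnection(conf);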
HBase basic terminology
HBase retrieves data by Row Key, which plays roughly the role of the primary key in MySQL.
Multiple timestamps on a cell mean multiple versions of it exist.
contents:html = “…?” here contents:html is column family:column name
Table => N * Row(RK), where N is determined by the rowkeys
Row => rowkey (RK) [the unique identifier of a record]
column family (cf): the columns of a row are grouped by cf
table => N * cf, cf => n * col
A table can have many column families, and a column family can in turn contain many columns (rows do not all have to have the same columns: dynamic columns), so different rows can have different sets of cols.
A namespace can be understood as a DB (database) in a relational system.
How a cell is located: rowkey, cf:col, version
TTL: time to live, how long the data is kept; the default is FOREVER
e.g. row => rw1, column=cf1:q1, timestamp=1550351947916, value=val1
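A minimal sketch of addressing exactly that cell from Java, reusing the connection from HBaseApiDemo below; the table name t4 is an assumption, and the timestamp is the one from the example line above.
try (Table table = connection.getTable(TableName.valueOf("t4"))) {
    Get get = new Get(Bytes.toBytes("rw1"));                   // rowkey
    get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"));  // cf:col
    get.setTimeStamp(1550351947916L);                          // one specific version
    Result result = table.get(get);
    System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("q1"))));
}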
Column family: in production keep it to no more than three; they can be used to separate business concerns.
cf => basic attributes:
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1',
IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
master: assigns regions to RegionServers and handles DDL (create/drop table)
regionserver: serves reads and writes for the regions it hosts
region: a contiguous range of rows of a table, the unit of distribution
wal: write-ahead log; writes are logged here first so they can be replayed on failure
memstore: the in-memory write buffer (one per column family per region)
lrucache: the LRU BlockCache that serves reads
compact: merges small HFiles into fewer, larger ones
split: a region that grows too large is split into two
flush: the memstore is written out to an HFile on HDFS
HBase setup + Demo (Java API / REST API)
Watch out for version conflicts: hadoop-2.6.0-cdh5.12.0 || hbase-1.2.0-cdh5.12.0 are compatible (with Apache releases, check compatibility yourself)
http://hbase.apache.org/book.html#_rest REST API
Sample output of the Java API demo below:
RowName: rw1
Column Family: cf1
Column Name: q1
Cell Value: val1
Timestamp: 1550351947916
HBaseApiDemo
package com.jh;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class HBaseApiDemo {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args) throws IOException {
createTable("t4", new String[]{"cf1", "cf2"});
// insertRow("t4", "rw1", "cf1", "q1", "val1");
getData("t4", "rw1", "cf1", "q1");
// scanData("t3", "", "");
// deleteRow("t3", "rw1", "cf1", "q1");
// deleteTable("t3");
}
// Initialize the connection
public static void init() {
configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "hadoop00");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.rootdir", "hdfs://hadoop00:8020/hbase");
try {
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
} catch (IOException e) {
e.printStackTrace();
}
}
// Create a table
public static void createTable(String tableName, String[] cols) throws IOException {
init();
TableName tn = TableName.valueOf(tableName);
if (admin.tableExists(tn)) {
System.out.println("table already exists!");
} else {
HTableDescriptor hTableDescriptor = new HTableDescriptor(tn);
for (String col : cols) {
HColumnDescriptor hColumnDescriptor = new HColumnDescriptor(col);
hTableDescriptor.addFamily(hColumnDescriptor);
}
admin.createTable(hTableDescriptor);
}
close();
}
// Delete a table
public static void deleteTable(String tableName) throws IOException {
init();
TableName tn = TableName.valueOf(tableName);
if (admin.tableExists(tn)) {
admin.disableTable(tn);
admin.deleteTable(tn);
}
close();
}
// List existing tables
public static void listTables() throws IOException {
init();
HTableDescriptor hTableDescriptors[] = admin.listTables();
for (HTableDescriptor hTableDescriptor : hTableDescriptors) {
System.out.println(hTableDescriptor.getNameAsString());
}
close();
}
// Insert data
public static void insertRow(String tableName, String rowkey, String colFamily, String col, String val) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(Bytes.toBytes(rowkey));
put.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col), Bytes.toBytes(val));
table.put(put);
// Batch insert
/*
List<Put> putList = new ArrayList<Put>();
putList.add(put);
table.put(putList);*/
table.close();
close();
}
// Delete data
public static void deleteRow(String tableName, String rowkey, String colFamily, String col) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Delete delete = new Delete(Bytes.toBytes(rowkey));
// Delete a whole column family
// delete.addFamily(Bytes.toBytes(colFamily));
// Delete a specific column
// delete.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
table.delete(delete);
// Batch delete
/* List<Delete> deleteList = new ArrayList<Delete>();
deleteList.add(delete);
table.delete(deleteList);*/
table.close();
close();
}
// Get data by rowkey
public static void getData(String tableName, String rowkey, String colFamily, String col) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(Bytes.toBytes(rowkey));
// Get data for a specific column family
// get.addFamily(Bytes.toBytes(colFamily));
// Get data for a specific column
// get.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
Result result = table.get(get);
showCell(result);
table.close();
close();
}
// Pretty-print a Result
public static void showCell(Result result) {
Cell[] cells = result.rawCells();
for (Cell cell : cells) {
System.out.println("RowName: " + new String(CellUtil.cloneRow(cell)) + " ");
System.out.println("Column Family: " + new String(CellUtil.cloneFamily(cell)) + " ");
System.out.println("Column Name: " + new String(CellUtil.cloneQualifier(cell)) + " ");
System.out.println("Cell Value: " + new String(CellUtil.cloneValue(cell)) + " ");
System.out.println("Timestamp: " + cell.getTimestamp() + " ");
}
}
// Scan a range of rows
public static void scanData(String tableName, String startRow, String stopRow) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
// scan.setStartRow(Bytes.toBytes(startRow));
// scan.setStopRow(Bytes.toBytes(stopRow));
ResultScanner resultScanner = table.getScanner(scan);
for (Result result : resultScanner) {
showCell(result);
}
table.close();
close();
}
// Close the connection
public static void close() {
try {
if (null != admin) {
admin.close();
}
if (null != connection) {
connection.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
HBaseRestApiDemo
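This demo assumes the HBase REST gateway is already running on hadoop00, port 9999 (the stock default is 8080); it is typically started with something like hbase rest start -p 9999, or via hbase-daemon.sh start rest. A GET on / returns the table list and a GET on /<table>/schema returns the table schema, encoded according to the Accept header (text/xml or application/json).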
package com.jh;
import com.cloudera.org.apache.http.HttpEntity;
import com.cloudera.org.apache.http.HttpResponse;
import com.cloudera.org.apache.http.client.methods.HttpGet;
import com.cloudera.org.apache.http.impl.client.DefaultHttpClient;
import com.cloudera.org.apache.http.util.EntityUtils;
public class HBaseRestApiDemo {
public static void main(String[] args) {
// text/xml || application/json
String headerInfo = "application/json";
String result = "";
result = getTableList(headerInfo);
// result = getSchemaInfo("t2", headerInfo);
System.out.println("=== " + result);
}
// List all tables
public static String getTableList(String acceptInfo) {
String uriAPI = "http://hadoop00:9999/";
String result = "";
HttpGet requst = new HttpGet(uriAPI);
try {
requst.setHeader("Content-Type", "text/xml");
requst.setHeader("accept", acceptInfo);
HttpResponse response = new DefaultHttpClient().execute(requst); // execute the request
// HttpGet is a subclass of HttpUriRequest
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode == 200 || statusCode == 403) {
HttpEntity httpEntity = response.getEntity();
result = EntityUtils.toString(httpEntity); // read the response body as a string
// Usually you also want to strip stray characters, e.g. remove "\r" from the
// result, otherwise a small box shows up at the end of the string:
// result = result.replaceAll("\r", "");
} else {
requst.abort();
result = "异常的返回码:" + statusCode;
}
} catch (Exception e) {
e.printStackTrace();
result = e.getMessage();
}
return result;
}
// Get a table's schema
public static String getSchemaInfo(String tableName, String acceptInfo) {
String uriAPI = "http://hadoop00:9999/" + tableName + "/schema";
String result = "";
HttpGet requst = new HttpGet(uriAPI);
try {
requst.setHeader("Content-Type", "text/xml");
requst.setHeader("accept", acceptInfo);
HttpResponse response = new DefaultHttpClient().execute(requst);
// HttpGet is a subclass of HttpUriRequest
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode == 200 || statusCode == 403) {
HttpEntity httpEntity = response.getEntity();
result = EntityUtils.toString(httpEntity); // read the response body as a string
} else {
requst.abort();
result = "异常的返回码:" + statusCode;
}
} catch (Exception e) {
e.printStackTrace();
result = e.getMessage();
}
return result;
}
}