HBase API Basics

坚持到底cw

Article link: https://blog.csdn.net/chenwei825825/article/details/17034019

1. CRUD operations:

HTable class

Put, Get, and Delete classes

2. Atomic operations: compare-and-put, compare-and-delete

3. KeyValue class: KeyValue[] kv = res1.raw();

4. Result class: Result res1 = table.get(get);

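5. Write buffer: every put is actually an RPC that ships data from the client to the server and then returns. That is acceptable for small volumes, but not for a program that needs to store thousands of rows per second in an HBase table. (The key to reducing the cost of individual RPC calls is limiting round trips, where a round trip is the time it takes the client to send a request and the server to respond over the network.) The HBase API provides a write buffer that collects put operations and then sends them to the server in a single RPC.

The client-side write buffer is disabled by default; enable it by setting auto-flush to false, and flush explicitly when needed:

table.setAutoFlush(false); table.flushCommits();

Once the buffered data exceeds the configured size, the client flushes implicitly. The buffer size can be set with setWriteBufferSize(long size); the default is 2 MB (2097152 bytes).

It can also be set in hbase-site.xml (the value shown here is 20 MB): <name>hbase.client.write.buffer</name> <value>20971520</value>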
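To make the effect of the write buffer concrete, here is a minimal sketch in plain Java. It is a toy model, not the HBase implementation: the class `WriteBufferSketch`, the 1 KB payload, and the way sizes are counted (raw payload bytes) are all assumptions made for illustration. It only shows how buffering turns many puts into a few round trips, with an implicit flush once the buffer limit is exceeded and an explicit flush at the end.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer; hypothetical, not HBase code. */
final class WriteBufferSketch {
    private final long bufferSizeBytes;          // plays the role of the write buffer size
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int rpcCount = 0;                    // number of simulated round trips

    WriteBufferSketch(long bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
    }

    /** Buffers one mutation; flushes implicitly once the limit is exceeded. */
    void put(byte[] payload) {
        pending.add(payload);
        pendingBytes += payload.length;
        if (pendingBytes > bufferSizeBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as one simulated RPC. */
    void flush() {
        if (pending.isEmpty()) {
            return;                              // nothing to send, no round trip
        }
        rpcCount++;
        pending.clear();
        pendingBytes = 0;
    }

    int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(2L * 1024 * 1024); // 2 MB
        byte[] row = new byte[1024];             // assumed 1 KB per mutation
        for (int i = 0; i < 5000; i++) {
            buffer.put(row);
        }
        buffer.flush();                          // explicit final flush
        System.out.println("round trips: " + buffer.rpcCount());
    }
}
```

Under these assumptions, 5000 puts of 1 KB each need 3 round trips with a 2 MB buffer, compared with 5000 when every put is sent on its own.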

6. Batch operations:

List<Row> batch = new ArrayList<Row>();

Put put = new Put(ROW2);
put.add(COLFAM2, QUAL1, Bytes.toBytes("value1"));
batch.add(put);

Get get1 = new Get(ROW1);
get1.addColumn(COLFAM1, QUAL2);
batch.add(get1);

Delete delete = new Delete(ROW1);
delete.deleteColumns(COLFAM1, QUAL2);
batch.add(delete);

Object[] result = new Object[batch.size()];
table.batch(batch, result);

When batch() is used, Put instances are not held in the client-side write buffer. The batch() call is synchronous and sends the operations straight to the server.

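7. Querying multiple versions (HBase shell): scan 'test', {VERSIONS => 3}

8. Row locks: avoid them where possible.

9. Scans: a Scan optionally takes startRow and stopRow parameters and an optional filter. The data to read can be narrowed further with methods such as addFamily() and addColumn().

A scan does not return all matching rows in a single RPC. Rows are returned one at a time, because with a large number of rows, sending everything in one request would consume a lot of system resources and take a long time.

ResultScanner turns the scan into a series of get-like operations. Some of its methods: Result next(), Result[] next(int nRows), void close(). Release scanner instances as early as possible: an open scanner holds a fair amount of resources and heap space, so call close() as soon as you have finished with the ResultScanner.

Like row locks, scanners have a lease timeout that keeps them from being blocked too long by failed clients. The setting, in milliseconds: <name>hbase.regionserver.lease.period</name> <value>120000</value>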
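As an illustration of the startRow and stopRow parameters mentioned for Scan, the sketch below checks whether a row key falls inside a scan range. It relies on two facts about HBase scans: row keys are compared as unsigned bytes in lexicographic order, and the start row is inclusive while the stop row is exclusive. The class `RowRangeSketch` and its helper methods are hypothetical names written in plain Java for this example; they are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of unsigned lexicographic row-key ordering and a [startRow, stopRow) check. */
final class RowRangeSketch {

    /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;              // shorter key sorts first on a shared prefix
    }

    /** True when row lies in [startRow, stopRow): start inclusive, stop exclusive. */
    static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
        return compareRows(row, startRow) >= 0 && compareRows(row, stopRow) < 0;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = bytes("3100");
        byte[] stop = bytes("3108");
        for (String key : new String[] {"3100", "3102", "31020", "3108", "32"}) {
            System.out.println(key + " -> " + inRange(bytes(key), start, stop));
        }
    }
}
```

With a range of "3100" to "3108", the keys "3100", "3102", and "31020" are inside the range, while "3108" (the exclusive stop row) and "32" are not. Note that "31020" qualifies because the comparison is byte-wise, not numeric.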

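10. Scanner caching (table level and scan level) plus batching

By default, each next() call generates a separate RPC for every row. This is true even for next(int nRows), which simply calls next() in a loop. Scanner caching lets a single RPC fetch multiple rows.

Table-level scanner caching can be set with the HTable method setScannerCaching(int scannerCaching).

The cluster-wide default of 1 can also be changed: <name>hbase.client.scanner.caching</name> <value>10</value>

The scan-level setting has the highest priority and is set with the Scan method setCaching(int caching).

The above describes how a client uses scanner caching to pull data from remote region servers in bulk. If individual rows are very large, however, the cached rows may exceed the memory available to the client process. The remedy is batching.

void setBatch(int batch): caching operates at the row level, while batching operates at the column level. Batching lets you choose how many columns each next() call on a ResultScanner returns. Combining scanner caching and batching gives convenient control over the number of RPC calls needed to scan a range of row keys.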
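The interplay of caching and batching can be estimated with some simple arithmetic. The sketch below is a back-of-the-envelope model in plain Java, not HBase code. It assumes that batching splits each row into Result objects of at most `batch` columns and that one RPC returns up to `caching` Result objects. The class `ScanRpcEstimate` and its methods are hypothetical, and the extra calls needed to open, finish, and close a scanner are deliberately ignored.

```java
/** Rough model of how caching and batch affect a scan; hypothetical, not HBase code. */
final class ScanRpcEstimate {

    /** Number of Result objects: each row is split into chunks of at most batch columns. */
    static long results(long rows, long columnsPerRow, long batch) {
        // A non-positive batch is treated here as "no batching": one Result per row.
        long chunksPerRow = (batch <= 0) ? 1 : (columnsPerRow + batch - 1) / batch;
        return rows * chunksPerRow;
    }

    /** Data-carrying round trips, assuming each RPC returns up to caching Results. */
    static long fetchRpcs(long rows, long columnsPerRow, long batch, long caching) {
        long r = results(rows, columnsPerRow, batch);
        return (r + caching - 1) / caching;      // ceiling division
    }

    public static void main(String[] args) {
        long rows = 10;
        long cols = 20;                          // assumed table shape for the example
        long[][] settings = { {1, 1}, {2, 10}, {5, 100}, {200, 1} }; // {caching, batch}
        for (long[] s : settings) {
            System.out.println("caching=" + s[0] + " batch=" + s[1]
                    + " results=" + results(rows, cols, s[1])
                    + " fetchRpcs=" + fetchRpcs(rows, cols, s[1], s[0]));
        }
    }
}
```

For 10 rows of 20 columns, caching 1 with batch 1 yields 200 Results and 200 fetches under this model, while caching 2 with batch 10 yields 20 Results and 10 fetches. Real scans add a small number of extra RPCs on top of these figures.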

package net.hadoop.hbase.study1;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Demo001 {
    public static void main(String[] args) throws IOException, InterruptedException {
        Hbase hbase = new Hbase("testtable3");
        /*hbase.setput("3100");
        hbase.put("info", "name", "cw");
        hbase.put("info", "age", "22");
        hbase.put("info", "add", "hubei");
        hbase.run();
        hbase.setput("3101");
        hbase.put("info", "name", "ad");
        hbase.put("info", "age", "23");
        hbase.put("info", "add", "bj");
        hbase.run();
        hbase.setput("3102");
        hbase.put("info", "name", "llj");
        hbase.put("info", "age", "21");
        hbase.put("info", "add", "nj");
        hbase.run();
        hbase.cleanput();*/
        /*hbase.setget("3100");
        hbase.get("info","age");*/
        //hbase.put("info" , "name" ,"dd");
        hbase.scan();
        hbase.cleanup(); // release the table once the demo is done
    }

    /*
     * Batch operations: one Put, one Get, and one Delete sent in a single batch() call
     */
    public static void pichuli() throws IOException, InterruptedException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable3");
        List<Row> batch = new ArrayList<Row>();

        Put put = new Put(Bytes.toBytes("3108"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
        batch.add(put);

        Get get1 = new Get(Bytes.toBytes("3108"));
        get1.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
        batch.add(get1);

        Delete delete = new Delete(Bytes.toBytes("3100"));
        delete.deleteColumns(Bytes.toBytes("info"), Bytes.toBytes("name"));
        batch.add(delete);

        // result[i] receives the outcome of batch.get(i)
        Object[] result = new Object[batch.size()];
        table.batch(batch, result);

        for (int i = 0; i < result.length; i++) {
            System.out.println("Result[" + i + "]" + result[i]);
        }
        table.close();
    }
}

class Hbase {
    private Configuration conf;
    private HTable table;
    private Put put;
    private Get get;
    private Delete delete;

    public Hbase(String tablename) throws IOException {
        this.conf = HBaseConfiguration.create();
        this.table = new HTable(conf, tablename);
        table.setAutoFlush(false); // enable the client-side write buffer
    }

    public void setput(String rowkey) {
        this.put = new Put(Bytes.toBytes(rowkey));
    }

    public void put(String colfam, String col, String value) throws IOException {
        put.add(Bytes.toBytes(colfam), Bytes.toBytes(col), Bytes.toBytes(value));

        // Atomic compare-and-put
        /*Put put1 = new Put(Bytes.toBytes("3100"));
        put1.add(Bytes.toBytes(colfam), Bytes.toBytes(col), Bytes.toBytes(value));
        boolean bool = table.checkAndPut(Bytes.toBytes("3100"), Bytes.toBytes(colfam),
                Bytes.toBytes(col), null, put1); // null value: put only if the column does not exist yet
        System.out.println("Put applied: " + bool);*/
    }

    public void run() throws IOException {
        table.put(put); // buffered on the client until a flush
    }

    public void cleanput() throws IOException {
        table.flushCommits(); // send all buffered puts to the server
    }

    public void setget(String rowkey) {
        this.get = new Get(Bytes.toBytes(rowkey));
    }

    public void get(String colfam, String col) throws IOException {
        // First read the whole row, then narrow the Get to a single column.
        Result res1 = table.get(get);
        System.out.println("keyvalues {row key:column family:column}: " + res1);

        get.addColumn(Bytes.toBytes(colfam), Bytes.toBytes(col));
        Result res2 = table.get(get);
        byte[] val1 = res2.getValue(Bytes.toBytes(colfam), Bytes.toBytes(col));
        System.out.println("Value for column family " + colfam + ", column " + col + ": " + Bytes.toString(val1));

        byte[] val2 = res1.value();
        System.out.println("Value of the first column: " + Bytes.toString(val2));

        byte[] val3 = res1.getRow();
        System.out.println("Row key of the current query: " + Bytes.toString(val3));

        int size = res1.size();
        System.out.println("Number of key-value pairs: " + size);

        boolean b = res1.isEmpty();
        System.out.println("Is the number of key-value pairs 0? " + b);

        KeyValue[] kv = res1.raw();
        // Arrays.toString prints the elements; kv.toString() would only print the array identity.
        System.out.println("Underlying KeyValue array: " + java.util.Arrays.toString(kv));

        for (KeyValue k : kv) {
            System.out.println("col: " + Bytes.toString(k.getFamily()) +
                    "/" + Bytes.toString(k.getQualifier()) +
                    ",value: " + Bytes.toString(k.getValue()));
        }
    }

    public void delete(String rowkey, String colfam, String col) throws IOException {
        delete = new Delete(Bytes.toBytes(rowkey)); // use the parameter, not the literal "rowkey"
        delete.setTimestamp(1);
        delete.deleteColumn(Bytes.toBytes(colfam), Bytes.toBytes(col), 5); // delete only the version with timestamp 5
        delete.deleteColumns(Bytes.toBytes(colfam), Bytes.toBytes(col)); // delete all versions of the column
        delete.deleteFamily(Bytes.toBytes(colfam)); // delete the whole column family
        delete.deleteFamily(Bytes.toBytes(colfam), 3); // delete the family's columns with timestamp 3 or older
        table.delete(delete);

        // Atomic compare-and-delete
        /*Delete delete1 = new Delete(Bytes.toBytes(rowkey));
        delete1.deleteColumn(Bytes.toBytes(colfam), Bytes.toBytes(col));
        boolean bool = table.checkAndDelete(Bytes.toBytes(rowkey), Bytes.toBytes(colfam),
                Bytes.toBytes("ww"), null, delete1); // null value: delete only if column "ww" does not exist
        System.out.println("Delete successful: " + bool);*/
    }

    public void scan() throws IOException {
        // Full-table scan
        Scan scan1 = new Scan();
        /*
         * scan1.setCaching(100); // set scanner caching
         * scan1.setBatch(100); // set batching
         */
        ResultScanner scanner1 = table.getScanner(scan1);
        for (Result res : scanner1) {
            System.out.println(res);
        }
        scanner1.close();

        // Scan restricted to one column family
        Scan scan2 = new Scan();
        scan2.addFamily(Bytes.toBytes("info"));
        ResultScanner scanner2 = table.getScanner(scan2);
        for (Result res : scanner2) {
            System.out.println(res);
        }
        scanner2.close();

        // Scan restricted to two columns and a row-key range
        Scan scan3 = new Scan();
        scan3.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name")).
                addColumn(Bytes.toBytes("info"), Bytes.toBytes("age")).
                setStartRow(Bytes.toBytes("3100")).
                setStopRow(Bytes.toBytes("3108"));

        ResultScanner scanner3 = table.getScanner(scan3);
        for (Result res : scanner3) {
            System.out.println(res);
        }
        scanner3.close();
    }

    public void cleanup() throws IOException {
        table.close();
    }
}
