HBase in Practice: Reading Data with a Scanner
1. Java API Overview
1.1 getScanner()
The getScanner method has three overloads:
getScanner(Scan scan)
/**
* Returns a scanner on the current table as specified by the {@link Scan}
* object.
*
* Note that the passed {@link Scan}'s start row and caching properties
* maybe changed.
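* In practice: the client may fill in the Scan's caching value if it is unset, and may move its start row forward as the scan crosses region boundaries, so do not reuse a Scan object assuming it is unchanged.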
* @param scan A configured {@link Scan} object.
* @return A scanner.
* @throws IOException if a remote or network exception occurs.
* @since 0.20.0
*/
ResultScanner getScanner(Scan scan) throws IOException;
getScanner(byte[] family)
/**
* Gets a scanner on the current table for the given family.
* @param family The column family to scan.
* @return A scanner.
* @throws IOException if a remote or network exception occurs.
* @since 0.20.0
*/
ResultScanner getScanner(byte[] family) throws IOException;
getScanner(byte[] family, byte[] qualifier)
/**
* Gets a scanner on the current table for the given family and qualifier.
*
* @param family The column family to scan.
* @param qualifier The column qualifier to scan.
* @return A scanner.
* @throws IOException if a remote or network exception occurs.
* @since 0.20.0
*/
ResultScanner getScanner(byte[] family, byte[] qualifier) throws IOException;
2. Hands-on Code
2.1 Testing each API
Before testing the APIs above, take a look at the data in the tsdb-uid table:
\x00 column=id:metrics, timestamp=1541500656882, value=\x00\x00\x00\x00\x00\x00\x00\x05
\x00 column=id:tagk, timestamp=1535982247222, value=\x00\x00\x00\x00\x00\x00\x00\x03
\x00 column=id:tagv, timestamp=1541425665699, value=\x00\x00\x00\x00\x00\x00\x00\x08
\x00\x00\x01 column=name:metrics, timestamp=1531479245132, value=mytest.cpu
\x00\x00\x01 column=name:tagk, timestamp=1531479245162, value=host
\x00\x00\x01 column=name:tagv, timestamp=1531479245189, value=server4
\x00\x00\x02 column=name:metrics, timestamp=1535891521172, value=metric-t
\x00\x00\x02 column=name:tagk, timestamp=1535891521198, value=chl
\x00\x00\x02 column=name:tagv, timestamp=1531479264404, value=server5
\x00\x00\x03 column=name:metrics, timestamp=1535982247205, value=csdn
\x00\x00\x03 column=name:tagk, timestamp=1535982247230, value=accessNumber
\x00\x00\x03 column=name:tagv, timestamp=1531485413194, value=s485276
\x00\x00\x04 column=name:metrics, timestamp=1541426336083, value=test
\x00\x00\x04 column=name:tagv, timestamp=1535891521217, value=hqdApp
\x00\x00\x05 column=name:metrics, timestamp=1541500656917, value=test_meta
\x00\x00\x05 column=name:tagv, timestamp=1535982247253, value=cs
\x00\x00\x06 column=name:tagv, timestamp=1537103490275, value=Firminal
\x00\x00\x07 column=name:tagv, timestamp=1541425665353, value=lawson
\x00\x00\x08 column=name:tagv, timestamp=1541425665725, value=firminal
Firminal column=id:tagv, timestamp=1537103490289, value=\x00\x00\x06
accessNumber column=id:tagk, timestamp=1535982247235, value=\x00\x00\x03
chl column=id:tagk, timestamp=1535891521203, value=\x00\x00\x02
cs column=id:tagv, timestamp=1535982247259, value=\x00\x00\x05
csdn column=id:metrics, timestamp=1535982247213, value=\x00\x00\x03
firminal column=id:tagv, timestamp=1541425665756, value=\x00\x00\x08
host column=id:tagk, timestamp=1531479245177, value=\x00\x00\x01
hqdApp column=id:tagv, timestamp=1535891521224, value=\x00\x00\x04
lawson column=id:tagv, timestamp=1541425665366, value=\x00\x00\x07
metric-t column=id:metrics, timestamp=1535891521182, value=\x00\x00\x02
mytest.cpu column=id:metrics, timestamp=1531479245145, value=\x00\x00\x01
s485276 column=id:tagv, timestamp=1531485413204, value=\x00\x00\x03
server4 column=id:tagv, timestamp=1531479245192, value=\x00\x00\x01
server5 column=id:tagv, timestamp=1531479264407, value=\x00\x00\x02
test column=id:metrics, timestamp=1541426336086, value=\x00\x00\x04
test_meta column=id:metrics, timestamp=1541500656927, value=\x00\x00\x05
25 row(s) in 0.7650 seconds
- Using columnFamily as the parameter
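import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// connection is assumed to be a static Connection field of the enclosing class,
// e.g. created once with ConnectionFactory.createConnection(HBaseConfiguration.create()).
public static void getRowByScan(String tableName, String columnFamily) {
    // try-with-resources closes the ResultScanner and the Table when done
    try (Table table = connection.getTable(TableName.valueOf(tableName));
         ResultScanner resultScanner = table.getScanner(Bytes.toBytes(columnFamily))) { // all columns of this family
        for (Result res : resultScanner) {
            System.out.println(res); // one Result per row
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}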
Output (called with tableName = "tsdb-uid" and columnFamily = "name"):
keyvalues={\x00\x00\x01/name:metrics/1531479245132/Put/vlen=10/seqid=0, \x00\x00\x01/name:tagk/1531479245162/Put/vlen=4/seqid=0, \x00\x00\x01/name:tagv/1531479245189/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x02/name:metrics/1535891521172/Put/vlen=8/seqid=0, \x00\x00\x02/name:tagk/1535891521198/Put/vlen=3/seqid=0, \x00\x00\x02/name:tagv/1531479264404/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x03/name:metrics/1535982247205/Put/vlen=4/seqid=0, \x00\x00\x03/name:tagk/1535982247230/Put/vlen=12/seqid=0, \x00\x00\x03/name:tagv/1531485413194/Put/vlen=7/seqid=0}
keyvalues={\x00\x00\x04/name:metrics/1541426336083/Put/vlen=4/seqid=0, \x00\x00\x04/name:tagv/1535891521217/Put/vlen=6/seqid=0}
keyvalues={\x00\x00\x05/name:metrics/1541500656917/Put/vlen=9/seqid=0, \x00\x00\x05/name:tagv/1535982247253/Put/vlen=2/seqid=0}
keyvalues={\x00\x00\x06/name:tagv/1537103490275/Put/vlen=8/seqid=0}
keyvalues={\x00\x00\x07/name:tagv/1541425665353/Put/vlen=6/seqid=0}
keyvalues={\x00\x00\x08/name:tagv/1541425665725/Put/vlen=8/seqid=0}
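As the output shows, each res is one Result that holds all the matching cells (KeyValues) of a single row. The rows hold different numbers of cells, and only the 8 rows \x00\x00\x01 through \x00\x00\x08 have cells in the name family, so the scan returns 8 Results in total.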
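To make the one-Result-per-row behavior concrete, here is a toy in-memory model; it is not HBase code. The cell list is copied from the tsdb-uid listing above (values omitted, only a few id-family cells kept), and the hypothetical scan(family, qualifier) helper keeps the matching cells and groups them by row, so each map entry plays the role of one Result. It assumes nothing about HBase internals and only reproduces what the output shows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a scanner's per-row grouping; NOT the HBase client API.
class FamilyScanSketch {

    // {row, family, qualifier} cells copied from the tsdb-uid listing, in scan order
    // (values omitted, binary row keys written in shell notation).
    static final String[][] CELLS = {
        {"\\x00", "id", "metrics"}, {"\\x00", "id", "tagk"}, {"\\x00", "id", "tagv"},
        {"\\x00\\x00\\x01", "name", "metrics"}, {"\\x00\\x00\\x01", "name", "tagk"}, {"\\x00\\x00\\x01", "name", "tagv"},
        {"\\x00\\x00\\x02", "name", "metrics"}, {"\\x00\\x00\\x02", "name", "tagk"}, {"\\x00\\x00\\x02", "name", "tagv"},
        {"\\x00\\x00\\x03", "name", "metrics"}, {"\\x00\\x00\\x03", "name", "tagk"}, {"\\x00\\x00\\x03", "name", "tagv"},
        {"\\x00\\x00\\x04", "name", "metrics"}, {"\\x00\\x00\\x04", "name", "tagv"},
        {"\\x00\\x00\\x05", "name", "metrics"}, {"\\x00\\x00\\x05", "name", "tagv"},
        {"\\x00\\x00\\x06", "name", "tagv"}, {"\\x00\\x00\\x07", "name", "tagv"}, {"\\x00\\x00\\x08", "name", "tagv"},
        {"host", "id", "tagk"}, {"server4", "id", "tagv"}, {"test", "id", "metrics"},
    };

    // Hypothetical helper: keeps cells of the given family (and qualifier, unless null)
    // and groups them by row, one entry per row with at least one match, like one Result.
    static Map<String, List<String>> scan(String family, String qualifier) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String[] cell : CELLS) {
            boolean matches = cell[1].equals(family)
                    && (qualifier == null || cell[2].equals(qualifier));
            if (matches) {
                results.computeIfAbsent(cell[0], row -> new ArrayList<>()).add(cell[1] + ":" + cell[2]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byFamily = scan("name", null);
        byFamily.forEach((row, columns) -> System.out.println(row + " -> " + columns));
        System.out.println(byFamily.size() + " results");                   // 8 results
        System.out.println(scan("name", "metrics").size() + " results");    // 5 results
    }
}
```

The \x00 row never appears in the family scan because it has no cells in name; a scanner simply skips it rather than returning an empty Result. Passing a qualifier as well, as in scan("name", "metrics"), leaves only the 5 rows that have a name:metrics cell, matching the family-and-qualifier example later in this section.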
- Using a Scan as the parameter
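import org.apache.hadoop.hbase.client.Scan;

public static void getRowByScan(String tableName) {
    Scan scan = new Scan();
    // rows are sorted by their bytes and the start row is inclusive
    // (setStartRow is deprecated in HBase 2.x in favor of withStartRow)
    scan.setStartRow(Bytes.toBytes("server4"));
    try (Table table = connection.getTable(TableName.valueOf(tableName));
         ResultScanner resultScanner = table.getScanner(scan)) { // all families, from "server4" on
        for (Result res : resultScanner) {
            System.out.println(res);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}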
Output:
keyvalues={server4/id:tagv/1531479245192/Put/vlen=3/seqid=0}
keyvalues={server5/id:tagv/1531479264407/Put/vlen=3/seqid=0}
keyvalues={test/id:metrics/1541426336086/Put/vlen=3/seqid=0}
keyvalues={test_meta/id:metrics/1541500656927/Put/vlen=3/seqid=0}
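The scan starts at server4 because HBase keeps rows sorted by the unsigned byte order of their keys, and the start row is inclusive. The sketch below is a JDK-only model of that rule, not the HBase client: it sorts the 25 row keys from the listing with an unsigned byte comparison and keeps those at or after the start row. It also models an optional stop row (Scan.setStopRow, or withStopRow in HBase 2.x), which is exclusive; the stop value "test" used in main is only a hypothetical example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// JDK-only model of start/stop-row scanning; NOT the HBase client API.
class RowRangeSketch {

    // Unsigned lexicographic byte comparison, the order HBase keeps row keys in.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a key that is a prefix of another sorts first
    }

    // The 25 row keys of the tsdb-uid listing; "\0" is the byte 0x00.
    static List<String> tsdbUidRows() {
        List<String> rows = new ArrayList<>(Arrays.asList(
                "Firminal", "accessNumber", "chl", "cs", "csdn", "firminal", "host", "hqdApp",
                "lawson", "metric-t", "mytest.cpu", "s485276", "server4", "server5", "test", "test_meta"));
        rows.add("\0");                  // the \x00 row
        for (char id = 1; id <= 8; id++) {
            rows.add("\0\0" + id);       // \x00\x00\x01 ... \x00\x00\x08
        }
        return rows;
    }

    // Rows r with start <= r, and r < stop when stop is non-null, in row-key order.
    static List<String> scan(List<String> rows, String start, String stop) {
        List<byte[]> keys = new ArrayList<>();
        for (String r : rows) {
            keys.add(r.getBytes(StandardCharsets.UTF_8));
        }
        keys.sort(RowRangeSketch::compareRows);
        byte[] startKey = start.getBytes(StandardCharsets.UTF_8);
        byte[] stopKey = stop == null ? null : stop.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (byte[] key : keys) {
            if (compareRows(key, startKey) < 0) {
                continue; // before the start row (the start row itself is included)
            }
            if (stopKey != null && compareRows(key, stopKey) >= 0) {
                break;    // reached the stop row, which is excluded
            }
            out.add(new String(key, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = tsdbUidRows();
        System.out.println(scan(rows, "server4", null));   // [server4, server5, test, test_meta]
        System.out.println(scan(rows, "server4", "test")); // [server4, server5]
    }
}
```

Under this ordering s485276 sorts before server4 ('4' is 0x34, 'e' is 0x65), and the uppercase Firminal sorts before every lowercase key, which is why neither shows up in the output above.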
- Using columnFamily and qualifier as the parameters
public static void getRowByScanThree(String tableName, String family, String qualifier) {
    try (Table table = connection.getTable(TableName.valueOf(tableName));
         ResultScanner resultScanner = table.getScanner(Bytes.toBytes(family), Bytes.toBytes(qualifier))) { // only family:qualifier
        for (Result res : resultScanner) {
            System.out.println(res); // rows without a family:qualifier cell are not returned
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Output (called with family = "name" and qualifier = "metrics"):
keyvalues={\x00\x00\x01/name:metrics/1531479245132/Put/vlen=10/seqid=0}
keyvalues={\x00\x00\x02/name:metrics/1535891521172/Put/vlen=8/seqid=0}
keyvalues={\x00\x00\x03/name:metrics/1535982247205/Put/vlen=4/seqid=0}
keyvalues={\x00\x00\x04/name:metrics/1541426336083/Put/vlen=4/seqid=0}
keyvalues={\x00\x00\x05/name:metrics/1541500656917/Put/vlen=9/seqid=0}
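2.2 Printing the fields of each KeyValue
Each Result printed above holds all the matching cells of one row, and each cell is a KeyValue. But how do you pull out the individual fields of a cell, such as its rowKey, value and timestamp? The code is as follows: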
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

public static void getRowValue(String tableName, String family, String qualifier) {
    try (Table table = connection.getTable(TableName.valueOf(tableName));
         ResultScanner resultScanner = table.getScanner(Bytes.toBytes(family), Bytes.toBytes(qualifier))) {
        for (Result res : resultScanner) {
            // rawCells() replaces the deprecated Result.raw(), which returned KeyValue[]
            for (Cell cell : res.rawCells()) {
                // the row key is binary (e.g. \x00\x00\x01), so print each byte's decimal value
                StringBuilder rowKey = new StringBuilder();
                for (byte b : CellUtil.cloneRow(cell)) {
                    rowKey.append(b);
                }
                System.out.println("rowKey: " + rowKey
                        + ", value: " + Bytes.toString(CellUtil.cloneValue(cell))
                        + ", timestamp: " + cell.getTimestamp());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Output (called with family = "name" and qualifier = "metrics"):
rowKey: 001, value: mytest.cpu, timestamp: 1531479245132
rowKey: 002, value: metric-t, timestamp: 1535891521172
rowKey: 003, value: csdn, timestamp: 1535982247205
rowKey: 004, value: test, timestamp: 1541426336083
rowKey: 005, value: test_meta, timestamp: 1541500656917
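Every HBase row key is a byte array, but the keys of the rows returned here are binary 3-byte IDs such as \x00\x00\x01 rather than text, so Bytes.toString would turn them into unprintable characters. That is why the code above prints the rowKey with a for loop, one byte's decimal value at a time. HBase's Bytes.toStringBinary(row) is another option: it prints such keys in the same \x00\x00\x01 notation the shell uses.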
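Printing each byte's decimal value works for these keys, but concatenating the digits is ambiguous once a byte is 10 or larger: {0, 1, 10} and {0, 1, 1, 0} both print as 0110, and bytes above 0x7F print as negative numbers. The JDK-only sketch below (not HBase code) shows two alternatives: escaping non-printable bytes as \xNN, similar in spirit to Bytes.toStringBinary and the shell output, and reading the key as an unsigned big-endian number. Treating the 3-byte key as a numeric ID is an assumption based on how the keys in this table look.

```java
import java.nio.charset.StandardCharsets;

// JDK-only helpers for printing binary row keys; NOT HBase code.
class RowKeyFormatSketch {

    // What the loop in getRowValue prints: each byte's signed decimal value, concatenated.
    static String concatDecimal(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(b);
        }
        return sb.toString();
    }

    // Printable ASCII (except the backslash) is kept; any other byte becomes \xNN.
    static String toStringBinary(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xff;
            if (v >= ' ' && v <= '~' && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    // Assumption: the key is a fixed-width unsigned big-endian ID, e.g. \x00\x00\x01 -> 1.
    static long toUnsignedId(byte[] key) {
        long id = 0;
        for (byte b : key) {
            id = (id << 8) | (b & 0xff);
        }
        return id;
    }

    public static void main(String[] args) {
        byte[] row = {0, 0, 1};
        System.out.println(concatDecimal(row));  // 001
        System.out.println(toStringBinary(row)); // \x00\x00\x01
        System.out.println(toUnsignedId(row));   // 1
        // Ambiguity of the decimal loop: both keys print as 0110.
        System.out.println(concatDecimal(new byte[] {0, 1, 10}) + " vs " + concatDecimal(new byte[] {0, 1, 1, 0}));
        System.out.println(toStringBinary("server4".getBytes(StandardCharsets.UTF_8))); // server4
    }
}
```

The escaped form is unambiguous and round-trips visually with the shell listing, while the numeric form is handy when the key really is an ID, as the values 1 through 8 in this table suggest.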