Similarities and differences between scan and get in HBase: functionality and implementation

海倒過來是天。

Article link: https://blog.csdn.net/qq_45263635/article/details/130157324

The HBase scan method

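Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. HBase provides several APIs for retrieving data, including the scan() and get() methods.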

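Scan() is the API for reading multiple rows of a table. It fetches the rows in a row-key range (or, with a MultiRowRangeFilter, in several ranges) and can apply server-side filters to narrow down the data. Rows always come back sorted by row key; filters select data but do not reorder it. A scan does not return everything in one call: the client iterates over the results and pulls rows from the region servers in batches as it goes, which makes scans suitable for range queries and bulk reads. Performance can be tuned with parameters such as caching (the number of rows fetched per RPC) and batch (the maximum number of cells returned per Result). Scan supports the following parameters: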

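Start Row: the row to start from (inclusive; optional, defaults to the beginning of the table)
Stop Row: the row to stop at (exclusive; optional, defaults to the end of the table)
Time Range: the timestamp range of cell versions to return (optional)
Column Family: the column families to return (optional)
Columns: the specific columns to return (optional)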
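As a sketch of how these parameters map onto the Java client, assuming the HBase 2.x client API, the following builds a Scan that sets each of them. The row keys `row1`/`row3`, the families `cf1`/`cf2`, the qualifier `col`, and the caching and batch values are hypothetical, chosen only for illustration. Building a Scan does not contact the cluster; the object is only sent when it is passed to `Table.getScanner`.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanParams {
    // Builds a Scan that sets each parameter from the list above.
    // Row keys "row1"/"row3", families "cf1"/"cf2", qualifier "col" and the
    // caching/batch values are hypothetical, chosen only for illustration.
    static Scan buildScan(long minTs, long maxTs) throws IOException {
        Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("row1"))  // Start Row: inclusive
                .withStopRow(Bytes.toBytes("row3"));  // Stop Row: exclusive
        scan.setTimeRange(minTs, maxTs);              // Time Range: [minTs, maxTs)
        scan.addFamily(Bytes.toBytes("cf1"));         // Column Family: every column of cf1
        scan.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("col")); // Columns: only cf2:col
        scan.setCaching(100);                         // rows fetched per RPC round trip
        scan.setBatch(10);                            // at most 10 cells per Result
        return scan;
    }

    public static void main(String[] args) throws IOException {
        Scan scan = buildScan(0L, System.currentTimeMillis());
        System.out.println(Bytes.toString(scan.getStartRow()) + " .. "
                + Bytes.toString(scan.getStopRow())); // prints "row1 .. row3"
    }
}
```

Larger caching values mean fewer RPCs but more client memory per round trip; setBatch is mainly useful for very wide rows, since it splits one row across several Result objects.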

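Specifying columns is optional: if no column family or column is added, all columns of each matching row are returned. Scan returns a ResultScanner object, through which you iterate over the Result objects retrieved from the HBase table. Each Result represents a single row and holds that row's details, including the row key and the column values.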

The following snippet shows how to perform a Scan with the Java API:

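import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("table-name"))) {
            Scan scan = new Scan();

            // Set the start row (inclusive) and the stop row (exclusive)
            scan.withStartRow(Bytes.toBytes("start-row"));
            scan.withStopRow(Bytes.toBytes("stop-row"));

            // Add the column to return
            scan.addColumn(Bytes.toBytes("column-family"), Bytes.toBytes("column-qualifier"));

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    // Iterate over the results and process each row
                }
            }
        }
    }
}

In this example, we first create a Configuration object holding the HBase connection parameters, open a Connection with ConnectionFactory, and obtain a Table object for the table to scan (the older `new HTable(config, ...)` style is deprecated). We then use a Scan instance to set the start and stop rows and add the column to return; in the HBase 2.x API, withStartRow/withStopRow replace the deprecated setStartRow/setStopRow. Finally, we iterate over the ResultScanner to retrieve the results and process them. The try-with-resources blocks close the scanner, the table, and the connection when we are done.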
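Inside the loop, each Result can be read without walking its cells by hand. The sketch below, assuming the HBase 2.x client API, reads one column with `Result.getValue` and the row key with `Result.getRow`; the family `cf`, the qualifier `col`, and the helper name `describe` are hypothetical. To keep it runnable without a cluster, `main` builds a Result from a single KeyValue, standing in for a row that a real scanner would return.

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadResult {
    // Formats one row as "rowKey: value", reading a single column from the Result.
    // Returns null when the row does not contain that column.
    static String describe(Result result, String family, String qualifier) {
        byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        if (value == null) {
            return null;
        }
        return Bytes.toString(result.getRow()) + ": " + Bytes.toString(value);
    }

    public static void main(String[] args) {
        // A Result built locally from one cell, standing in for what a scanner returns
        Cell cell = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
                Bytes.toBytes("col"), Bytes.toBytes("v1"));
        Result result = Result.create(Arrays.asList(cell));
        System.out.println(describe(result, "cf", "col")); // prints "row1: v1"
    }
}
```

getValue returns only the latest version of the column; when a request asks for several versions, iterate over the row's cells instead.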

Note that Scan selects data by row-key range, column family, and column qualifier, not by column value. To select rows based on column values, use an HBase Filter.
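As a sketch of value-based selection, assuming the HBase 2.x client API, a SingleColumnValueFilter attached to a Scan keeps only rows whose column equals a given value. The column `cf:status`, the value `active`, and the helper name `scanByStatus` are hypothetical. The filter is evaluated on the region servers, so only matching rows travel back to the client, but the servers still read every row in the scan's range.

```java
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ValueFilterScan {
    // Builds a Scan that keeps only rows whose cf:status column equals the given value.
    // Family "cf" and qualifier "status" are hypothetical names.
    static Scan scanByStatus(String status) {
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("cf"), Bytes.toBytes("status"),
                CompareOperator.EQUAL, Bytes.toBytes(status));
        // Without this, rows that lack cf:status entirely would also be returned
        filter.setFilterIfMissing(true);
        return new Scan().setFilter(filter);
    }

    public static void main(String[] args) {
        Scan scan = scanByStatus("active");
        System.out.println(scan.getFilter());
    }
}
```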

The HBase get method

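Get() retrieves a single row. It takes a row key as its argument and fetches the corresponding row. When you need one specific row, get() is more efficient than scan(). Get() also supports server-side filters (through setFilter), but it only ever returns a single row, so it is fast and well suited to looking up records by row key.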
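Because Get accepts filters too, a sketch like the following (HBase 2.x client API assumed; the row key `row1`, the prefix `addr_`, and the helper name `getWithPrefix` are hypothetical) narrows a single-row read to the columns whose qualifier starts with a given prefix, using ColumnPrefixFilter:

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredGet {
    // Builds a Get for one row that returns only the columns whose qualifier
    // starts with the given prefix. Row key and prefix are hypothetical.
    static Get getWithPrefix(String rowKey, String prefix) {
        Get get = new Get(Bytes.toBytes(rowKey));
        get.setFilter(new ColumnPrefixFilter(Bytes.toBytes(prefix)));
        return get;
    }

    public static void main(String[] args) {
        Get get = getWithPrefix("row1", "addr_");
        System.out.println(get.getFilter());
    }
}
```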

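HBase's Get method reads data by row key: it reads a single row and returns the requested column families, column qualifiers, and timestamped versions. Get is one of the most basic APIs for reading a single row.
When using Get, you must specify the row key to retrieve. You can narrow the request further to specific column families or qualifiers, and set a time range to return only values written within that range. Get returns a Result object containing the details of the row retrieved from the HBase table.
The following snippet shows how to perform a Get with the Java API: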

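import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("table-name"))) {
            Get get = new Get(Bytes.toBytes("row-key"));
            // Add the column family and column qualifier to return
            get.addColumn(Bytes.toBytes("column-family"), Bytes.toBytes("column-qualifier"));
            // Read up to 5 versions within [startTimeStamp, endTimeStamp); the last hour is an illustrative choice
            long endTimeStamp = System.currentTimeMillis();
            long startTimeStamp = endTimeStamp - 60 * 60 * 1000L;
            get.readVersions(5);
            get.setTimeRange(startTimeStamp, endTimeStamp);
            Result result = table.get(get);
        }
    }
}

In this example, we first create a Configuration object holding the HBase connection parameters, then open a Connection and obtain a Table object for the table to read. We create a Get for the row key to retrieve, add the column family and qualifier to return, and set the optional maximum number of versions and timestamp range. In the HBase 2.x API, readVersions replaces the deprecated setMaxVersions, and the time range is half-open: the start timestamp is inclusive and the end timestamp is exclusive. Finally, we call table.get(get) to obtain the Result object for that row.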

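Note that a single Get addresses exactly one row. To read several known row keys, pass a list of Get objects to Table.get(List&lt;Get&gt;), which sends them as one batch; to read a range of rows, or rows whose keys you do not know in advance, use Scan.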
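A sketch of the batched form, assuming the HBase 2.x client API: the class and method names `MultiGet`, `buildGets`, and `fetchRows` are hypothetical helpers, and `fetchRows` needs an open Table from a live cluster. Table.get(List&lt;Get&gt;) returns one Result per Get, in the same position as in the list, and a row that does not exist comes back as an empty Result rather than null.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGet {
    // Creates one Get per known row key
    static List<Get> buildGets(List<String> rowKeys) {
        List<Get> gets = new ArrayList<>();
        for (String key : rowKeys) {
            gets.add(new Get(Bytes.toBytes(key)));
        }
        return gets;
    }

    // Sends all Gets as one batch; results[i] corresponds to rowKeys.get(i).
    // Missing rows yield an empty Result (check with Result.isEmpty()).
    static Result[] fetchRows(Table table, List<String> rowKeys) throws IOException {
        return table.get(buildGets(rowKeys));
    }

    public static void main(String[] args) {
        List<Get> gets = buildGets(Arrays.asList("row1", "row2"));
        System.out.println(gets.size() + " gets built"); // prints "2 gets built"
    }
}
```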

Usage example

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Configure the ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "localhost:2181");

        // Name of the table to operate on
        TableName tableName = TableName.valueOf("testTable");

        // 1. Open the connection and get the table object;
        //    try-with-resources closes both even if an exception is thrown
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName)) {

            // 2. Single-row lookup with Get
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            printCells(result);

            // 3. Table scan: start row "row1" is inclusive, stop row "row3" is exclusive
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("row1"))
                    .withStopRow(Bytes.toBytes("row3"));
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result rs : resultScanner) {
                    printCells(rs);
                }
            }
        } // 4. Resources are closed here
    }

    // Prints every cell of a row as "rowKey, family:qualifier, value"
    private static void printCells(Result result) throws IOException {
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();
            // Row key, column family, column qualifier and value are slices of backing arrays
            String rowKey = Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength());
            String columnFamily = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength());
            String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            String columnValue = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            System.out.println(rowKey + ", " + columnFamily + ":" + columnName + ", " + columnValue);
        }
    }
}

Summary

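What the two methods have in common is that both read data from HBase and both accept filters that specify selection conditions, so that only the required data is returned. They differ in that Get() reads a single row from a table, while Scan() reads multiple rows. Use Scan() when you need to retrieve data in bulk and process the results; use Get() when you need to fetch a single row quickly.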

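As for performance, get() is the cheapest way to read one known row: the client locates the single region holding that row key and reads just that row, so latency is low. (On the region server, a Get is in fact executed as a scan restricted to that one row.) A scan's cost grows with the number of rows in its range, and it may span several regions and RPC round trips, so it takes longer than a single get(); for a contiguous range, however, it is far more efficient than issuing one get() per row. In practice, choose the method that matches the access pattern.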
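To make the relationship concrete: a Get on one row key asks for the same data as a Scan whose start and stop rows are both that key, with both bounds inclusive. The sketch below, assuming the HBase 2.x client API (the row key and the helper name `singleRowScan` are hypothetical), builds such a scan purely for illustration; for a single known row, Get remains the simpler, dedicated call.

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class GetAsScan {
    // Builds a Scan covering exactly one row: start and stop are the same
    // row key and both bounds are inclusive. Illustration only.
    static Scan singleRowScan(String rowKey) {
        byte[] row = Bytes.toBytes(rowKey);
        return new Scan()
                .withStartRow(row, true)
                .withStopRow(row, true);
    }

    public static void main(String[] args) {
        Scan scan = singleRowScan("row1");
        System.out.println(Bytes.toString(scan.getStartRow()) + " .. "
                + Bytes.toString(scan.getStopRow())); // prints "row1 .. row1"
    }
}
```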
