HBase in Detail, Part 1

鸿儒之观

Article link: https://blog.csdn.net/zhijunming/article/details/108155108

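1. HBase Overview

1. What is HBase?
	HBase is a distributed NoSQL database for storing massive amounts of data.
2. Typical use case: real-time (low-latency) reads and writes
3. Data model
	1. Table: the form in which data is stored
	2. Region: a segment (contiguous rowkey range) of a table; regions are hosted on RegionServers
	3. Store: one per column family, so a region has as many stores as the table has column families
	4. Rowkey: the primary key of a row.
		Rows are kept sorted by the lexicographic (byte-wise) order of their rowkeys
	5. Column family: part of the table schema [loosely comparable to a MySQL column, except that one family groups any number of columns]
	6. Column qualifier: part of the data, supplied at write time rather than declared in the schema
	7. Column: column family:column qualifier
	8. Cell: the value addressed by rowkey + column family + column qualifier + timestamp
	9. Namespace: a grouping of tables, comparable to a MySQL database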
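4. Architecture
	1. Master
		Responsibilities:
			1. Monitors the RegionServers; when a RegionServer goes down, its regions are reassigned to the surviving RegionServers
			2. Handles table schema changes
	2. RegionServer
		Responsibilities:
			1. Serves data operations [put, delete, get, scan]
			2. Performs region split and compact
		Region: a segment of a table
			Store: one per column family
				MemStore: an in-memory buffer. Writes go to the MemStore; once it reaches a size threshold it is flushed, and every flush produces one StoreFile
				StoreFile: produced by a MemStore flush
					StoreFiles are persisted on HDFS in the HFile format
		HLog (write-ahead log): each RegionServer has a single HLog. Data is appended to the HLog before it is written to the MemStore; once a MemStore has been flushed, its entries in the write-ahead log are no longer needed and are removed
	3. ZooKeeper: stores the location of the metadata (the hbase:meta table); the Master also relies on ZooKeeper to watch the RegionServers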
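
The rowkey ordering described in the data model can be illustrated without a cluster. The following is a minimal sketch, not HBase code: `RowKeyOrderSketch`, `sortLikeHBase` and `padded` are hypothetical names, and the sketch assumes Java 9+ (`Arrays.compareUnsigned`, `List.of`). It shows why numeric ids used as rowkeys are usually zero-padded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

class RowKeyOrderSketch {
    // Unsigned byte-wise comparison, i.e. lexicographic order over the raw rowkey bytes.
    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    /** Returns the rowkeys in lexicographic byte order (hypothetical helper). */
    static List<String> sortLikeHBase(List<String> rowKeys) {
        TreeMap<byte[], String> sorted = new TreeMap<>(BYTE_ORDER);
        for (String key : rowKeys) {
            sorted.put(key.getBytes(StandardCharsets.UTF_8), key);
        }
        return new ArrayList<>(sorted.values());
    }

    /** Zero-pads a numeric id so that byte order agrees with numeric order. */
    static String padded(long id, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded ids sort as text, so "10" and "100" come before "2".
        System.out.println(sortLikeHBase(List.of("1", "2", "10", "100")));
        // Padded ids sort in numeric order.
        System.out.println(sortLikeHBase(
                List.of(padded(100, 4), padded(2, 4), padded(10, 4), padded(1, 4))));
    }
}
```

Running `main` prints `[1, 10, 100, 2]` followed by `[0001, 0002, 0010, 0100]`.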

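2. HBase Installation and Shell Usage

1. Namespace commands
	1. Create a namespace: create_namespace 'namespace'
	2. Drop a namespace: drop_namespace 'namespace'
		All tables in the namespace must be dropped before the namespace itself can be dropped
	3. List the tables in a namespace: list_namespace_tables 'namespace'
	4. List all namespaces: list_namespace
2. Table commands
	1. Create a table: create 'table','family1',...
	2. Drop a table:
		1. Disable the table: disable 'table'
		2. Drop the table: drop 'table'
	3. List all tables: list
	4. Show the details of a table: describe 'table'
	5. Create a table with an explicit VERSIONS setting (versions is an integer, e.g. 3):
		create 'table',{NAME=>'family',VERSIONS=>versions}
3. Data commands
	1. Insert data: put 'table','rowkey','family:qualifier','value'
	2. Read data
		1. Get by rowkey
			1. A whole row: get 'table','rowkey'
			2. One column family: get 'table','rowkey','family'
			3. One column: get 'table','rowkey','family:qualifier'
		2. Scan
			1. The whole table: scan 'table'
			2. One column family: scan 'table',{COLUMNS=>'family'}
			3. One column: scan 'table',{COLUMNS=>'family:qualifier'}
			4. Several versions of a column: scan 'table',{COLUMNS=>'family:qualifier',VERSIONS=>versions}
	3. Remove all data from a table: truncate 'table'
	4. Count the rows of a table: count 'table'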

3. HBase API

package com.atguigu;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class HBaseDemo {
    private Configuration configuration;
    private Connection connection;
    private Admin admin;
    // ======================== Resource setup and teardown ========================

    /**
     * Creates the configuration, the connection and the Admin before each test.
     */
    @Before
    public void init() {
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "hadoop102:2181,hadoop103:2181,hadoop104:2181");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    /**
     * Releases the Admin and the connection after each test.
     */
    @After
    public void close() {
        if (admin != null) {
            try {
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

    }

    // ======================== Namespace operations ========================

    /**
     * Creates a namespace.
     */
    @Test
    public void createNameSpace() throws Exception {
        // The connection and the Admin are opened in init() and closed in close(),
        // so the test only has to describe and create the namespace.
        final NamespaceDescriptor bigHBase = NamespaceDescriptor.create("bigHBase5").build();
        admin.createNamespace(bigHBase);
    }

    /**
     * Lists all namespaces.
     *
     * @throws Exception if the request to the cluster fails
     */
    @Test
    public void listNameSpace() throws Exception {
        // Uses the shared Admin opened in init()
        final NamespaceDescriptor[] namespaceDescriptors = admin.listNamespaceDescriptors();
        for (NamespaceDescriptor namespaceDescriptor : namespaceDescriptors) {
            System.out.println(namespaceDescriptor.getName());
        }
    }

    /**
     * Lists all tables in one namespace.
     *
     * @throws Exception if the request to the cluster fails
     */
    @Test
    public void listNameSpaceTablesByNameSpace() throws Exception {
        // Fetch the descriptors of every table in the namespace (shared Admin from init())
        final List<TableDescriptor> tableDescriptors = admin.listTableDescriptorsByNamespace("bigHBase".getBytes());
        for (TableDescriptor tableDescriptor : tableDescriptors) {
            final TableName tableName = tableDescriptor.getTableName();
            // Fully qualified name, i.e. namespace:table
            System.out.println(tableName.getNameAsString());
            // Table name without the namespace prefix
            System.out.println(tableName.getQualifierAsString());
        }
    }

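    /**
     * Drops a namespace. The namespace must not contain any tables,
     * so all of its tables are dropped first and the namespace afterwards.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void dropNameSpace() throws IOException {
        // Fetch all tables in the namespace
        final List<TableDescriptor> tableDescriptors = admin.listTableDescriptorsByNamespace("bigHBase5".getBytes());
        for (TableDescriptor tableDescriptor : tableDescriptors) {
            final TableName tableName = tableDescriptor.getTableName();
            // A table must be disabled before it can be deleted;
            // disabling an already disabled table would throw
            if (admin.isTableEnabled(tableName)) {
                admin.disableTable(tableName);
            }
            // Delete the table
            admin.deleteTable(tableName);
        }
        // Drop the namespace
        admin.deleteNamespace("bigHBase5");
    }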

    // ======================== Table operations ========================

    /**
     * Creates a table, the equivalent of the shell command create 'table','family',...
     * Prefix the table name with "namespace:" to create it outside the default namespace.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void createTable() throws IOException {
        // Describe the column families
        final ColumnFamilyDescriptor base_ifo = ColumnFamilyDescriptorBuilder.newBuilder("base_ifo".getBytes()).build();
        final ColumnFamilyDescriptor extra_info = ColumnFamilyDescriptorBuilder.newBuilder("extra_info".getBytes()).build();
        // Describe the table and attach the column families
        final TableDescriptor build = TableDescriptorBuilder.newBuilder(TableName.valueOf("person2"))
                .setColumnFamily(base_ifo)
                .setColumnFamily(extra_info)
                .build();
        // Pre-split the table into four regions at the rowkeys "10", "20" and "30"
        final byte[][] splitKeys = {"10".getBytes(), "20".getBytes(), "30".getBytes()};
        // Create the table exactly once: calling createTable a second time for the
        // same table fails with TableExistsException.
        // Use admin.createTable(build) instead to create it without pre-splitting.
        admin.createTable(build, splitKeys);
    }

    /**
     * Alters the column families of a table. Note that modifyTable replaces the
     * existing table descriptor, so the families that should be kept must be
     * listed again along with the changed and the new ones.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void alterTable() throws IOException {
        // Change the number of versions kept by the existing families

        // 1. Describe the existing column families with their new settings
        final ColumnFamilyDescriptor base_ifo = ColumnFamilyDescriptorBuilder.newBuilder("base_ifo".getBytes())
                .setMinVersions(2)
                .setMaxVersions(2).build();
        final ColumnFamilyDescriptor extra_info = ColumnFamilyDescriptorBuilder.newBuilder("extra_info".getBytes())
                .setMinVersions(2)
                .setMaxVersions(2)
                .build();
        // 2. Describe an additional column family
        final ColumnFamilyDescriptor address_info = ColumnFamilyDescriptorBuilder
                .newBuilder("address_info".getBytes()).build();
        // 3. Describe the table and attach all column families
        final TableDescriptor build = TableDescriptorBuilder.newBuilder(TableName.valueOf("bigHBase:person"))
                .setColumnFamily(base_ifo)
                .setColumnFamily(extra_info)
                .setColumnFamily(address_info)
                .build();

        // 4. Apply the new table descriptor
        admin.modifyTable(build);

    }

    /**
     * Lists all tables.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void listTable() throws IOException {
        final TableName[] tableNames = admin.listTableNames();
        for (TableName tableName : tableNames) {
            System.out.println(tableName.getNameAsString());
        }
    }

    /**
     * Drops a table.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void dropTable() throws IOException {
        // A table must be disabled before it can be deleted
        admin.disableTable(TableName.valueOf("bigHBase:person2"));
        // Delete the table
        admin.deleteTable(TableName.valueOf("bigHBase:person2"));
    }

    /**
     * Inserts one row.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void put() throws IOException {
        // Get the Table object
        final Table table = connection.getTable(TableName.valueOf("student"));
        // Shell equivalent: put 'namespace:table','rowKey','family:qualifier','value'
        // Set the rowKey
        final Put put = new Put("1001000".getBytes());
        // Set column family, column qualifier and value
        put.addColumn("base_ifo".getBytes(), "name".getBytes(), Bytes.toBytes("李四"));
        // age is written as a 4-byte int so that it matches the Bytes.toInt(...) reads
        // and the Bytes.toBytes(int) filter values used elsewhere in this class
        put.addColumn("base_ifo".getBytes(), "age".getBytes(), Bytes.toBytes(26));
        put.addColumn("extra_info".getBytes(), "class".getBytes(), "2523".getBytes());
        // put.addColumn("address_info".getBytes(), "address".getBytes(), "shenzhen".getBytes());
        // Write the row
        table.put(put);
        // Close the table
        table.close();
    }

    /**
     * Inserts several rows in one batch.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void putList() throws IOException {
        // Get the Table object
        final Table table = connection.getTable(TableName.valueOf("person2"));
        // Collect one Put per rowKey
        final List<Put> list = new ArrayList<>();
        for (int i = 11; i <= 20; i++) {
            // Set the rowKey
            final Put put = new Put(("100" + i).getBytes());
            // Set column family, column qualifier and value
            put.addColumn("base_ifo".getBytes(), "name".getBytes(), ("zhangsan-" + i).getBytes());
            // age is written as a 4-byte int, matching the Bytes.toInt(...) reads in this class
            put.addColumn("base_ifo".getBytes(), "age".getBytes(), Bytes.toBytes(20 + i));
            put.addColumn("extra_info".getBytes(), "class".getBytes(), ("2025" + i).getBytes());
            // put.addColumn("address_info".getBytes(), "address".getBytes(), ("shenzhen" + i).getBytes());
            list.add(put);
        }
        // Send all Puts in a single batch instead of one call per row
        table.put(list);
        // Close the table
        table.close();

    }

    // Reads a whole row by rowKey
    @Test
    public void getValueByRowKey() throws IOException {
        // Get the Table object
        final Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
        // Create the Get and set the rowKey to look up
        final Get get = new Get("1000".getBytes());
        // Run the query; the Result holds one row
        final Result result = table.get(get);
        // listCells() returns null for a missing row, so check before iterating
        if (result.isEmpty()) {
            System.out.println("row not found");
            table.close();
            return;
        }
        // A row consists of several cells
        final List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
            // A cell is identified by rowKey + family + qualifier + timestamp and carries a value

            // rowKey
            final String rowKey = Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength());
            // Column family
            final String family = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength());
            // Column qualifier
            final String qualifier = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            // Timestamp
            final long timestamp = cell.getTimestamp();

            // Values are raw bytes, so decode them according to the column's type.
            // Bytes.toInt requires exactly 4 bytes, hence the length check.
            if ("base_ifo".equals(family) && "age".equals(qualifier) && cell.getValueLength() == Bytes.SIZEOF_INT) {
                int value = Bytes.toInt(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
            } else {
                String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
            }
        }
        // Close the table
        table.close();
    }

    /**
     * Reads the cells of selected column families and columns of one row.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void getValueByRowKeyAndQualifier() throws IOException {
        // Get the Table object
        final Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
        // Create the Get and set the rowKey to look up
        final Get get = new Get("1001000".getBytes());
        // Select whole column families
        get.addFamily("extra_info".getBytes());
        get.addFamily("base_ifo".getBytes());
        // Select a single column; this narrows extra_info down to extra_info:class
        get.addColumn("extra_info".getBytes(), "class".getBytes());
        // Run the query
        final Result result = table.get(get);
        // listCells() returns null for a missing row, so check before iterating
        if (result.isEmpty()) {
            System.out.println("row not found");
            table.close();
            return;
        }
        final List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
            // A cell is identified by rowKey + family + qualifier + timestamp and carries a value

            // rowKey
            final String rowKey = Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength());
            // Column family
            final String family = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength());
            // Column qualifier
            final String qualifier = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            // Timestamp
            final long timestamp = cell.getTimestamp();

            // Values are raw bytes, so decode them according to the column's type.
            // Bytes.toInt requires exactly 4 bytes, hence the length check.
            if ("base_ifo".equals(family) && "age".equals(qualifier) && cell.getValueLength() == Bytes.SIZEOF_INT) {
                int value = Bytes.toInt(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
            } else {
                String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
            }
        }
        // Close the table
        table.close();

    }

    /**
     * Reads several rows in one batch.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void getBatch() throws IOException {
        final Table table = connection.getTable(TableName.valueOf("person2"));

        // Collect one Get per rowKey
        final List<Get> list = new ArrayList<>();
        for (int i = 11; i <= 20; i++) {
            // Create the Get for this rowKey, restricted to one column family
            final Get get = new Get(("100" + i).getBytes());
            get.addFamily("base_ifo".getBytes());
            list.add(get);
        }
        // One Result per Get, in the same order
        final Result[] results = table.get(list);
        for (Result result : results) {
            // A missing row yields an empty Result whose listCells() is null
            if (result.isEmpty()) {
                continue;
            }
            for (Cell cell : result.listCells()) {
                final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                final long timestamp = cell.getTimestamp();
                final byte[] raw = CellUtil.cloneValue(cell);
                // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                        && raw.length == Bytes.SIZEOF_INT;
                final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
            }
        }
        // Close the table
        table.close();
    }

    /**
     * Deletes one row by rowKey.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void deleteByRowKey() throws IOException {
        // Get the Table object
        final Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
        // Create the Delete for the rowKey to remove
        final Delete delete = new Delete("1000".getBytes());
        // Delete the row
        table.delete(delete);
        // Close the table
        table.close();
    }

    /**
     * Deletes several rows in one batch.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void deleteBatch() throws IOException {
        final Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
        // Collect one Delete per rowKey (1003 to 1007)
        final List<Delete> list = new ArrayList<>();
        for (int i = 3; i <= 7; i++) {
            list.add(new Delete(("100" + i).getBytes()));
        }
        // Send all Deletes in a single batch
        table.delete(list);
        // Close the table
        table.close();
    }

    // ======================== Scanning data ========================

    /**
     * Scans the whole table.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void getValueByScan() throws IOException {
        // 1. An empty Scan selects every row of the table
        final Scan scan = new Scan();
        // 2. Open the table (namespace:table) and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner results = table.getScanner(scan)) {
            // 3. Each Result is one row
            for (Result res : results) {
                // Each cell = rowKey + family + qualifier + timestamp + value
                for (Cell cell : res.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }
    }

    /**
     * Scans a single column family.
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void getValueByScanFamily() throws IOException {
        // 1. Create the Scan and restrict it to one column family
        final Scan scan = new Scan();
        scan.addFamily("base_ifo".getBytes());
        // 2. Open the table (namespace:table) and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner scanner = table.getScanner(scan)) {
            // 3. Each Result is one row
            for (Result result : scanner) {
                // Print the components of every cell
                for (Cell cell : result.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }

    }

    // ======================== Filters ========================

    /**
     * Selects rows by a column value,
     * e.g. select * from xx where age=25
     */
    @Test
    public void filterByValue() throws IOException {
        // 1. Create the Scan
        final Scan scan = new Scan();
//        // Alternative: ValueFilter matches on the value alone and returns only the matching cells
//        // (needs the imports org.apache.hadoop.hbase.filter.BinaryComparator and ValueFilter)
//        BinaryComparator comparator=new BinaryComparator(Bytes.toBytes(25));
//        final ValueFilter valueFilter = new ValueFilter(CompareOperator.EQUAL, comparator);
//        scan.setFilter(valueFilter);

        // SingleColumnValueFilter tests one column and returns the whole row:
        // every row whose base_ifo:age equals the int 25
        final SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("base_ifo".getBytes(), "age".getBytes(), CompareOperator.EQUAL, Bytes.toBytes(25));
        // By default rows that lack the column pass the filter; exclude them
        singleColumnValueFilter.setFilterIfMissing(true);
        scan.setFilter(singleColumnValueFilter);
        // 2. Open the table (namespace:table) and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner scanner = table.getScanner(scan)) {
            // 3. Each Result is one matching row
            for (Result result : scanner) {
                for (Cell cell : result.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }

    }

    /**
     * Substring ("like") query:
     * select * from xx where name like '%-5%'
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void filterByLike() throws IOException {
        final Scan scan = new Scan();
        // Return whole rows whose base_ifo:name contains the substring "-5"
        final SubstringComparator comparator = new SubstringComparator("-5");
        final SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("base_ifo".getBytes(), "name".getBytes(), CompareOperator.EQUAL, comparator);
        scan.setFilter(singleColumnValueFilter);
        // Open the table and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner scanner = table.getScanner(scan)) {
            // Each Result is one matching row
            for (Result result : scanner) {
                for (Cell cell : result.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }
    }

    /**
     * Combined conditions:
     * select * from xx where name like '%-5%' and (age>20 or name='zhangsan-2')
     *
     * @throws IOException if the request to the cluster fails
     */
    @Test
    public void filterByMuti() throws IOException {
        final Scan scan = new Scan();
        // name like '%-5%'
        final SingleColumnValueFilter like = new SingleColumnValueFilter("base_ifo".getBytes(), "name".getBytes(), CompareOperator.EQUAL, new SubstringComparator("-5"));
        // age > 20 (byte-wise comparison against the 4-byte int encoding)
        final SingleColumnValueFilter age = new SingleColumnValueFilter("base_ifo".getBytes(), "age".getBytes(), CompareOperator.GREATER, Bytes.toBytes(20));
        // name = 'zhangsan-2'
        final SingleColumnValueFilter name = new SingleColumnValueFilter("base_ifo".getBytes(), "name".getBytes(), CompareOperator.EQUAL, Bytes.toBytes("zhangsan-2"));
        // (age > 20 or name = 'zhangsan-2'): MUST_PASS_ONE is a logical OR
        final FilterList nameOrAge = new FilterList(FilterList.Operator.MUST_PASS_ONE);
        nameOrAge.addFilter(name);
        nameOrAge.addFilter(age);
        // name like '%-5%' and (...): MUST_PASS_ALL is a logical AND
        final FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        filterList.addFilter(like);
        filterList.addFilter(nameOrAge);
        // Attach the combined filter to the scan
        scan.setFilter(filterList);
        // Open the table and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                for (Cell cell : result.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }
    }

    /**
     * Range query over columns: scans the whole table but returns only the
     * cells whose column qualifier falls inside a given range.
     */
    @Test
    public void rangScan() throws IOException {
        final Scan scan = new Scan();
        // ColumnRangeFilter compares column qualifiers, not column families.
        // Arguments: minColumn, minInclusive, maxColumn, maxInclusive; a null bound is open-ended.
        // As written, this keeps every qualifier that sorts at or after "base_ifo" in any family.
        final ColumnRangeFilter filter = new ColumnRangeFilter("base_ifo".getBytes(), true, null, false);

        scan.setFilter(filter);
        // Open the table and the scanner; both are closed automatically
        try (Table table = connection.getTable(TableName.valueOf("bigHBase:person"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                for (Cell cell : result.listCells()) {
                    final String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
                    final String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    final String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    final long timestamp = cell.getTimestamp();
                    final byte[] raw = CellUtil.cloneValue(cell);
                    // Decode base_ifo:age as an int (family AND qualifier must match); everything else is text
                    final boolean isIntAge = "base_ifo".equals(family) && "age".equals(qualifier)
                            && raw.length == Bytes.SIZEOF_INT;
                    final String value = isIntAge ? String.valueOf(Bytes.toInt(raw)) : Bytes.toString(raw);
                    System.out.println(rowKey + "--" + family + "---" + qualifier + "--" + timestamp + "----" + value);
                }
            }
        }
    }
}

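4. Internals

1. Write path

	1. The client asks ZooKeeper where the metadata (the hbase:meta table) is located
	2. ZooKeeper returns the location of the metadata to the client
	3. The client requests the metadata from the RegionServer that hosts it
	4. That RegionServer returns the metadata, which the client caches
	5. From the metadata the client determines which region the row belongs to and which RegionServer hosts that region
	6. The client sends the write request to that RegionServer
	7. The RegionServer first appends the data to the HLog
	8. Once the HLog write has succeeded, the data is written to the MemStore
	9. The RegionServer reports the completed write to the client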
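
Steps 7 to 9, together with the flush behaviour from the architecture overview, can be modelled in a few lines. This is a toy in-memory sketch under stated assumptions, not the RegionServer implementation: `WritePathSketch` and its fields are hypothetical names, the flush threshold counts rows rather than bytes, and a whole "region" is reduced to a single store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class WritePathSketch {
    final List<String> hlog = new ArrayList<>();                  // stands in for the write-ahead log
    final TreeMap<String, String> memstore = new TreeMap<>();    // sorted in-memory buffer
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed, sorted "files"
    final int flushThreshold;                                     // assumption: counted in rows, not bytes

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Appends to the log first, then to the MemStore, and flushes when the threshold is reached. */
    void put(String rowKey, String value) {
        hlog.add(rowKey + "=" + value);
        memstore.put(rowKey, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Turns the MemStore into one new store file and drops the log entries it covered. */
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        storeFiles.add(new TreeMap<>(memstore));
        memstore.clear();
        hlog.clear();
    }

    public static void main(String[] args) {
        WritePathSketch region = new WritePathSketch(2);
        region.put("1002", "lisi");
        region.put("1001", "zhangsan"); // second row reaches the threshold and triggers a flush
        System.out.println(region.storeFiles); // one store file, sorted by rowkey
    }
}
```

The log-before-MemStore order is what makes recovery possible: if the process dies before a flush, everything still in memory can be replayed from the log.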

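2. Flush triggers

	A flush always applies to the whole region
	1. A MemStore in the region reaches 128 MB (hbase.hregion.memstore.flush.size)
	2. During a write peak the first condition may be delayed; once all MemStores of the region together reach 4 * 128 MB, client writes are blocked and the flush takes priority
	3. All MemStores of all regions on the RegionServer together reach java_heap * hbase.regionserver.global.memstore.size * hbase.regionserver.global.memstore.size.lower.limit
	4. During a write peak the third condition may be delayed; once all MemStores on the RegionServer together reach java_heap * hbase.regionserver.global.memstore.size, client writes are blocked and the flush takes priority
	5. The number of write-ahead log files reaches 32
	6. One hour has passed since a MemStore of the region was last flushed
	7. A manual flush: flush 'table'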
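
To make the size-based thresholds concrete, the arithmetic can be written out as follows. This is only a sketch of the formulas listed above: `FlushThresholdSketch` and its method names are hypothetical, the 10 GB heap is an arbitrary example, and the values 0.4 and 0.95 are the commonly cited defaults for the two global settings, assumed here rather than taken from the text.

```java
class FlushThresholdSketch {
    /** Region-level blocking size: flush size times the block multiplier (4 in the text). */
    static long regionBlockingSizeMb(long flushSizeMb, int blockMultiplier) {
        return flushSizeMb * blockMultiplier;
    }

    /** RegionServer-level size at which writes are blocked: java_heap * global.memstore.size. */
    static double globalUpperLimitMb(double heapMb, double globalMemstoreSize) {
        return heapMb * globalMemstoreSize;
    }

    /** RegionServer-level size at which flushing starts: upper limit * lower.limit. */
    static double globalLowerLimitMb(double heapMb, double globalMemstoreSize, double lowerLimit) {
        return heapMb * globalMemstoreSize * lowerLimit;
    }

    public static void main(String[] args) {
        double heapMb = 10 * 1024; // assumed example heap of 10 GB
        System.out.println("region blocking size (MB): " + regionBlockingSizeMb(128, 4));
        System.out.println("global flush start   (MB): " + globalLowerLimitMb(heapMb, 0.4, 0.95));
        System.out.println("global write block   (MB): " + globalUpperLimitMb(heapMb, 0.4));
    }
}
```

With these assumed values a region blocks writes at 512 MB, while the RegionServer as a whole starts flushing at roughly 3891 MB and blocks writes at 4096 MB.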

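3. Read path

	1. The client asks ZooKeeper where the metadata (the hbase:meta table) is located
	2. ZooKeeper returns the location of the metadata to the client
	3. The client requests the metadata from the RegionServer that hosts it
	4. That RegionServer returns the metadata, which the client caches
	5. From the metadata the client determines which region holds the requested row and which RegionServer hosts that region
	6. The client sends the read request to that RegionServer
	7. The RegionServer looks in the block cache
	8. It looks in the MemStore
	9. It reads from the StoreFiles; the candidates from all three sources are merged so that the newest version is returned
		A store may contain many StoreFiles, so how is the right one found quickly?
			1. A Bloom filter tells which StoreFiles may contain the row
				Bloom filter property: a positive answer means the row may or may not be present; a negative answer means it is definitely absent
			2. The data index inside the HFile locates the data within the file
				HFile: contains the data, the index of the data, and metadata about the data
	10. The data is returned to the client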
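
The Bloom filter property from step 9 can be demonstrated with a very small filter. This is a toy sketch for illustration and not the filter HBase ships: `BloomFilterSketch` is a hypothetical name, and deriving the bit positions from `String.hashCode()` is a simplifying assumption.

```java
import java.util.BitSet;

class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two hash values (double hashing).
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive positions differ
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Records a rowkey by setting all of its bit positions. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false: the key was definitely never added; true: it may have been added. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch storeFileFilter = new BloomFilterSketch(1024, 3);
        storeFileFilter.add("1001");
        System.out.println(storeFileFilter.mightContain("1001")); // true: every added key is reported
        System.out.println(storeFileFilter.mightContain("9999")); // usually false, but a false positive is possible
    }
}
```

A reader that keeps one such filter per StoreFile can skip every file whose filter answers false and only open the remaining candidates.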

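5. Compaction triggers

	Why: every MemStore flush produces a StoreFile, so over time the number of StoreFiles keeps growing and slows down reads; HBase therefore provides a mechanism for merging files
	1. Minor compaction:
		Trigger: the number of small files [smaller than 128 MB] reaches 3
		Process: only merges the contents of the files; expired or invalid (deleted) data is not purged
		Result: the small files are merged into a larger file
	2. Major compaction:
		Trigger: once every 7 days
		Process: expired or invalid (deleted) data is purged during the merge
		Result: all files of the store are merged into a single file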
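
The minor compaction rule and the effect of merging can be sketched as below. This is a simplified model and not HBase's file selection algorithm: `CompactionSketch` and its methods are hypothetical names, store files are represented as sorted maps from rowkey to value, and "newer file wins" stands in for version handling.

```java
import java.util.List;
import java.util.TreeMap;

class CompactionSketch {
    static final long SMALL_FILE_BYTES = 128L * 1024 * 1024; // "small file" limit from the text
    static final int MIN_FILES = 3;                           // trigger count from the text

    /** True when at least three store files are below the small-file limit. */
    static boolean shouldMinorCompact(List<Long> storeFileSizes) {
        long smallFiles = storeFileSizes.stream().filter(size -> size < SMALL_FILE_BYTES).count();
        return smallFiles >= MIN_FILES;
    }

    /** Merges files given oldest first; for a repeated rowkey the newer file's value wins. */
    static TreeMap<String, String> merge(List<TreeMap<String, String>> oldestToNewest) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : oldestToNewest) {
            merged.putAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(shouldMinorCompact(List.of(10 * mb, 20 * mb, 30 * mb)));

        TreeMap<String, String> older = new TreeMap<>();
        older.put("1001", "a");
        older.put("1002", "b");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("1002", "B");
        System.out.println(merge(List.of(older, newer)));
    }
}
```

After the merge a read has one sorted file to consult instead of several, which is the performance benefit the section describes.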

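6. Split triggers

	Why: as data keeps being written, a region grows larger and larger, and most subsequent requests may end up on that one region, which leaves the read and write load unbalanced
	Before version 0.94: a region is split when one of its stores reaches 10 GB
	Versions 0.94 to 2.0: the threshold depends on N, the number of regions of the same table on the current RegionServer
		N==0 || N>100 ? 10G : Min(10G, 2 * 128M * N^3)
	Version 2.0 and later: the threshold depends on N, the number of regions of the same table on the current RegionServer
		N==1 ? 2 * 128M : 10G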
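
The two formulas can be evaluated directly to see how quickly the threshold grows. The sketch below only restates the formulas from the text with the 128 MB flush size and the 10 GB maximum; `SplitThresholdSketch` and its method names are hypothetical.

```java
class SplitThresholdSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;
    static final long FLUSH_SIZE = 128 * MB;    // MemStore flush size used in the text
    static final long MAX_FILE_SIZE = 10 * GB;  // upper bound used in the text

    /** Versions 0.94 to 2.0: N==0 || N>100 ? 10G : Min(10G, 2 * 128M * N^3). */
    static long increasingToUpperBound(int regionCount) {
        if (regionCount == 0 || regionCount > 100) {
            return MAX_FILE_SIZE;
        }
        long cubed = (long) regionCount * regionCount * regionCount;
        return Math.min(MAX_FILE_SIZE, 2 * FLUSH_SIZE * cubed);
    }

    /** Version 2.0 and later: N==1 ? 2 * 128M : 10G. */
    static long stepping(int regionCount) {
        return regionCount == 1 ? 2 * FLUSH_SIZE : MAX_FILE_SIZE;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println("N=" + n
                    + " 0.94-2.0: " + increasingToUpperBound(n) / MB + " MB"
                    + ", 2.0+: " + stepping(n) / MB + " MB");
        }
    }
}
```

Under the older policy the threshold is 256 MB, 2048 MB and 6912 MB for N = 1, 2, 3 and is capped at 10240 MB from N = 4 onward; under the newer policy only the first region of a table splits early, at 256 MB.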
