Hadoop06: HBase Basics

阿宾571

Article link: https://blog.csdn.net/lb634774742/article/details/110309281

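HBase Basics

Region fundamentals:

A region holds a contiguous range of a table's rows. A large table is split into multiple regions, and the regions are assigned to different RegionServers, which manage them.
In this distributed database, a region involves:
- Store: one column family corresponds to one Store
- MemStore: the in-memory write buffer; it keeps cells sorted, so every HFile flushed from it is sorted
- WAL (write-ahead log): records write operations so they can be replayed after a crash
- StoreFile: flushing the MemStore to HDFS produces an HFile; a StoreFile is the abstraction over an HFile
- BlockCache: read cache that speeds up queries
A namespace contains:
- Table: ns:tb_name
- Column family: groups related columns. 1) keep the number of families small 2) keep names short
- Row key: 1) unique identifier of a row 2) acts as the index 3) rows are sorted on this single dimension 4) used by the Bloom filter
- Qualifier (column): sparse, a row stores only the qualifiers it actually has
- Value: raw bytes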
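
Since each region covers a contiguous row key range, finding the region for a row is a "greatest start key not larger than the row key" lookup. The sketch below is a toy model, not HBase code: the class name, the region labels, and the split keys `g` and `o` are all hypothetical, and plain Java string comparison stands in for HBase's byte-wise comparison (the two agree for ASCII keys).

```java
import java.util.TreeMap;

// Toy model (not the HBase API): maps a row key to the region whose
// key range contains it, for a table pre-split at "g" and "o".
class RegionLookupSketch {
    // start key -> region label; "" is the start key of the first region
    private static final TreeMap<String, String> REGIONS = new TreeMap<>();

    static {
        REGIONS.put("", "region-1 [ , g)");
        REGIONS.put("g", "region-2 [g, o)");
        REGIONS.put("o", "region-3 [o, )");
    }

    // A row belongs to the region with the greatest start key <= row key.
    static String locate(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        for (String rowKey : new String[] {"apple", "g", "kiwi", "uid001"}) {
            System.out.println(rowKey + " -> " + locate(rowKey));
        }
    }
}
```

A start key is inclusive and an end key is exclusive, which is why the row key `g` itself lands in the second region.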

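I. Installation and Startup

1. Installation

Note: the HBase version must be compatible with the JDK, Hadoop, and ZooKeeper versions; see the official site for the compatibility matrix.

Upload the archive and extract it
Set up time synchronization across the nodes

hbase-env.sh: change the following (remember to uncomment the lines)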

export JAVA_HOME=/usr/apps/jdk1.8.0_141/

export HBASE_MANAGES_ZK=false

hbase-site.xml: add the following:

<configuration>
<!-- Path where HBase stores its data on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://linux01:8020/hbase</value>
</property>
<!-- Run HBase in distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
<!-- ZooKeeper addresses; separate multiple entries with "," -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>linux01:2181,linux02:2181,linux03:2181</value>
</property>
</configuration>

regionservers: list the machines that run a RegionServer in the cluster
```
linux01
linux02
linux03
```
Distribute the installation to the other nodes

Startup

Start the master and the regionserver separately

bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver

Start everything with one command

bin/start-hbase.sh

2. Starting the software

Start the HBase service:
- start-hbase.sh
Start the shell client:
- hbase shell

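II. Shell Client

1. General commands

processlist: show the task list
status: show the current HBase cluster status
table_help: help for table-reference commands
version: version information
whoami: current user information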

hbase(main):005:0> whoami
root (auth:SIMPLE)
    groups: root
Took 0.0160 seconds

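2. Data definition language (DDL)

list: list all user tables
create: create a table
alter: modify the table or column family schema (add, change, or delete column families)
alter_async: like alter, but returns without waiting for all regions to receive the schema change
alter_status: show the progress of an alter, i.e. how many regions have received the schema change
describe/desc: show the table schema
disable: disable a table
disable_all: disable all tables matching a regex
is_disabled: check whether a table is disabled
drop: drop a table (it must be disabled first)
drop_all: drop all tables matching a regex
enable: enable a table
enable_all: enable all tables matching a regex
is_enabled: check whether a table is enabled
exists: check whether a table exists
get_table: get a table reference object
list_regions: list all regions of a table
locate_region: show the region that holds a given row of a table
show_filters: list the filters available in the system (used for conditional queries)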

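3. Namespaces

alter_namespace: modify a namespace
create_namespace: create a namespace
describe_namespace: show namespace information
drop_namespace: drop a namespace (all of its tables must be dropped first)
list_namespace: list all namespaces
list_namespace_tables: list all tables in a namespace

4. Data manipulation language (DML)

append: append data to a cell

count: count rows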

hbase(main):009:0> count 'tb_imp'
7 row(s)
Took 0.0847 seconds 
=> 7

hbase(main):010:0> count 'tb_imp' , INTERVAL => 2
Current count: 2, row: uid002
Current count: 4, row: uid004
Current count: 6, row: uid006     
7 row(s)
Took 0.0185 seconds            
=> 7

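delete: delete a cell
deleteall: delete a whole row
get: read a row or a cell
get_counter: read a counter value
get_splits: get the split points of a table's regions
incr: increment a counter
put: write a cell
scan: scan a table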

III. Java Client

1. Utility class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;

import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;

public class HbaseUtiles {



    /**
     * Print every cell of a Result as row-->family:qualifier--->value
     * @param next the Result to print
     */
    public static void showResult(Result next) {
        while (next.advance()) {
            Cell cell = next.current();
            byte[] row = CellUtil.cloneRow(cell);
            byte[] family = CellUtil.cloneFamily(cell);
            byte[] qualifier = CellUtil.cloneQualifier(cell);
            byte[] value = CellUtil.cloneValue(cell);
            System.out.println(new String(row)+"-->"+new String(family)+":"+new String(qualifier)+"--->"+new String(value));
        }
    }



    public static TableDescriptor getTableDescriptor(String tableName, String...columnFamilyName) {

        ArrayList<ColumnFamilyDescriptor> list = new ArrayList<>();
        // builder for the table descriptor
        TableDescriptorBuilder tableDescriptorBuilder = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName));

        for (String s : columnFamilyName) {
            // builder for the column family descriptor, created from the family name
            ColumnFamilyDescriptorBuilder columnFamilyDescriptorBuilder = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(s));
            // optional column family attributes
//          columnFamilyDescriptorBuilder.setTimeToLive(300);
//          columnFamilyDescriptorBuilder.setMaxVersions(3);
            // build the column family descriptor
            ColumnFamilyDescriptor cf = columnFamilyDescriptorBuilder.build();
            // collect it in the list
            list.add(cf);
        }

        // attach the column families to the table descriptor builder
        tableDescriptorBuilder.setColumnFamilies(list);
        // builder -> table descriptor
        return tableDescriptorBuilder.build();
    }


    public static Table getTable(Connection conn, String tableName) throws IOException {
        /**
         *  Get a Table object for the HBase table;
         *  reads and writes go through this object
         */
        Table table = conn.getTable(TableName.valueOf(tableName));
        return table;
    }


    public static Connection getHbaseConn() throws IOException {
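        /**
         * Create the HBase client connection (it connects through ZooKeeper)
         *
         * 1. create the HBase configuration object
         * 2. set parameters: the ZooKeeper quorum
         * 3. create the connection
         */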
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "linux01:2181,linux02:2181,linux03:2181");
        Connection conn = ConnectionFactory.createConnection(conf);
        return conn;
    }
}

2. Common methods

package cn.doit.ab;

import Utils.HbaseUtiles;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.*;

public class HbaseStudy {
    public static void main(String[] args) throws IOException {
        // get the connection
        Connection conn = HbaseUtiles.getHbaseConn();

        conn.close();
    }

    /**
     * Put.getFamilyCellMap:
     * returns, for one row, a map from each column family (key) to the list of cells in that family (value).
     * (Not sure what the instructor was demonstrating here.)
     */
    public static void buzhidaolaoshizaigansha() {
        // a Put holds a row key plus cells of (column family, qualifier, value)
        Put put = new Put("rk10001".getBytes());

        // the column families present in this row
        List<String>  cfs = new ArrayList<>() ;
        // cf1   Cell1
        // cf1   Cell2
        // cf1   Cell3
        // cf1   Cell4
        NavigableMap<byte[], List<Cell>> familyCellMap = put.getFamilyCellMap();
        Set<Map.Entry<byte[], List<Cell>>> entries = familyCellMap.entrySet();
        // iterate over every column family
        for (Map.Entry<byte[], List<Cell>> entry : entries) {
            byte[] key = entry.getKey();
            String cf = Bytes.toString(key);
            // collect all column family names in the list
            cfs.add(cf) ;
            // all cells of this column family
            List<Cell> cells = entry.getValue();
            for (Cell cell : cells) {
                byte[] bytes = CellUtil.cloneQualifier(cell);
                String q = new String(bytes);
                if (q.equals("name")) {}
                byte[] value = CellUtil.cloneValue(cell);
            }
            // read the qualifier of each cell
        }
    }



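    /**
     * Not sure what this one demonstrates: it prints the cells of a Put and then writes the Put
     * @param tb_imp the table to write to
     * @throws IOException
     */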
    public static void buzhidaozaigansha(Table tb_imp) throws IOException {
        Put put = new Put(Bytes.toBytes("uid002"));
        NavigableMap<byte[], List<Cell>> familyCellMap = put.getFamilyCellMap();
        Set<byte[]> bytes = familyCellMap.keySet();
        for (byte[] aByte : bytes) {
            List<Cell> cells = familyCellMap.get(aByte);
            System.out.println(new String(aByte));
            for (Cell cell : cells) {
                byte[] qualifier = CellUtil.cloneQualifier(cell);
                byte[] value = CellUtil.cloneValue(cell);
                System.out.println(new String(qualifier)+new String(value));
            }
        }
        tb_imp.put(put);
    }

  
  
    /**
     * Read one row and print it
     * @param tb_imp the table to read from
     * @throws IOException
     */
    public static void getOneRow(Table tb_imp) throws IOException {
        Get get = new Get(Bytes.toBytes("uid001"));

        Result result = tb_imp.get(get);
        HbaseUtiles.showResult(result);
    }

    public static void scanShuXing(Table tb_imp) throws IOException {
        ResultScanner scanner = tb_imp.getScanner(Bytes.toBytes("info"), Bytes.toBytes("name"));
        Iterator<Result> iterator = scanner.iterator();
        while (iterator.hasNext()) {
            Result next = iterator.next();
            HbaseUtiles.showResult(next);
        }
    }


    /**
     * Scan a table
     * @param tb_imp the table to scan
     * @throws IOException
     */
    public static void scanTable(Table tb_imp) throws IOException {
        // an empty Scan covers the whole table
        ResultScanner scanner = tb_imp.getScanner(new Scan());
        Iterator<Result> iterator = scanner.iterator();
        while (iterator.hasNext()) {
            Result next = iterator.next();
            HbaseUtiles.showResult(next);
        }
    }


    /**
     * Add several rows with put
     * @param conn the HBase connection
     * @throws IOException
     */
    public static void addSomeRows(Connection conn) throws IOException {
        // get the table
        Table deyunshe = HbaseUtiles.getTable(conn, "deyunshe");

        ArrayList<Put> puts = new ArrayList<>();

        // create a Put with its row key: row 1
        Put put = new Put(Bytes.toBytes("rk1001"));

        // add cells to the Put
        put.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("dougen"),Bytes.toBytes("孟鹤堂"));
        put.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("penggen"),Bytes.toBytes("周九良"));
        puts.add(put);

        // create a Put with its row key: row 2
        Put put2 = new Put(Bytes.toBytes("rk1002"));

        // add cells to the Put
        put2.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("dougen"),Bytes.toBytes("张鹤伦"));
        put2.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("penggen"),Bytes.toBytes("郎鹤焱"));
        puts.add(put2);

        // create a Put with its row key: row 3
        Put put3 = new Put(Bytes.toBytes("rk1003"));

        // add cells to the Put
        put3.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("dougen"),Bytes.toBytes("shaobing"));
        put3.addColumn(Bytes.toBytes("d1"),Bytes.toBytes("penggen"),Bytes.toBytes("caoheyang"));
        puts.add(put3);

        deyunshe.put(puts);

        // close the table
        deyunshe.close();
    }


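    /**
     * Get the Admin object and create a pre-split table.
     * DDL and tool-related operations live on the Admin object.
     * @param conn the HBase connection
     * @return the Admin object (the caller should close it)
     * @throws IOException
     */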
    public static Admin createRegionsTable(Connection conn) throws IOException {

        Admin admin = conn.getAdmin();

        TableDescriptor tableDescriptor = HbaseUtiles.getTableDescriptor("ttb", "lie1", "lie2", "lie3", "lie4");

        byte[][] bytes = new byte[][]{"d".getBytes(),"g".getBytes(),"j".getBytes()};

        admin.createTable(tableDescriptor,bytes);
        return admin;
    }
}

Namespace methods:

/**
 * FileName: NameSpaceDemo
 * Author:   多易教育-DOIT
 * Date:     2020/11/25 0025
 * Description:
 */
public class NameSpaceDemo {
    public static void main(String[] args) throws Exception {
        Admin admin = HbaseUtils.getAdmin();
        // create a namespace (comparable to a database)
        NamespaceDescriptor.Builder ns1 = NamespaceDescriptor.create("ns1");
        // set descriptive properties
        ns1.addConfiguration("doit19" ,"6666");
       // ns1.addConfiguration("好好学习" ,"天天向上");
        NamespaceDescriptor namespaceDescriptor = ns1.build();
       // admin.createNamespace(namespaceDescriptor);
        admin.modifyNamespace(namespaceDescriptor);
        admin.close();
    }
}

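3. Other methods

The code below is the instructor's. It relies on a separate HbaseUtils helper class (getAdmin, getTable, getHbaseConnection, showData) that is not shown in these notes: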

public class ClientDemo {
    public static void main(String[] args) throws Exception {
        createTableWithOneColumFamily() ;
    }

    private static void getlistRegions() throws IOException {
        Admin admin = HbaseUtils.getAdmin();
        // all region servers
     /*   Collection<ServerName> servers = admin.getRegionServers();
        for (ServerName server : servers) {
            List<RegionMetrics> regionMetrics = admin.getRegionMetrics(server);
            for (RegionMetrics regionMetric : regionMetrics) {
                byte[] regionName = regionMetric.getRegionName();
            }
        }
        */
        List<RegionInfo> tb_aa = admin.getRegions(TableName.valueOf("tb_a"));
        for (RegionInfo regionInfo : tb_aa) {
            String encodedName = regionInfo.getEncodedName();
            byte[] startKey = regionInfo.getStartKey();
            byte[] endKey = regionInfo.getEndKey() ;
            System.out.println(encodedName+"--"+new String(startKey)+"--"+new String(endKey));
        }
        admin.close();
    }

    private static void incr() throws Exception {
        Table tb_imp = HbaseUtils.getTable("tb_imp");
        Increment increment = new Increment("uid002".getBytes());
        // third argument: the amount to add
        increment.addColumn("info".getBytes(), "cnt".getBytes(), 1);
        tb_imp.increment(increment);
        // fourth argument: the amount; it is added to the current counter value and the new value is returned
        long l = tb_imp.incrementColumnValue("uid002".getBytes(), "info".getBytes(), "cnt".getBytes(), 1);
        System.out.println(l);
        tb_imp.close();
    }

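    /**
     * Delete variants:
     * several rows (List<Delete>)
     * one row
     * one column family of a row
     * one qualifier of a row
     *
     * @throws Exception
     */
    private static void testDelete() throws Exception {
        Table tb_imp = HbaseUtils.getTable("tb_imp");
        Delete delete = new Delete("uid002".getBytes());
        delete.addColumn("info".getBytes(), "gender".getBytes());
        // because of addColumn, this deletes only info:gender of row uid002 (its latest version), not the whole row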
        tb_imp.delete(delete);
        tb_imp.close();
    }

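    /**
     * Get variants:
     * one row
     * one column family of a row
     * one qualifier of a row
     *
     * @throws Exception
     */
    private static void getData() throws Exception {
        Table tb_user = HbaseUtils.getTable("tb_user");
        Get get = new Get("rk1001".getBytes());
        // with no further restriction, the whole row is returned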
        // get.addFamily("cf1".getBytes()) ;
        get.addColumn("cf1".getBytes(), "name".getBytes());
        Result result = tb_user.get(get);
        HbaseUtils.showData(result);
        tb_user.close();
    }

    /**
     * Scan the data of one column family
     *
     * @throws Exception
     */
    private static void scanDataByCf() throws Exception {
        // query data
        Table tb_user = HbaseUtils.getTable("tb_user");
        // Scan scan = new Scan();
        // scan all data of the column family
        ResultScanner resultScanner = tb_user.getScanner("cf1".getBytes());
        // tb_user.getScanner(a1 , a2): first argument is the column family, second the qualifier
        Iterator<Result> iterator = resultScanner.iterator();
        while (iterator.hasNext()) {
            // one Result is one row
            Result result = iterator.next();
            // iterate over its cells
            HbaseUtils.showData(result);
        }

        tb_user.close();
    }

    private static void scanWithRow(Table tb_user) throws IOException {
        // multiple rows, column families, and qualifiers
        Scan scan = new Scan();
        // limit the result to the first n rows
        //scan.setLimit(3) ;
        scan.withStartRow("uid003".getBytes());
        scan.withStopRow("uid006".getBytes());  // half-open range: [uid003, uid006)
        // scan only the rows in that range
        ResultScanner results = tb_user.getScanner(scan);
        // each Result is one row
        Iterator<Result> iterator = results.iterator();
        while (iterator.hasNext()) {
            // one row
            Result result = iterator.next();
            // iterate over its cells
            HbaseUtils.showData(result);
        }
    }

    private static void scanAllTable(Table tb_user) throws IOException {
        // multiple rows, column families, and qualifiers
        Scan scan = new Scan();
        // scan the whole table
        ResultScanner results = tb_user.getScanner(scan);
        // each Result is one row
        Iterator<Result> iterator = results.iterator();
        while (iterator.hasNext()) {
            // one row
            Result result = iterator.next();
            // iterate over its cells
            HbaseUtils.showData(result);
        }
    }

    /**
     * Batched insert
     *
     * @throws IOException
     */

    private static void mutationInsert() throws IOException {
        Connection conn = HbaseUtils.getHbaseConnection();
        BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("tb_user"));
        Put put = new Put(Bytes.toBytes("rk1005")); // row key
        put.addColumn(Bytes.toBytes("cf1"),// column family
                Bytes.toBytes("name"),// qualifier
                Bytes.toBytes("OOO") // value
        );
        // buffer the mutation on the client side
        mutator.mutate(put);
        // send the locally buffered mutations to HBase
        mutator.flush();
        // release the mutator
        mutator.close();
    }

    /**
     * 1 Plain Put is inefficient for bulk data: one insert and one RPC per row
     * ------------------>hbase
     * ------------------>hbase
     * ------------------>hbase
     * ------------------>hbase
     * 2 Batched insert
     * 3 Bulk load: inserting into HBase ultimately means storing data as HFiles at a specific HDFS location,
     * so a large static data set can be converted to HFiles and placed directly under the target HDFS path
     * plain data (large volume)  ---->   HFile-format data [MR]
     * [MR]: custom input, HFile output  --> target path  *****
     * 1)   write a Java program: the mapper reads and processes the data, the reducer writes it
     * 2)   use the shell tools  *** simple and convenient
     *
     * @throws Exception
     */
    private static void addSomeRows() throws Exception {
        // insert data
        Table tbUser = HbaseUtils.getTable("tb_user");
        // put 'tb_user' , 'rk' ,'cf:q' , 'v'
        Put put1 = new Put(Bytes.toBytes("rk1002")); // row key
        Put put2 = new Put(Bytes.toBytes("rk1003")); // row key
        Put put3 = new Put(Bytes.toBytes("rk1004")); // row key
        // add cells to each Put
        /**
         * first argument: column family
         * second argument: qualifier
         * third argument: value
         */
        // put1.addColumn(a1 , a2  , a3)

        List<Put> ls = new ArrayList<>();
        ls.add(put1);
        ls.add(put2);
        ls.add(put3);
        tbUser.put(ls);
        tbUser.close();
    }

    private static void putOneRowSomeCell() throws Exception {
        // insert data
        Table tbUser = HbaseUtils.getTable("tb_user");
        // put 'tb_user' , 'rk' ,'cf:q' , 'v'
        Put put = new Put(Bytes.toBytes("rk1001")); // row key

        put.addColumn(Bytes.toBytes("cf1"),// column family
                Bytes.toBytes("age"),// qualifier
                Bytes.toBytes(23) // value (stored as a 4-byte int)
        );
        put.addColumn(Bytes.toBytes("cf2"),// column family
                Bytes.toBytes("job"),// qualifier
                Bytes.toBytes("coder") // value
        );
        put.addColumn(Bytes.toBytes("cf3"),// column family
                Bytes.toBytes("hobby"),// qualifier
                Bytes.toBytes("M") // value
        );
        put.addColumn(Bytes.toBytes("cf3"),// column family
                Bytes.toBytes("friend"),// qualifier
                Bytes.toBytes("xiaokang") // value
        );
        tbUser.put(put);
        tbUser.close();
    }

    private static void putOneRowOneCell() throws Exception {
        // insert data
        Table tbUser = HbaseUtils.getTable("tb_user");
        // put 'tb_user' , 'rk' ,'cf:q' , 'v'
        Put put = new Put(Bytes.toBytes("rk1001")); // row key
        put.addColumn(Bytes.toBytes("cf1"),// column family
                Bytes.toBytes("name"),// qualifier
                Bytes.toBytes("XXX") // value
        );
        tbUser.put(put);
        tbUser.close();
    }

    /**
     * Create a pre-split table
     *
     * @throws IOException
     */
    private static void createPreRegionTable() throws IOException {
        Admin admin = HbaseUtils.getAdmin();
        // table descriptor builder
        TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(TableName.valueOf("tb_pre_region"));
        // column family descriptor builder
        ColumnFamilyDescriptorBuilder columnFamilyDescriptorBuilder =
                ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf1"));
        // build the column family descriptor
        ColumnFamilyDescriptor columnFamilyDescriptor = columnFamilyDescriptorBuilder.build();
        // attach the column family descriptor to the table
        builder.setColumnFamily(columnFamilyDescriptor);
        TableDescriptor descriptor = builder.build();
        /**
         * Pre-split table: plan the key distribution up front so that data lands in different regions and write hot spots are avoided
         * first argument: table descriptor
         * second argument: split keys of the pre-split regions
         */
        byte[][] keys = new byte[][]{"g".getBytes(), "o".getBytes()};
        admin.createTable(descriptor, keys);
        admin.close();
    }

    /**
     * Table with several column families
     *
     * @throws IOException
     */
    private static void createTableWithMoreCf() throws IOException {
        //  createTableWithOneColumFamily();
        Admin admin = HbaseUtils.getAdmin();
        TableDescriptorBuilder tableDescriptorBuilder = TableDescriptorBuilder.newBuilder(TableName.valueOf("tb_teacher2"));
        // column family descriptor builder
        ColumnFamilyDescriptorBuilder cf1Builder = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf1"));
        // set column family attributes (TTL in seconds, number of versions kept)
        cf1Builder.setTimeToLive(300);
        cf1Builder.setMaxVersions(3);
        ColumnFamilyDescriptor cf1 = cf1Builder.build();
        ColumnFamilyDescriptorBuilder cf2Builder = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf2"));
        ColumnFamilyDescriptor cf2 = cf2Builder.build();
        ColumnFamilyDescriptorBuilder cf3Builder = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf3"));
        ColumnFamilyDescriptor cf3 = cf3Builder.build();
        List<ColumnFamilyDescriptor> ls = new ArrayList<>();
        ls.add(cf1);
        ls.add(cf2);
        ls.add(cf3);
        // add the column families
        tableDescriptorBuilder.setColumnFamilies(ls);
        TableDescriptor descriptor = tableDescriptorBuilder.build();
        admin.createTable(descriptor);
        admin.close();
    }

    /**
     * Table with one column family
     *
     * @throws IOException
     */
    public static void createTableWithOneColumFamily() throws IOException {
        Admin admin = HbaseUtils.getAdmin();
        // create a table: table name (with namespace) and column family
        TableName tb_stu1 = TableName.valueOf("ns1:tb_stu");
        // table descriptor builder
        TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(tb_stu1);
        // column family descriptor builder
        ColumnFamilyDescriptorBuilder columnFamilyDescriptorBuilder =
                ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf1"));
        // build the column family descriptor
        ColumnFamilyDescriptor columnFamilyDescriptor = columnFamilyDescriptorBuilder.build();
        // attach the column family descriptor to the table
        builder.setColumnFamily(columnFamilyDescriptor);
        // table descriptor
        TableDescriptor descriptor = builder.build();
        admin.createTable(descriptor);
        admin.close();
    }
}

IV. How HBase Works

1. HBase data layout

HBase stores its data in HDFS, under the /hbase/data directory

[Figure: HBase directory layout on HDFS (image not available)]

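Data of a single row does not necessarily live in one directory: on disk, HBase uses the column family as the smallest storage unit, so each column family of a region has its own directory
Before HBase flushes automatically, every manual flush produces a new HFile

2. Writing data

A write carries a key made up of row key, column family, and qualifier, plus the value.
The client asks ZooKeeper where the metadata table (hbase:meta) is, and ZooKeeper returns the RegionServer that hosts it

Note: the metadata table is itself stored on HDFS; it is served by one RegionServer, and that server's location is recorded in ZooKeeper
The client reads the metadata from that RegionServer, caches it locally (the META cache), and parses it
Using the table name and row key of the request, it determines the target region and the HRegionServer that hosts it
It sends the write request to that HRegionServer. The operation is first recorded in the write-ahead log (WAL), so the data can be recovered if the HRegionServer crashes before it reaches disk.
On that HRegionServer the write is routed to the table's region and then, by column family, to the matching Store
The data is first held in the Store's MemStore
A flush happens when any of the following occurs:
- a manual flush
- the MemStore reaches 128 MB (automatic flush)
- the total MemStore memory of the HRegionServer reaches its configured threshold; MemStores are then flushed and writes are blocked in the meantime
- the number of operations reaches a certain count (automatic flush)
After a flush, the MemStore contents are serialized to HDFS as an HFile, and the region gets a corresponding StoreFile object that represents it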
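
The buffering and flushing behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not HBase's implementation: the class `MemStoreSketch` is hypothetical, the flush threshold counts cells instead of bytes (standing in for the 128 MB limit), and an "HFile" is just an immutable sorted list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model (not the HBase API): a sorted in-memory buffer that is
// written out as an immutable sorted "file" once it is full.
class MemStoreSketch {
    private final int flushThreshold;                 // cells, stands in for 128 MB
    private TreeMap<String, String> memStore = new TreeMap<>();
    private final List<List<String>> hfiles = new ArrayList<>();

    MemStoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Writes go to the sorted buffer first; a full buffer triggers a flush.
    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Every flush of a non-empty buffer creates one new sorted file.
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        List<String> file = new ArrayList<>();
        memStore.forEach((k, v) -> file.add(k + "=" + v));
        hfiles.add(file);
        memStore = new TreeMap<>();
    }

    int hfileCount() {
        return hfiles.size();
    }

    List<String> hfile(int index) {
        return hfiles.get(index);
    }
}
```

Because the buffer is a sorted map, cells arrive in the file ordered by key regardless of the order in which they were written, which is the property the real MemStore provides for HFiles.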

[Figure: HBase write path (image not available)]

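3. Reading data

The client asks ZooKeeper where the metadata table is (a lookup, not a write request), and ZooKeeper returns its location
The client reads the metadata from that RegionServer, caches it locally (the META cache), and parses it
Using the table name and row key of the request, it determines the target region and the HRegionServer that hosts it
It sends the read request to that HRegionServer
BlockCache: the block cache of the RegionServer, which holds recently read HFile data
A request that reaches the RegionServer is looked up in BlockCache first

--> If nothing is found, the MemStore of the matching Store (column family) in the region is checked

--> If nothing is found, the column family's directory in HDFS is searched
Strictly speaking, the newest version of a cell may sit in the MemStore or in any HFile, so HBase merges what it finds in the MemStore with the cells from the HFiles, serving HFile blocks from BlockCache when they are cached
A column family directory can contain many HFiles, and scanning all of them is slow, so Bloom filters are used to skip files
The HFiles that may contain the requested row are read, the result is returned, and the blocks read are cached in BlockCache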

[Figure: HBase read path (image not available)]

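4. Bloom filter

How HBase stores files:
- Within a region, files are stored per column family rather than per row;
- as a result, one HFile holds many rows of a single column family.
How HBase reads data:
- A read usually fetches one row or one cell;
- once a region holds a lot of data, the column family directory contains many HFiles;
- whether reading a row or a single cell, HBase first has to find, by row key, the HFiles in the region's directory that contain that row key.
Requirements:
- The storage and read characteristics above share one key piece of data: the row key;
- if we can quickly tell whether an HFile contains the wanted row key, lookups become much faster;
- one option is to keep a set in each HFile holding every row key written to it, and to search that set on reads to learn whether the HFile contains the wanted data;
- but that is still not efficient enough and uses too much memory, hence the Bloom filter.
How a Bloom filter works (simplified):
- Reserve a contiguous 1 MB area in the HFile and treat it as an array of bits
- All bits start at 0, giving a marker array (ad hoc name) of 8,388,608 bits
- On write, hash the row key, map the hash to a position in the array (hash modulo array length), and set that bit to 1
- On read, hash the wanted row key with the same function and check that position in each HFile's marker array
  - If the bit is 0, the HFile definitely does not contain the row key
  - If the bit is 1, the HFile very probably contains the row key (not certainly, because of hash collisions)
Limitations of a Bloom filter:
- Hash collisions:
  - Cause: two different row keys can map to the same position, so a lookup may report a false positive
  - Analysis: this needs no special handling, because the subsequent read still searches the file by row key and the result is unaffected; to lower the false-positive rate, set several positions per key with different hash functions, since all of them colliding at once is far less likely. Real Bloom filters do exactly this.
- Hash value out of range:
  - Cause: the hash of a row key can exceed the array length of 8,388,608
  - Analysis: taking the hash modulo the array length keeps every position in range, so neither a limit on row key length nor a separate overflow set is needed
Core idea of a Bloom filter:
- It exploits cheap bit operations: a position is located with a shift and a mask (for example 1 << (position % 8) within a byte), which is fast
- It uses a single bit per marker, 0 for absent and 1 for possibly present, so the presence information of the key data is stored in very little space.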
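
The idea above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not HBase's Bloom filter: the class name, the two hash functions (`Arrays.hashCode` and a small FNV-1a), and the double-hashing scheme for deriving several positions are choices made for the example, and the 8,388,608-bit size simply mirrors the 1 MB figure used in these notes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Illustrative Bloom filter (not HBase's implementation).
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used here as the second hash function.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    // i-th position via double hashing; floorMod keeps it inside [0, numBits).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false: definitely absent; true: probably present (false positives possible).
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(8_388_608, 3);
        filter.add("uid001");
        System.out.println(filter.mightContain("uid001")); // true: added keys are never missed
        System.out.println(filter.mightContain("uid999")); // almost certainly false
    }
}
```

The one guarantee a Bloom filter gives is the absence of false negatives: a key that was added always tests positive, which is what allows a reader to skip a file safely when the answer is negative.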

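5. Merging and compaction

Merging data consumes a lot of system resources and should not be run during peak traffic

5.1 Merging regions

After a large amount of data is deleted from a table, the row count drops sharply while the number of regions stays the same, so each region manages fewer rows. In this situation HBase can merge regions automatically.

Regions can also be merged manually, though this is not recommended.

5.2 Deleting data

First of all, HDFS does not support random writes or in-place modification
An HBase delete therefore does not remove the data directly; it adds another kind of record

Tombstone marker:

Normally stored data is marked as Put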

K: uid001/info:age/1606274379904/Put/vlen=2/seqid=4 V: 23
K: uid001/info:gender/1606274379904/Put/vlen=1/seqid=4 V: F
K: uid001/info:name/1606274379904/Put/vlen=3/seqid=4 V: zss
K: uid002/info:age/1606274379904/Put/vlen=2/seqid=4 V: 13
K: uid002/info:gender/1606274379904/Put/vlen=1/seqid=4 V: M
K: uid002/info:name/1606274379904/Put/vlen=3/seqid=4 V: lss
K: uid003/info:age/1606274379904/Put/vlen=2/seqid=5 V: 22
K: uid003/info:gender/1606274379904/Put/vlen=1/seqid=5 V: M

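When data is deleted, a cell with the same coordinates but marked Delete is written; once flushed, it ends up in a new HFile in the same directory

Within some period of time, HBase automatically compacts the original files together with the files holding tombstone markers, and only then is the data physically removed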
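
A toy model may help to see why a delete is an append and why compaction is what actually frees space. Everything here is hypothetical (class, method names, a single list standing in for all HFiles of a Store); it only assumes the rule that a delete marker hides puts whose timestamp is not newer than the marker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the HBase API) of tombstones and major compaction.
class TombstoneSketch {
    enum Type { PUT, DELETE }

    static final class Cell {
        final String key;
        final long timestamp;
        final Type type;
        final String value;

        Cell(String key, long timestamp, Type type, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    // Append-only storage, standing in for the HFiles of one Store.
    private List<Cell> cells = new ArrayList<>();

    void put(String key, long timestamp, String value) {
        cells.add(new Cell(key, timestamp, Type.PUT, value));
    }

    // A delete does not remove anything; it appends a tombstone.
    void delete(String key, long timestamp) {
        cells.add(new Cell(key, timestamp, Type.DELETE, null));
    }

    private long newestTombstone(String key) {
        long newest = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.type == Type.DELETE && c.key.equals(key)) {
                newest = Math.max(newest, c.timestamp);
            }
        }
        return newest;
    }

    // Reads return the newest put that is not hidden by a tombstone.
    String get(String key) {
        long tombstone = newestTombstone(key);
        Cell best = null;
        for (Cell c : cells) {
            if (c.type == Type.PUT && c.key.equals(key) && c.timestamp > tombstone
                    && (best == null || c.timestamp > best.timestamp)) {
                best = c;
            }
        }
        return best == null ? null : best.value;
    }

    // Major compaction rewrites the data without tombstones and hidden puts.
    void majorCompact() {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type == Type.PUT && c.timestamp > newestTombstone(c.key)) {
                kept.add(c);
            }
        }
        cells = kept;
    }

    int storedCells() {
        return cells.size();
    }
}
```

After a delete the stored cell count goes up, not down; only `majorCompact()` shrinks it.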

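5.3 Compacting HFiles

HBase compacts HFiles automatically in the following situations:

a large number of updates or deletes
many small files (many column families or little memory lead to frequent automatic flushes)
data whose TTL has expired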

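6. Splitting

Split strategies
- Pre-split regions
- Automatic split by default size thresholds
  - one region: 256 MB
  - two regions: 2 GB
  - three regions: 6.75 GB
  - more than three: 10 GB
- KeyPrefixRegionSplitPolicy: custom split policy based on a key prefix
  - Builds on the default splitting and adds a definition of the split point (the row key at which a region is split)
  - Guarantees that row keys with the same prefix are not split into different regions
  - The parameter is KeyPrefixRegionSplitPolicy.prefix_length, the prefix length of the row key
- DelimitedKeyPrefixRegionSplitPolicy: split by delimiter
  - Same idea as the previous policy; that one uses a fixed-length key prefix, this one uses a delimiter
  - The parameter is DelimitedKeyPrefixRegionSplitPolicy.delimiter, the delimiter
- Manual split, not recommended
Forced manual split
- Needed when unanticipated hot spots appear in queries
- Syntax: split 'tableName', 'splitKey'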
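
The four default thresholds listed above follow one formula: the smaller of (number of regions)^3 x 2 x flush size and the maximum file size. The sketch below reproduces the numbers with a 128 MB flush size and a 10 GB maximum. That this matches HBase's size-based split policy is my understanding rather than something stated in these notes, so treat the formula's attribution as an assumption; the class and constant names are made up.

```java
// Reproduces the split thresholds from the list above:
// min(regions^3 * 2 * flushSize, maxFileSize).
class SplitThresholdSketch {
    static final long MB = 1024L * 1024L;
    static final long FLUSH_SIZE = 128 * MB;          // MemStore flush size
    static final long MAX_FILE_SIZE = 10 * 1024 * MB; // 10 GB upper bound

    // regionCount: number of regions of the table on one RegionServer
    static long splitThreshold(int regionCount) {
        long r = regionCount;
        long grown = r * r * r * 2 * FLUSH_SIZE;
        return Math.min(grown, MAX_FILE_SIZE);
    }

    public static void main(String[] args) {
        for (int regions = 1; regions <= 4; regions++) {
            System.out.println(regions + " region(s): "
                    + splitThreshold(regions) / MB + " MB");
        }
    }
}
```

With three regions the formula gives 27 x 256 MB = 6912 MB, which is the 6.75 GB in the list; from four regions on, the cubic term exceeds 10 GB and the cap applies.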

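7. Secondary index

HBase queries by row key are efficient, but queries on any other qualifier are much slower; in practice there are usually two or three fixed dimensions that need to be queried
A secondary index is another HBase table whose row key is the qualifier value to query by and whose value is the original row key; looking up this table first and then the original table gives fast queries
Implementing a secondary index:
- Write a coprocessor (an interceptor) by extending or implementing the coprocessor interfaces
- Override the prePut method
  - Iterate over all cells in the Put
  - Read each cell's qualifier and pick out the qualifier a that needs to be queryable
  - Use the value of a as the row key of the index table
  - Take the row key of the original table as the value
  - Create a second Put <a, row key> and write it to the secondary index table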
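
The steps above amount to maintaining a second lookup structure on every write. The following in-memory sketch shows just that data flow; it is not coprocessor code, and the class, the indexed qualifier `name`, and the two maps standing in for the main and index tables are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of a secondary index (not the HBase API).
class SecondaryIndexSketch {
    static final String INDEXED_QUALIFIER = "name";

    // main table: row key -> (qualifier -> value)
    private final Map<String, Map<String, String>> mainTable = new TreeMap<>();
    // index table: indexed value -> row key of the main table
    private final Map<String, String> indexTable = new TreeMap<>();

    // Plays the role of prePut: the index row is written before the data row.
    void put(String rowKey, String qualifier, String value) {
        if (INDEXED_QUALIFIER.equals(qualifier)) {
            indexTable.put(value, rowKey);
        }
        mainTable.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // Two lookups by key instead of a full scan of the main table.
    Map<String, String> findByIndexedValue(String value) {
        String rowKey = indexTable.get(value);
        return rowKey == null ? null : mainTable.get(rowKey);
    }
}
```

This sketch keeps a single row key per indexed value, so a later row with the same value overwrites the earlier entry; an index over non-unique values would need a composite key such as value plus original row key.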

V. Common Operations

1. Importing data

1.1 Shell approach

(Importing data into HBase with bulk load):

Create the table
create 'tb_imp' ,'info'
Generate HBase files (HFiles) from the data

hbase  org.apache.hadoop.hbase.mapreduce.ImportTsv  -Dimporttsv.separator=, -Dimporttsv.columns='HBASE_ROW_KEY,info:name,info:age,info:gender' -Dimporttsv.bulk.output=/tsv/output tb_imp  /tsv/input

Load the generated HFiles into the HBase table

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tsv/output tb_imp

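1.2 put and mutation

Both the shell and the Java client can add data directly with put
However, each put is a separate round trip to HBase and uses more resources, which is why buffered mutation (BufferedMutator) is introduced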

    /**
     * Batched insert
     *
     * @throws IOException
     */

    private static void mutationInsert() throws IOException {
        Connection conn = HbaseUtils.getHbaseConnection();
        BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("tb_user"));
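        Put put = new Put(Bytes.toBytes("rk1005")); // row key
        put.addColumn(Bytes.toBytes("cf1"),// column family
                Bytes.toBytes("name"),// qualifier
                Bytes.toBytes("OOO") // value
        );
        // buffer the mutation on the client side
        mutator.mutate(put);
        // send the locally buffered mutations to HBase
        mutator.flush();
        // release the mutator
        mutator.close();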
    }
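
To make the difference between single puts and buffered mutation concrete, here is a small model that only counts round trips. It is a sketch, not HBase code: the class is hypothetical, and the buffer is sized in rows for simplicity, whereas a real client-side write buffer would more plausibly be sized in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the HBase API): counts round trips for buffered writes.
class BufferedWriterSketch {
    private final int bufferSize;                 // rows per batch (assumption)
    private final List<String> buffer = new ArrayList<>();
    private int roundTrips = 0;
    private int rowsSent = 0;

    BufferedWriterSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Buffer locally; only a full buffer causes a round trip.
    void mutate(String row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    // Send whatever is buffered in a single round trip.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsSent += buffer.size();
        buffer.clear();
    }

    int roundTrips() {
        return roundTrips;
    }

    int rowsSent() {
        return rowsSent;
    }
}
```

With a buffer of 100 rows, writing 250 rows costs 3 round trips instead of 250; a buffer of 1 degenerates to one round trip per row, which is the plain put case.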

1.3 Processing data with MapReduce

import Utils.Movie;
import com.google.gson.Gson;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import java.io.IOException;

public class MR_HbaseDemo {

    static class MR_HbaseDemoMapper extends Mapper<LongWritable, Text,Text, Movie>{
        Gson gs = new Gson();
        Text k = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            try {
                String s = value.toString();
                // parse the JSON line with Gson
                Movie movie = gs.fromJson(s, Movie.class);

                // use commons-lang3 to left-pad the string with 0 to a length of 5
                String movieName = StringUtils.leftPad(movie.getMovie(),5,"0");

                // row key design: movie name + timestamp
                String rowKey = movieName +"_"+ movie.getTimeStamp();

                k.set(rowKey);
                context.write(k,movie);

            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }


    /**
     * Unlike a regular reducer, this one extends TableReducer.
     * The output key type is ImmutableBytesWritable, and the value written is a Put.
     */
    static class MR_HbaseDemoReducer extends TableReducer<Text,Movie, ImmutableBytesWritable>{

        @Override
        protected void reduce(Text key, Iterable<Movie> values, Context context) throws IOException, InterruptedException {

            String rowKey = key.toString();
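            // create the Put with the row key
            Put put = new Put(rowKey.getBytes());
            // take the Movie passed from the mapper (only the first value per row key is used)
            Movie movie = values.iterator().next();
            // add cells to the Put: column family, qualifier, value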
            put.addColumn("cf".getBytes() , "movie".getBytes() , movie.getMovie().getBytes());
            put.addColumn("cf".getBytes() , "rate".getBytes()  , movie.getRate().getBytes());
            put.addColumn("cf".getBytes() , "timeStamp".getBytes() , movie.getTimeStamp().getBytes());
            put.addColumn("cf".getBytes() , "uid".getBytes()   , movie.getUid().getBytes());

            context.write(null,put);
        }
    }


    public static void main(String[] args) throws Exception {
        // create the configuration through HBaseConfiguration
        Configuration conf = HBaseConfiguration.create();
        // set the connection parameters
        conf.set("hbase.zookeeper.quorum","linux01:2181,linux02:2181,linux03:2181");

        // get the Job and configure it
        Job job = Job.getInstance(conf);
        // the map side is set up as usual; the reduce side differs
        job.setMapperClass(MR_HbaseDemoMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Movie.class);

        // set the input path
        FileInputFormat.setInputPaths(job,new Path("D:\\duoyi\\08_Hadoop\\MR案例\\movie\\input"));

        // set the target HBase table (it must already exist) and register the reducer class
        TableMapReduceUtil.initTableReducerJob("movie",MR_HbaseDemoReducer.class,job);

        job.waitForCompletion(true);
    }
}
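
The mapper above pads the movie id before building the row key. The reason is that HBase orders row keys lexicographically, so unpadded numbers sort in the wrong order. The sketch below shows the effect without any dependency; the class and the hand-written `leftPad` (a stand-in for the commons-lang3 call) are illustrative, and the timestamp value is made up.

```java
// Shows why numeric ids are left-padded before they go into a row key.
class RowKeySketch {
    // Minimal stand-in for StringUtils.leftPad(s, width, pad)
    static String leftPad(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    // Row key layout used in the job above: padded movie id + "_" + timestamp
    static String rowKey(String movieId, String timestamp) {
        return leftPad(movieId, 5, '0') + "_" + timestamp;
    }

    public static void main(String[] args) {
        // Unpadded: "10" sorts before "9" because '1' < '9'
        System.out.println("10".compareTo("9") < 0);
        // Padded: 00009_... sorts before 00010_..., matching numeric order
        System.out.println(rowKey("9", "978300760").compareTo(rowKey("10", "978300760")) < 0);
    }
}
```

Both lines print `true`: the first shows the problem, the second shows that padding restores numeric order among row keys.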

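2. Filters

Common comparison operators (CompareOperator): LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP

Common comparator classes: BinaryComparator, BinaryPrefixComparator, RegexStringComparator, SubstringComparator, LongComparator, NullComparator

Code: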

import Utils.HbaseUtiles;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;

import java.util.Iterator;

public class demo01 {
    public static void main(String[] args) throws Exception {

        // get a Connection through the self-written HbaseUtiles helper
        Connection conn = HbaseUtiles.getHbaseConn();
        // get the table
        Table tb_imp = HbaseUtiles.getTable(conn, "tb_imp");
        // create the Scan
        Scan scan = new Scan();

        // create the filters
        QualifierFilter filter = new QualifierFilter(CompareOperator.EQUAL, new BinaryComparator("name".getBytes()));
        // first argument: comparison operator (equal / greater / greater or equal, ...)
        // second argument: comparator (binary / regex / substring / long, ...)
        ValueFilter valueFilter = new ValueFilter(CompareOperator.EQUAL, new RegexStringComparator("[z.]"));

        // create a FilterList and add the filters to it
        FilterList filterList = new FilterList();
        filterList.addFilter(filter);
        filterList.addFilter(valueFilter);

        // set the filter list on the scan to combine the conditions
        scan.setFilter(filterList);

        // set a single filter on the scan for one condition
//        scan.setFilter(filter);
        // get the scanner from the scan
        ResultScanner results = tb_imp.getScanner(scan);
        Iterator<Result> iterator = results.iterator();
        // iterate over the results
        while (iterator.hasNext()) {
            Result next = iterator.next();
            HbaseUtiles.showResult(next);
        }
    }
}
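
The pattern `[z.]` used in the ValueFilter above is a character class: it matches the letter z or a literal dot, not "z followed by anything". The plain `java.util.regex` sketch below shows what such a pattern accepts. It assumes the comparator treats a value as matching when the pattern is found anywhere in it, which is my reading of its behaviour rather than something these notes state; the helper class is hypothetical.

```java
import java.util.regex.Pattern;

// Checks which values a regex would accept under "found anywhere" semantics.
class RegexValueSketch {
    static boolean matches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("[z.]", "zss"));  // contains 'z'
        System.out.println(matches("[z.]", "lss"));  // neither 'z' nor '.'
        System.out.println(matches("[z.]", "a.b"));  // contains a literal '.'
        System.out.println(matches("^z.*", "liz"));  // anchored: must start with 'z'
    }
}
```

If the intent is "values starting with z", an anchored pattern such as `^z.*` expresses that more precisely than the character class.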

3. Writing a coprocessor

public class SecondrayIndex implements RegionCoprocessor, RegionObserver {
	/**
	 * Tells HBase which observer type this coprocessor provides
	 */
	public Optional<RegionObserver> getRegionObserver() {
		return Optional.of(this);
	}

	/**
	 * Runs before each put
	 */
	  @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, Durability durability) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum","linux01:2181,linux02:2181,linux03:2181");
        Connection conn = ConnectionFactory.createConnection(conf);
        Table table = conn.getTable(TableName.valueOf("ids_index"));
        List<Cell> cells = put.get("f".getBytes(), "ids".getBytes());
        for (Cell cell : cells) {
            byte[] bytes = CellUtil.cloneValue(cell);
            byte[] rowBytes = CellUtil.cloneRow(cell);
            String ids = new String(bytes);
            String[] split = ids.split(":");
            ArrayList<Put> puts = new ArrayList<>();
            for (String id : split) {
                Put p = new Put(id.getBytes());
                p.addColumn("f".getBytes(),"gid".getBytes(),rowBytes);
                puts.add(p);
            }
            table.put(puts);
        }
        table.close();
        conn.close();
    }
	/**
	 * Called when the coprocessor starts
	 */
	@Override
	public void start(CoprocessorEnvironment env) throws IOException {
	}
	/**
	 * Called when the coprocessor stops
	 */
	@Override
	public void stop(CoprocessorEnvironment env) throws IOException {
	}
}
