HBase Key Points Summary

「miraitowa」

Article link: https://blog.csdn.net/weixin_45557389/article/details/107695496

1. HBase Overview

1.1 Introduction to HBase (Hadoop Database)

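HBase is a distributed, scalable NoSQL database built to store massive amounts of data. It is highly reliable, high-performance, column-oriented, and supports real-time reads and writes.

It uses Hadoop HDFS as its file storage system;

It can use Hadoop MapReduce to process the massive data held in HBase;

It uses ZooKeeper as its distributed coordination service;

It is mainly used to store loosely structured data, that is, unstructured and semi-structured data.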

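Structured: the columns are fixed and values must conform to the schema;

Unstructured: the columns are not fixed and the content can be anything;

Semi-structured: there are no hard rules; the data follows a format by convention, but a malformed entry does not raise an error;

ETL (extract, transform, load): includes data cleansing, i.e. turning unstructured data into structured data.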

The data model has three core parts

rowkey: the key

column family: the columns and their values

timestamp: the version
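
One way to picture these three parts is as nested sorted maps: rowkey, then column, then timestamp, then value. The sketch below is a toy model in plain Java, not HBase code; the class name `CellModelSketch` and its methods are made up for illustration.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of a table: rowkey -> "family:qualifier" -> timestamp -> value.
final class CellModelSketch {
    // Rows are kept sorted by rowkey.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Store one version of a cell under an explicit timestamp.
    void put(String rowkey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    String getLatest(String rowkey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowkey);
        if (row == null) {
            return null;
        }
        TreeMap<Long, String> versions = row.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue(); // newest timestamp sorts first
    }

    public static void main(String[] args) {
        CellModelSketch table = new CellModelSketch();
        table.put("1001", "cf:name", 1L, "Nick");
        table.put("1001", "cf:name", 2L, "Nicholas");
        System.out.println(table.getLatest("1001", "cf:name")); // prints "Nicholas"
    }
}
```
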
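- Rowkey design principles:
  
  Length: the limit is commonly cited as 64 KB (a Chinese character takes 3 bytes in UTF-8), but shorter is better, ideally within 16 bytes. The rowkey is stored with every cell, so on large data sets an overly long rowkey consumes more memory and slows down lookups;
  
  Uniqueness: every rowkey must be unique.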
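- Hotspotting: ways to spread rowkeys
  
  Hash: distributes data evenly and is reproducible (the rowkey can be recomputed from the original key), so a row can still be fetched quickly with get;
  
  Random prefix: distributes data evenly but is not reproducible, so you lose the ability to locate a row directly with get;
  
  Reversal: a commonly used scheme, especially for time-series keys.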
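
The three schemes can be sketched in plain Java as follows. This is an illustration only: the bucket count `BUCKETS`, the `NN_` prefix format, and the use of `String.hashCode` as the hash are assumptions, not something the notes prescribe.

```java
import java.util.concurrent.ThreadLocalRandom;

// Three ways to spread sequential rowkeys across regions.
final class RowKeySketch {
    static final int BUCKETS = 16; // hypothetical number of prefixes

    // Hash prefix: deterministic, so the same key always yields the same rowkey.
    static String hashed(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);
        return String.format("%02d_%s", bucket, key);
    }

    // Random prefix: spreads writes, but the rowkey cannot be recomputed from the key.
    static String randomized(String key) {
        int bucket = ThreadLocalRandom.current().nextInt(BUCKETS);
        return String.format("%02d_%s", bucket, key);
    }

    // Reversal: the fast-changing tail of the key becomes its leading part.
    static String reversed(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    public static void main(String[] args) {
        String key = "20200730153000";
        System.out.println(hashed(key).equals(hashed(key))); // prints "true"
        System.out.println(randomized(key));                  // prefix varies per call
        System.out.println(reversed("1234"));                 // prints "4321"
    }
}
```

With the random prefix, reading a single key back means trying every one of the `BUCKETS` prefixes, which is why the notes say it gives up fast lookups with get.
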
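Secondary indexes

A secondary index serves access patterns that do not go through the rowkey. HBase itself only indexes by rowkey; secondary indexes were added later by layers built on top of it (Apache Phoenix or coprocessor-based solutions, for example). Depending on the implementation, index types such as B-tree and bitmap may be offered, but they work differently from the ordinary indexes of databases such as MySQL.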

1.2 Architecture

StoreFile is the name from HBase's point of view and HFile is the name from the Hadoop point of view; the two refer to the same thing.

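Client

Provides the interfaces for accessing HBase and maintains a cache to speed up access.
ZooKeeper

Guarantees that only one Master is active in the cluster at any time;

Stores the addressing entry point for all regions;

Monitors region servers coming online and going offline in real time and notifies the Master;

Stores the HBase schema and table metadata.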
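Master (plays a role similar to the HDFS NameNode)

Assigns regions to region servers;

Handles load balancing across region servers;

Detects failed region servers and reassigns their regions;

Handles users' create, delete, and alter operations on tables.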
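RegionServer (plays a role similar to the HDFS DataNode)

A region server maintains its regions and handles the I/O requests for them;

A region server splits regions that have grown too large while running.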
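Region

HBase automatically partitions a table horizontally into regions, and each region holds one contiguous range of the table's rows. Every table starts with a single region; as data is inserted the region grows, and once it reaches a threshold it is split into two new regions of roughly equal size.

As the number of rows in a table keeps growing, so does the number of regions, and one complete table ends up stored across multiple region servers.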
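
The split behaviour can be simulated in plain Java. This is a simplified sketch, not HBase's actual split policy: it assumes a threshold counted in rows (`MAX_ROWS_PER_REGION`, a made-up constant) rather than in bytes, and models each region as a sorted set of rowkeys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Each region is a sorted set of rowkeys; the list is ordered by key range.
final class RegionSplitSketch {
    static final int MAX_ROWS_PER_REGION = 4; // hypothetical threshold

    static void put(List<TreeSet<String>> regions, String rowkey) {
        if (regions.isEmpty()) {
            regions.add(new TreeSet<>()); // a table starts with a single region
        }
        // Pick the last region whose first rowkey is <= the new rowkey.
        int target = 0;
        for (int i = 0; i < regions.size(); i++) {
            TreeSet<String> r = regions.get(i);
            if (!r.isEmpty() && r.first().compareTo(rowkey) <= 0) {
                target = i;
            }
        }
        TreeSet<String> region = regions.get(target);
        region.add(rowkey);
        // Split at the middle key once the region is over the threshold.
        if (region.size() > MAX_ROWS_PER_REGION) {
            String mid = new ArrayList<>(region).get(region.size() / 2);
            TreeSet<String> left = new TreeSet<>(region.headSet(mid));  // keys < mid
            TreeSet<String> right = new TreeSet<>(region.tailSet(mid)); // keys >= mid
            regions.set(target, left);
            regions.add(target + 1, right);
        }
    }

    public static void main(String[] args) {
        List<TreeSet<String>> regions = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            put(regions, String.format("r%02d", i));
        }
        System.out.println(regions.size() + " regions: " + regions);
    }
}
```

Because the keys here arrive in increasing order, every insert lands in the last region, which is the hotspot pattern that the rowkey schemes in section 1.1 are meant to avoid.
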
Memstore and StoreFile

A region consists of multiple stores, and each store corresponds to one column family (CF);

A store consists of a memstore held in memory and storefiles held on disk.
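
As a rough picture of how the two parts of a store cooperate, the sketch below buffers writes in a sorted in-memory map and freezes it into an immutable "file" once it fills up. It is plain Java under stated assumptions: the flush threshold is counted in entries (`FLUSH_THRESHOLD` is made up; HBase flushes by memory size), and the "files" are in-memory maps standing in for on-disk storefiles.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.NavigableMap;
import java.util.TreeMap;

final class StoreSketch {
    static final int FLUSH_THRESHOLD = 3; // hypothetical, counted in entries

    private TreeMap<String, String> memstore = new TreeMap<>();
    // Flushed files, newest first; each one is immutable once written.
    private final Deque<NavigableMap<String, String>> storeFiles = new ArrayDeque<>();

    // Writes always go to the memstore first.
    void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    // Freeze the current memstore as a new store file and start a fresh one.
    void flush() {
        storeFiles.addFirst(Collections.unmodifiableNavigableMap(memstore));
        memstore = new TreeMap<>();
    }

    // Reads check the memstore, then the store files from newest to oldest.
    String get(String key) {
        if (memstore.containsKey(key)) {
            return memstore.get(key);
        }
        for (NavigableMap<String, String> file : storeFiles) {
            if (file.containsKey(key)) {
                return file.get(key);
            }
        }
        return null;
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3"); // third entry triggers a flush
        store.put("a", "9"); // newer value lives in the memstore
        System.out.println(store.get("a") + " " + store.get("b") + " " + store.storeFileCount());
    }
}
```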

2. Configuring HBase (High Availability)

| Role | node7-1 | node7-2 | node7-3 | node7-4 |
|---|---|---|---|---|
| ZooKeeper | √ | √ | √ | |
| HMaster (cf. NameNode) | | | √ | |
| Backup Master | √ | √ | | |
| RegionServer (cf. DataNode) | √ | √ | | √ |

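Start ZooKeeper and Hadoop first, and make sure passwordless SSH is set up between the nodes.
Edit the configuration file (conf/hbase-env.sh)

Edit the configuration file (conf/hbase-site.xml)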

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Where HBase stores its data on Hadoop (HDFS) -->
    <property>
        <name>hbase.rootdir</name>
        <!-- HDFS path; it must point to the active NameNode -->
        <value>hdfs://node7-1:8020/hbase/data</value>
    </property>
    <!-- 
    ZooKeeper data directory.
    HBase requires a ZooKeeper ensemble;
    it also ships with a bundled ZooKeeper.
    -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/hbase/zookeeper</value>
    </property>
    <!-- A built-in HBase safety check (stream capability enforcement), disabled here -->
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    
    <!-- ZooKeeper quorum used by HBase -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node7-1:2181,node7-2:2181,node7-3:2181</value>
    </property>
    <!-- Run HBase in distributed (cluster) mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>

Add the configuration file (conf/regionservers)
```
node7-1
node7-2
node7-4
```
Add the configuration file (conf/backup-masters); be sure to create it by copying the existing regionservers file
```
node7-2
node7-1
```
Start the cluster (on the master node)

bin/start-hbase.sh
Open the web UI

http://node7-3:16010/master-status
Start the client

bin/hbase shell
Stop the cluster

bin/stop-hbase.sh

3. HBase Shell Operations

Show help

help
List the tables in the current database

list
Create a table

create 'student', 'cf'
Insert data

put 'student', '1001', 'cf:sex', 'boy'
Scan the table

scan 'student'

scan 'student', {STARTROW => '1001', STOPROW => '1005'}

scan 'student', {STARTROW => '1001'}
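
A scan with STARTROW and STOPROW covers a half-open range: the start row is included and the stop row is not. The plain-Java sketch below mimics that with a sorted map; `ScanRangeSketch` and its methods are illustrative names, not part of the HBase API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class ScanRangeSketch {
    // Rows from startRow (inclusive) to stopRow (exclusive).
    static NavigableMap<String, String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    // Rows from startRow (inclusive) to the end of the table.
    static NavigableMap<String, String> scanFrom(TreeMap<String, String> table, String startRow) {
        return table.tailMap(startRow, true);
    }

    public static void main(String[] args) {
        TreeMap<String, String> student = new TreeMap<>();
        for (int id = 1001; id <= 1005; id++) {
            student.put(String.valueOf(id), "row-" + id);
        }
        System.out.println(scan(student, "1001", "1005").keySet()); // prints "[1001, 1002, 1003, 1004]"
        System.out.println(scanFrom(student, "1003").keySet());     // prints "[1003, 1004, 1005]"
    }
}
```
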
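Show the table structure

describe 'student'
Update a specific field

put 'student', '1001', 'cf:name', 'Nick'
Get a specific row, or a specific column family:column

get 'student', '1001'

get 'student', '1001', 'cf:name'
Count the rows in the table

count 'student'
Delete data

All data for a rowkey: deleteall 'student', '1001'

One column of a rowkey: delete 'student', '1002', 'cf:name'
Clear the table data

truncate 'student'

truncate disables the table, drops it, and recreates it; if your version reports that the table is still enabled, run disable first and then truncate.
Delete a table

First disable the table: disable 'student'

Then drop it: drop 'student'

Dropping directly fails with: ERROR: Table student is enabled. Disable it first.
Change table settings: keep 3 versions of the data in the cf column family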

alter 'student', {NAME => 'cf', VERSIONS => 3}

get 'student', '1001', {COLUMN => 'cf:name', VERSIONS => 3}
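
What VERSIONS => 3 means for a single cell can be sketched in plain Java: each write adds a timestamped version, and only the newest three stay readable. This is a simplification with made-up names; in particular it drops old versions immediately on write, which is an assumption of the sketch rather than a description of when HBase physically removes them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class VersionedCellSketch {
    private final int maxVersions;
    // Newest timestamp first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Add a version and drop the oldest ones beyond maxVersions.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry holds the oldest timestamp
        }
    }

    // Return up to `limit` values, newest first.
    List<String> get(int limit) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch name = new VersionedCellSketch(3);
        name.put(1L, "v1");
        name.put(2L, "v2");
        name.put(3L, "v3");
        name.put(4L, "v4");
        System.out.println(name.get(3)); // prints "[v4, v3, v2]"
    }
}
```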

4. Connecting to HBase from Java

Create a Maven project

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.util.List;

public class HBaseTest {
    private Connection connection = null;
    private Admin admin = null;
    
    // Table name in namespace:table form; the mydata namespace must already exist
    private String tableName = "mydata:psn_2";
    // Column family
    private String cf = "cf";
    
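    /**
     * Initialization: open the connection and get an Admin handle
     */
    @Before
    public void init() {
        // The client needs a local Hadoop directory; setting the HADOOP_HOME environment variable works too
        System.setProperty("hadoop.home.dir", "D:/jinghang/hadoop");
        try {
            // Create a configuration object; HBaseConfiguration.create() is the more common
            // choice because it also loads hbase-default.xml and hbase-site.xml from the classpath
            Configuration conf = new Configuration();
            /*
            Option 1: set in code what would otherwise come from the xxx-site.xml files.
            Entries follow the Hadoop convention:
            configuration --> property --> name and value
            argument 1: the key, i.e. the property's name
            argument 2: the value, i.e. the property's value
             */
            conf.set("hbase.zookeeper.quorum", "node7-1:2181,node7-2:2181,node7-3:2181");

            // Option 2: put the configuration files directly on the classpath
            // Open a connection
            connection = ConnectionFactory.createConnection(conf);
            // Get an Admin object
            admin = connection.getAdmin();
            System.out.println("---init---admin:" + admin + ";connection:" + connection);
        } catch (IOException e) {
            e.printStackTrace();
        }
    
    }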
    
    /**
     * Equivalent of the shell's list command
     */
    @Test
    public void list() {
        try {
            // Fetch the descriptors of all tables
            List<TableDescriptor> tableDescriptorList = admin.listTableDescriptors();
            // Print each table name with a running index
            int count = 1;
            for (TableDescriptor descriptor : tableDescriptorList) {
                System.out.println(count + "---list---" + descriptor.getTableName());
                count ++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * list_namespace
     */
    @Test
    public void listNamespace() {
        try {
            NamespaceDescriptor[] descriptors = this.admin.listNamespaceDescriptors();
            // Print each namespace with a running index
            int count = 1;
            for (NamespaceDescriptor descriptor : descriptors) {
                System.out.println(count + "--listNamespace--" + descriptor.getName());
                count ++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * Create a table
     * create 'mydata:psn_2','cf'
     */
    @Test
    public void create() {
        try {
            // Builder for the table descriptor
            TableDescriptorBuilder tdb = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName));
            // Describe the column family to add; this table has a single one
            ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(this.cf)).build();
            tdb.setColumnFamily(cfd);
            // Build the table descriptor
            TableDescriptor descriptor = tdb.build();
            // Create the table
            this.admin.createTable(descriptor);
            System.out.println("---table created---");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * Insert a row
     */
    @Test
    public void put() {
        // try-with-resources closes the table handle automatically
        try (Table table = this.connection.getTable(TableName.valueOf(this.tableName))) {
            byte[] family = Bytes.toBytes(this.cf);
            // The constructor argument is the rowkey
            // Put put = new Put(Bytes.toBytes("dynasty01"));
            /*
            Add columns and values
            argument 1: column family
            argument 2: column qualifier
            argument 3: value
            every argument is a byte array
             */
            // put.addColumn(family, Bytes.toBytes("name"), Bytes.toBytes("唐朝"));
            // put.addColumn(family, Bytes.toBytes("age"), Bytes.toBytes("289"));
            // put.addColumn(family, Bytes.toBytes("capital"), Bytes.toBytes("长安"));

            Put put = new Put(Bytes.toBytes("dynasty02"));
            put.addColumn(family, Bytes.toBytes("name"), Bytes.toBytes("宋朝"));
            put.addColumn(family, Bytes.toBytes("age"), Bytes.toBytes("319"));
            put.addColumn(family, Bytes.toBytes("capital"), Bytes.toBytes("汴梁"));

            // Send the Put to the table
            table.put(put);
            System.out.println("---row inserted---");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * scan: read every row in the table
     */
    @Test
    public void scan() {
        // An empty Scan has no conditions, so it returns all rows;
        // try-with-resources closes both the table and the scanner
        try (Table table = this.connection.getTable(TableName.valueOf(this.tableName));
             ResultScanner scanRes = table.getScanner(new Scan())) {
            byte[] family = Bytes.toBytes(this.cf);
            int count = 1;
            // ResultScanner is iterable and yields one Result per row
            for (Result result : scanRes) {
                String rowKey = Bytes.toString(result.getRow());
                // getValue returns only the newest value's bytes; Cell.getValueArray() would
                // return the whole backing array of the cell, not just the value
                String name = Bytes.toString(result.getValue(family, Bytes.toBytes("name")));
                String age = Bytes.toString(result.getValue(family, Bytes.toBytes("age")));
                String capital = Bytes.toString(result.getValue(family, Bytes.toBytes("capital")));
                System.out.println("row:" + count + ";rowKey:" + rowKey + ";name:" + name + ";age:" + age + ";capital:" + capital);
                count++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * Look up a row by its rowkey
     */
    @Test
    public void get() {
        // try-with-resources closes the table handle automatically
        try (Table table = this.connection.getTable(TableName.valueOf(this.tableName))) {
            byte[] family = Bytes.toBytes(this.cf);
            Get get = new Get(Bytes.toBytes("dynasty01"));
            Result result = table.get(get);
            // A Get for a rowkey that does not exist returns an empty Result
            if (result.isEmpty()) {
                System.out.println("row not found");
                return;
            }
            String rowKey = Bytes.toString(result.getRow());
            // Newest cell of cf:name; decode only the value slice of the backing array
            Cell cell = result.getColumnLatestCell(family, Bytes.toBytes("name"));
            String name = cell == null ? null
                    : Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            // getValue is a shortcut that returns the newest value's bytes directly
            String age = Bytes.toString(result.getValue(family, Bytes.toBytes("age")));
            String capital = Bytes.toString(result.getValue(family, Bytes.toBytes("capital")));
            System.out.println("rowKey:" + rowKey + ";name:" + name + ";age:" + age + ";capital:" + capital);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    /**
     * Cleanup: close the Admin and the connection
      */
    @After
    public void after() {
        System.out.println("==cleanup==");
        try {
            if(admin != null) {
                admin.close();
                admin = null ;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            if(connection != null) {
                connection.close();
                connection = null ;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
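
Because every rowkey, qualifier, and value is a byte array, decoding has to respect both the charset and the exact slice that holds the value. The plain-Java sketch below shows why decoding a whole backing array gives the wrong string when the value is only a slice of it. The layout used here (qualifier bytes followed by value bytes) is a made-up simplification, not HBase's real cell layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteSliceSketch {
    // Decode only the bytes in [offset, offset + length) as UTF-8.
    static String decode(byte[] backing, int offset, int length) {
        return new String(backing, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] qualifier = "name".getBytes(StandardCharsets.UTF_8);
        byte[] value = "宋朝".getBytes(StandardCharsets.UTF_8); // 3 bytes per character
        // Hypothetical layout: qualifier bytes followed by value bytes in one array.
        byte[] backing = Arrays.copyOf(qualifier, qualifier.length + value.length);
        System.arraycopy(value, 0, backing, qualifier.length, value.length);

        // Decoding the whole array mixes the qualifier into the result.
        System.out.println(new String(backing, StandardCharsets.UTF_8));
        // Decoding the slice yields just the value.
        System.out.println(decode(backing, qualifier.length, value.length));
    }
}
```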