HBase in Detail, Part 2: creating an HBase table with three fields (CSDN blog)

Article link: https://blog.csdn.net/zhijunming/article/details/108189781

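4. HBase optimization

1. Pre-splitting [create several regions up front when the table is created]:

Reason: by default a new table has only one region, so all early requests land on that single region and put load on the RegionServer that hosts it.

1. shell

	1. create 'table_name','column_family',SPLITS=>['rowkey1','rowkey2',..]
		create 'person','f1',SPLITS=>['10','11']
		This creates three regions.
		The first region has start rowkey '' and end rowkey '10'
		The second region has start rowkey '10' and end rowkey '11'
		The third region has start rowkey '11' and end rowkey ''
	2. create 'table_name','column_family',SPLITS_FILE=>'file_path'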
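
The region boundaries above follow from byte-wise lexicographic comparison of rowkeys. A minimal sketch of that rule in plain Java, assuming split keys are given in ascending order; the class and method names (RegionLookup, regionIndex) are made up for illustration and are not part of the HBase API:

```java
import java.nio.charset.StandardCharsets;

class RegionLookup {
    // Unsigned, byte-by-byte comparison; a shorter key sorts before any longer key it prefixes.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the zero-based index of the region whose [startKey, endKey) range holds rowKey.
    static int regionIndex(String[] splitKeys, String rowKey) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        int index = 0;
        for (String split : splitKeys) {
            // A row equal to a split key belongs to the region that starts at that key.
            if (compare(row, split.getBytes(StandardCharsets.UTF_8)) >= 0) {
                index++;
            } else {
                break;
            }
        }
        return index;
    }
}
```

With the split keys '10' and '11', rowkey '05' falls in the first region, '1099' in the second, and '2' in the third, because the comparison is on bytes and not on numeric value.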

2. API

final byte[][] splitKeys = {"10".getBytes(), "20".getBytes(), "30".getBytes()};
admin.createTable(tableDescriptor,splitKeys);
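
The split keys do not have to be written out by hand. A small sketch that generates them, assuming rowkeys start with a zero-padded two-digit prefix; the helper name SplitKeys.twoDigit is hypothetical, and only the resulting byte[][] would be handed to admin.createTable as in the call above:

```java
import java.nio.charset.StandardCharsets;

class SplitKeys {
    // Builds regionCount - 1 split keys as zero-padded two-digit strings: "01", "02", ...
    static byte[][] twoDigit(int regionCount) {
        if (regionCount < 2 || regionCount > 100) {
            throw new IllegalArgumentException("regionCount must be between 2 and 100");
        }
        byte[][] keys = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            keys[i - 1] = String.format("%02d", i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }
}
```

For example, twoDigit(4) yields the three keys "01", "02", "03", which gives four regions.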

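2. Rowkey design

Rowkey principles:

1. Length: keep the rowkey short
	A long rowkey takes up more disk space, since the rowkey is stored with every cell
	A long rowkey also means the client can cache fewer metadata entries
2. Uniqueness: every row must have a unique rowkey
3. Hashing: spread rowkeys out so that the data does not skew

Ways to deal with hotspots:

1. Reverse the string
2. Hash
3. Salt: prepend a random number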
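
The three remedies can be sketched with the JDK alone. This is an illustration under assumptions, not a recommended key format: the 4-hex-character MD5 prefix, the "_" separator, and the two-digit salt are arbitrary choices, and the class name RowKeys is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ThreadLocalRandom;

class RowKeys {
    // Reversal: moves the fast-changing tail (e.g. of a phone number or timestamp) to the front.
    static String reversed(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Hash: prefixes the key with the first two MD5 bytes in hex; the same key always gets the same prefix.
    static String hashed(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x_%s", digest[0], digest[1], key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Salt: prefixes the key with a random bucket number in [0, buckets).
    static String salted(String key, int buckets) {
        int bucket = ThreadLocalRandom.current().nextInt(buckets);
        return String.format("%02d_%s", bucket, key);
    }
}
```

A trade-off worth noting: with a random salt the writer cannot recompute the prefix later, so a read for one key has to try every bucket, whereas the reversed and hashed forms can be recomputed from the original key.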

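3. Memory optimization:

Do not give HBase too much memory; 16-40 GB is a reasonable range, since a very large heap tends to lengthen GC pauses

4. Basic optimization

	1. Allow appends in HDFS
	2. Adjust the number of RPC handlers that listen for client requests
	3. Configure timeouts
	4. Adjust the 10 GB size threshold associated with flushing (likely hbase.hregion.max.filesize, whose default is 10 GB)
	5. Tune the flush, compact, and split parameters
	6. Maximum number of open files
	7. Client-side cache
	8. Write efficiency: compression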

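5. Phoenix

1. Phoenix shell operations

1. List all tables: !tables
2. Create a table:
	1. No table exists in HBase yet [the table is created in HBase as well]
		create table table_name(
			column_name column_type primary key,
			column_name column_type,
			column_name column_type,
			....
			[constraint pk_name primary key (column1,column2)] //composite primary key; use this when no column above is declared primary key
		)COLUMN_ENCODED_BYTES=0
		COLUMN_ENCODED_BYTES=0: do not encode the column names
		Creating the table also creates the corresponding table in HBase
		Table names are converted to upper case automatically; to keep lower case, wrap the name in ""
	2. The table already exists in HBase [create a table in Phoenix that maps onto it]
		create table table_name(
			column_name column_type primary key, //maps to the HBase rowkey; a primary key column takes no column family
			"column_family"."column_name" column_type, //if the HBase column family and qualifier are lower case, wrap them in ""
			column_name column_type
		)COLUMN_ENCODED_BYTES=0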
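3. Create a view [a view is read-only: data cannot be updated, deleted, or inserted through it]
	create view view_name(
		column_name column_type primary key,
		"column_family"."column_name" column_type, //if the HBase column family and qualifier are lower case, wrap them in ""
		column_name column_type
	)COLUMN_ENCODED_BYTES=0
4. Drop tables and views
	1. Drop a table: drop table table_name
		Dropping the table also drops the HBase table
	2. Drop a view: drop view view_name
		Dropping a view does not drop the HBase table
5. Insert data: upsert into table_name values(value,...)
6. Query data: select * from table_name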
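
The upper-casing rule for names is a common source of "table not found" errors, so here is a tiny model of it. This only mimics the rule stated above (unquoted names are folded to upper case, quoted names keep their case); PhoenixNames.normalize is a made-up helper, not Phoenix code.

```java
import java.util.Locale;

class PhoenixNames {
    // Unquoted identifiers are folded to upper case; double-quoted ones keep their case.
    static String normalize(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase(Locale.ROOT);
    }
}
```

So `create table user03(...)` ends up as USER03, while `create table "user03"(...)` ends up as user03, and later statements have to quote the name the same way.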

2. JDBC

package com.atguigu;


import org.apache.phoenix.queryserver.client.ThinClientUtil;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;

public class PhoenixJdbc {

    private static Connection connection;
    private static PreparedStatement statement;
    /**
     * Initialization
     * @throws Exception
     */
    @Before
    public void init() throws Exception {
        // 1. Load the driver
        Class.forName("org.apache.phoenix.queryserver.client.Driver");
        // 2. Open the connection
        final String url = ThinClientUtil.getConnectionUrl("hadoop102", 8765);
        connection = DriverManager.getConnection(url);
        // Enable auto-commit (left disabled here; the tests commit explicitly)
        // connection.setAutoCommit(true);
    }

    /**
     * Release resources
     * @throws SQLException
     */
    @After
    public void close() throws SQLException {
        if(statement!=null) statement.close();
        if(connection!=null) connection.close();
    }

    /**
     * Create a table
     * @throws Exception
     */
    @Test
    public void createTable() throws Exception {
        // 1. Load the driver
        Class.forName("org.apache.phoenix.queryserver.client.Driver");
        // 2. Open the connection
        // 2.1 Build the URL
        final String url = ThinClientUtil.getConnectionUrl("hadoop102", 8765);
        // 2.2 Connect
        final Connection connection = DriverManager.getConnection(url);
        // 3. Prepare the statement
        String sql= "create table  if not exists user03" +
                "( id varchar primary key,  name varchar, age varchar) " +
                "COLUMN_ENCODED_BYTES=0";
        final PreparedStatement statement = connection.prepareStatement(sql);
        // Execute the SQL
        statement.execute();
        // 4. Close resources
        statement.close();
        connection.close();
    }

    /**
     * Insert or update data
     * @throws Exception
     */
    @Test
    public  void upsertValue() throws Exception {
        // 1. Load the driver
        Class.forName("org.apache.phoenix.queryserver.client.Driver");
        // 2. Open the connection
        // 2.1 Build the URL
        final String url = ThinClientUtil.getConnectionUrl("hadoop102", 8765);
        final Connection connection = DriverManager.getConnection(url);

        // 3. Prepare the statement
        // 3.1 Write the SQL
        String sql="upsert into person values(?,?,?)";
        final PreparedStatement statement = connection.prepareStatement(sql);
        statement.setString(1,"1002");
        statement.setString(2,"lisi");
        statement.setString(3,"26");
        // 3.2 Execute the SQL
        statement.execute();
        // 3.3 Commit manually
        connection.commit();
        // Close resources
        statement.close();
        connection.close();
    }

    /**
     * Batch insert
     * @throws SQLException
     */
    @Test
    public void upsertBatch() throws SQLException {
        // Write the SQL
        // upsert into person values(?,?,?)
        String sql="upsert into user03 values(?,?,?)";
        // Prepare the statement
        statement=connection.prepareStatement(sql);
        for (int i = 2; i < 50; i++) {
            statement.setString(1,"100"+i);
            statement.setString(2,"wangwu"+i);
            statement.setString(3,"2"+i);
            statement.addBatch();
            // Flush and commit every time i reaches a multiple of 10
            if(i%10==0){
                statement.executeBatch();
                statement.clearBatch();
                connection.commit();
            }
        }
        // Flush and commit the remaining rows
        statement.executeBatch();
        connection.commit();
        System.out.println("Insert succeeded");
    }

    /**
     * Query data
     * @throws SQLException
     */
    @Test
    public void showData() throws SQLException {
        String sql="select * from person";
        statement=connection.prepareStatement(sql);
        final ResultSet resultSet = statement.executeQuery();
        while (resultSet.next()){
            final String id = resultSet.getString(1);
            final String name = resultSet.getString(2);
            final String age = resultSet.getString(3);
            System.out.println("id:"+id+" name:"+name+" age:"+age);
        }
    }
    /**
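     * Delete data
     * @throws SQLException
     */
    @Test
    public void deleteData() throws SQLException {
        // Write the SQL
        String sql="delete from person where id=?";
        // Prepare the statement
        statement=connection.prepareStatement(sql);
        statement.setString(1,"1001");
        // Run the SQL to delete the row
        statement.executeUpdate();
        // Commit explicitly, as in upsertValue, since auto-commit is not enabled in init()
        connection.commit();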
    }

    /**
     * Drop a table
     * @throws SQLException
     */
    @Test
    public void dropTable() throws SQLException {
        // Write the SQL
        String sql="drop table person";
        // Prepare the statement
        statement=connection.prepareStatement(sql);
        // Execute the SQL
        statement.execute();

    }

}

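3. Coprocessors

Purpose: run custom logic automatically before or after an insert/delete/query on an HBase table

1. Write a class that implements RegionObserver and RegionCoprocessor
2. Override this method: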
		  @Override
		  public Optional<RegionObserver> getRegionObserver() {
			return Optional.of(this);
		  }
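3. Override the hook methods [act before put/get/scan/delete with prePut/Get/Delete, or after put/get/scan/delete with postPut/Get/Delete]
4. Package the jar and upload it to HDFS
5. Disable the table
6. Alter the table to load the coprocessor
			alter 'table_name', METHOD => 'table_att', 'coprocessor'=>'HDFS path of the jar|fully qualified class name of the coprocessor|priority|arguments'
7. Enable the table: enable
8. Run an operation [put/get/scan/delete] to check that the coprocessor works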
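
The pre/post hook idea can be shown without an HBase cluster. The sketch below is a plain-Java model only: PutObserver and MiniRegion are hypothetical stand-ins and are not the HBase RegionObserver or region classes, whose real hook methods take HBase-specific parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class HookDemo {
    // Hypothetical stand-in for an observer; NOT the HBase RegionObserver interface.
    interface PutObserver {
        default void prePut(String rowKey, String value, List<String> log) {}
        default void postPut(String rowKey, String value, List<String> log) {}
    }

    // Hypothetical stand-in for a region: runs pre-hooks, stores the value, runs post-hooks.
    static class MiniRegion {
        final TreeMap<String, String> rows = new TreeMap<>();
        final List<PutObserver> observers = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        void put(String rowKey, String value) {
            for (PutObserver o : observers) {
                o.prePut(rowKey, value, log);
            }
            rows.put(rowKey, value);
            log.add("put " + rowKey);
            for (PutObserver o : observers) {
                o.postPut(rowKey, value, log);
            }
        }
    }

    // Registers one observer, performs one put, and returns the order in which things ran.
    static List<String> run() {
        MiniRegion region = new MiniRegion();
        region.observers.add(new PutObserver() {
            @Override
            public void prePut(String rowKey, String value, List<String> log) {
                log.add("prePut " + rowKey);
            }

            @Override
            public void postPut(String rowKey, String value, List<String> log) {
                log.add("postPut " + rowKey);
            }
        });
        region.put("1001", "zhangsan");
        return region.log;
    }
}
```

The returned log shows the order prePut, put, postPut, which is the behavior step 8 checks for on a real table.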

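4. Secondary indexes

Secondary indexes are implemented mainly through coprocessors

Reason:
HBase queries are fast mainly because the rowkey of the wanted row can be looked up in the meta table to find which region holds it and which RegionServer hosts that region.
A query by column value has no such path: the meta table cannot tell which region or RegionServer holds the data, so the whole table has to be scanned, which is slow.

1. Global secondary index

1. How it works: the value of the indexed column is combined with the original rowkey to form a new rowkey; the column names listed in include become column qualifiers and their values become the new cell values; the new rowkey, qualifiers, and values are written to a newly created index table.
  A query checks the index table first; if the index table cannot satisfy it, the original table is searched instead [full scan]
2. Create: create index index_name on namespace.table_name(column_name,..) [include(column_name,..)]
	//for a mapped table, column names must be written as: column_family.column_name
3. Drop: drop index index_name on namespace.table_name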
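
A rough model of the global index described above, using two sorted maps in place of the data table and the index table. Everything here is an assumption made for illustration (the '\u0000' separator, the column names name and age, the class GlobalIndexDemo); it ignores index cleanup when an indexed value changes and is not how Phoenix encodes index rows internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GlobalIndexDemo {
    static final char SEP = '\u0000'; // separates the indexed value from the original rowkey

    final TreeMap<String, Map<String, String>> dataTable = new TreeMap<>();
    final TreeMap<String, Map<String, String>> indexTable = new TreeMap<>();

    // Writes a row and maintains an index on "name" that includes (covers) "age".
    void upsert(String rowKey, String name, String age) {
        Map<String, String> row = new HashMap<>();
        row.put("name", name);
        row.put("age", age);
        dataTable.put(rowKey, row);

        Map<String, String> covered = new HashMap<>();
        covered.put("age", age);
        indexTable.put(name + SEP + rowKey, covered);
    }

    // Prefix scan over the index table; the data table is never touched for "age".
    List<String> agesByName(String name) {
        String from = name + SEP;
        String to = name + (char) (SEP + 1);
        List<String> ages = new ArrayList<>();
        for (Map<String, String> covered : indexTable.subMap(from, to).values()) {
            ages.add(covered.get("age"));
        }
        return ages;
    }
}
```

Because the index key starts with the indexed value, a lookup by name becomes a short range scan instead of a full scan, and the included column is answered from the index row itself.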

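2. Local secondary index

1. How it works:
	Index rows are written into the original table with rowkey=__indexed value_.._original rowkey
	A later query by column value first looks up the new rowkey, extracts the original rowkey from it, and then reads the original row by that rowkey
2. Create: create local index index_name on namespace.table_name(column_name,..)
3. Drop: drop index index_name on namespace.table_name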
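
The two-step lookup of a local index can be modeled the same way, this time with a single sorted map standing in for one region that holds both data rows and index rows. Assumptions for this sketch: indexed values contain no "_", data rowkeys do not start with "__", and the class LocalIndexDemo is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class LocalIndexDemo {
    // One sorted map stands in for a region holding both data rows and index rows.
    final TreeMap<String, String> region = new TreeMap<>();

    void upsert(String rowKey, String name, String age) {
        region.put(rowKey, name + "," + age);        // data row
        region.put("__" + name + "_" + rowKey, "");  // index row: key only, empty value
    }

    // Step 1: scan index rows by prefix and cut the original rowkey out of each index key.
    List<String> rowKeysByName(String name) {
        String prefix = "__" + name + "_";
        List<String> keys = new ArrayList<>();
        for (String indexKey : region.tailMap(prefix).keySet()) {
            if (!indexKey.startsWith(prefix)) {
                break;
            }
            keys.add(indexKey.substring(prefix.length()));
        }
        return keys;
    }

    // Step 2: read the original rows by the rowkeys found in step 1.
    List<String> rowsByName(String name) {
        List<String> rows = new ArrayList<>();
        for (String rowKey : rowKeysByName(name)) {
            rows.add(region.get(rowKey));
        }
        return rows;
    }
}
```

Compared with the global index, nothing is stored in a second table, at the cost of a second read to fetch the actual row.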

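6. HBase and Hive integration

1. Managed (internal) table

[creating the table in Hive also creates the table in HBase; if the HBase table already exists, an error is raised]

   CREATE TABLE hive_hbase_emp_table(
		column_name column_type,....)
	STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'-- link to HBase
	WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,column_family:column_name,...")-- map to HBase column qualifiers
	TBLPROPERTIES ("hbase.table.name" = "hbase_table_name");-- name of the linked HBase table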

CREATE TABLE hive_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
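
The entries in hbase.columns.mapping are matched to the Hive columns by position, so the two lists must have the same length. A small sketch that pairs them up to make the correspondence visible; ColumnMapping.pair is a made-up helper for illustration, not part of Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnMapping {
    // Pairs each Hive column with the HBase target at the same position in the mapping string.
    static Map<String, String> pair(String[] hiveColumns, String mapping) {
        String[] targets = mapping.split(",");
        if (targets.length != hiveColumns.length) {
            throw new IllegalArgumentException("expected " + hiveColumns.length
                    + " mapping entries, got " + targets.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < hiveColumns.length; i++) {
            result.put(hiveColumns[i], targets[i].trim());
        }
        return result;
    }
}
```

For the table above, empno pairs with :key (the HBase rowkey), ename with info:ename, and so on through deptno with info:deptno.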

Case 1

1) Create a table in Hive and link it to HBase at the same time

CREATE TABLE hive_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

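Tip: once this is done, check in both Hive and HBase; the corresponding table has been created in each

2) Create a temporary staging table in Hive to load the data from a file

Tip: data cannot be loaded directly into the Hive table that is linked to HBase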

CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
row format delimited fields terminated by '\t';

3) Load the data into the Hive staging table

hive> load data local inpath '/home/admin/softwares/data/emp.txt' into table emp;

4) Use an insert statement to copy the data from the staging table into the Hive table linked to HBase

hive> insert into table hive_hbase_emp_table select * from emp;

5) Check that the data has been inserted into both the Hive table and the linked HBase table

Hive:
hive> select * from hive_hbase_emp_table;

HBase:
hbase> scan 'hbase_emp_table'

2. External table

[creating the table in Hive does not create a table in HBase; if the HBase table does not exist, an error is raised]

CREATE EXTERNAL TABLE hive_hbase_emp_table(
   column_name column_type, ....)
	STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
	WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,column_family:column_name,...")
	TBLPROPERTIES ("hbase.table.name" = "hbase_table_name");

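Case 2

Goal: HBase already stores a table named hbase_emp_table. Create an external table in Hive that is linked to hbase_emp_table so that Hive can be used to analyze the data in that HBase table.

Note: Case 2 builds directly on Case 1, so complete Case 1 first.

Steps:

1) Create the external table in Hive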

CREATE EXTERNAL TABLE relevance_hbase_emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

2) Once the table is linked, Hive functions can be used to analyze the data

hive (default)> select * from relevance_hbase_emp;