Miscellaneous Notes on HBase

Latest recommended article published 2024-07-17 14:35:09

s127838498

Article link: https://blog.csdn.net/s127838498/article/details/84034966

1. Web Console
Port: 16010 in newer versions (HBase 1.0 and later)
60010 in older versions

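Architecture overview:

HBase is a column-family-oriented (wide-column) database with a master/slave architecture: one HMaster and a set of RegionServers.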

Introduction to NoSQL databases

1. What is a NoSQL database? "Not only SQL"
    (*) Generally speaking, NoSQL databases do not support transactions
 
 
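2. Common NoSQL databases
    (*) HBase
    (*) Redis: an in-memory NoSQL database, often compared with the earlier in-memory cache Memcached (drawback of Memcached: no persistence)
                Redis supports persistence: RDB and AOF
                Codis is a distributed (cluster) solution for Redis
    (*) MongoDB: a document-oriented NoSQL database (BSON documents)
                  Example: designing a database that stores information about movies
                  Supports (multi-document) transactions since MongoDB 4.0
                  Supports distributed data storage
                        and MapReduce (written as JavaScript functions)
    (*) Cassandra: similar to HBase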

Setting up the HBase environment

Preparation:

Unpack:   tar -zxvf hbase-1.3.1-bin.tar.gz -C ~/training/
Set environment variables:   vi ~/.bash_profile
    HBASE_HOME=/root/training/hbase-1.3.1
    export HBASE_HOME

    PATH=$HBASE_HOME/bin:$PATH
    export PATH
Apply them:   source ~/.bash_profile

1. Local (standalone) mode

    Characteristics: no HDFS is needed; data is stored directly on the local Linux file system.
    Configuration:
    hbase-env.sh
             export JAVA_HOME=/root/training/jdk1.8.0_144
     
    hbase-site.xml
        <!--Directory where HBase stores its data-->
        <property>
             <name>hbase.rootdir</name>
             <value>file:///root/training/hbase-1.3.1/data</value>
        </property>
         
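    Start HBase:  start-hbase.sh  ----> jps shows only an HMaster process; in local mode the HMaster, a RegionServer and ZooKeeper all run inside that single JVM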

2. Pseudo-distributed mode: one ZooKeeper, one HMaster and one RegionServer, all on a single host

    hbase-env.sh
            export HBASE_MANAGES_ZK=true   (use the ZooKeeper bundled with HBase)
     
    hbase-site.xml
        <!--HDFS directory used by HBase (host:port must be the NameNode address, i.e. fs.defaultFS)-->
        <property>
             <name>hbase.rootdir</name>
             <value>hdfs://192.168.157.111:9000/hbase</value>
        </property>       

        <!--Run in distributed mode-->
        <property>
             <name>hbase.cluster.distributed</name>
             <value>true</value>
        </property>   

        <!--ZooKeeper quorum address-->
        <property>
             <name>hbase.zookeeper.quorum</name>
             <value>192.168.157.111</value>
        </property>       

        <!--Replication factor for the files HBase writes to HDFS (1 on a single-node setup)-->
        <property>
             <name>dfs.replication</name>
             <value>1</value>
        </property>   

    regionservers   (lists the RegionServer hosts; the default entry, localhost, is enough on a single node)

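Setting up fully distributed HBase and HA

1. Fully distributed HBase:
    bigdata112  bigdata113  bigdata114
     
    Note: the clocks of all nodes must be synchronized.
          If they are not: (1) Hadoop: MapReduce jobs fail
                           (2) HBase: RegionServers stop automatically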
 
    bigdata112:  HMaster, ZooKeeper
    bigdata113:  RegionServer
    bigdata114:  RegionServer
     
    On the master node:
        Unpack:  tar -zxvf hbase-1.3.1-bin.tar.gz -C ~/training/
        Set the environment variables (as above)
    hbase-site.xml
        <!--HDFS directory used by HBase-->
        <property>
             <name>hbase.rootdir</name>
             <value>hdfs://192.168.157.112:9000/hbase</value>
        </property>       

        <!--Run in distributed mode-->
        <property>
             <name>hbase.cluster.distributed</name>
             <value>true</value>
        </property>   

        <!--ZooKeeper quorum address-->
        <property>
             <name>hbase.zookeeper.quorum</name>
             <value>192.168.157.112</value>
        </property>       

        <property>
             <name>dfs.replication</name>
             <value>2</value>
        </property>
         
        <!--Maximum allowed clock difference between cluster nodes, in milliseconds-->
        <property>
             <name>hbase.master.maxclockskew</name>
             <value>180000</value>
        </property>

    regionservers   (one RegionServer host per line)
        bigdata113
        bigdata114
         
    Copy the installation directory to the slave nodes (run from ~/training):
     scp -r hbase-1.3.1/ root@bigdata113:/root/training
     scp -r hbase-1.3.1/ root@bigdata114:/root/training
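The hbase.master.maxclockskew setting above is a plain threshold on the difference between the master's clock and a RegionServer's clock. Below is a rough model of that comparison; the class and method names are hypothetical, not HBase code. In HBase 1.x the master performs a check of this kind when a RegionServer reports in at startup and rejects servers whose skew exceeds the limit, which is one reason unsynchronized nodes lose their RegionServers.

```java
// Hypothetical helper, not part of HBase: models the comparison against
// hbase.master.maxclockskew made when a RegionServer reports to the master.
class ClockSkewCheck {
    // Returns true if the two clocks differ by at most maxSkewMs milliseconds.
    static boolean withinSkew(long masterTimeMs, long serverTimeMs, long maxSkewMs) {
        return Math.abs(masterTimeMs - serverTimeMs) <= maxSkewMs;
    }

    public static void main(String[] args) {
        long maxSkew = 180000L; // the value configured in hbase-site.xml above (3 minutes)
        long master = 1_000_000L;
        System.out.println(withinSkew(master, master + 60_000L, maxSkew));  // true: 1 minute apart
        System.out.println(withinSkew(master, master + 200_000L, maxSkew)); // false: 200 s apart
    }
}
```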

Command line: a few examples in the HBase shell (with some SQL background added)

    (*) Create a table:  create 'student','info','grade'   (table student with column families info and grade)
         List tables:  list
                  MySQL: show tables;
                  Oracle: select * from tab;
                   
        Show the table structure:  desc 'student'
                    describe 'student'
                     
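        In SQL (Oracle): what is the difference between desc and describe? Both show the table structure:
                        (*) DESCRIBE is a SQL*Plus command, not a SQL statement
                        (*) DESC is simply its abbreviation (SQL*Plus commands can be abbreviated)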
                         
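    (*) Insert data: put   e.g. put 'student','s001','info:name','Tom'
    (*) Query data:
            scan   is equivalent to  select * from student
            get    is equivalent to  select * from student where rowkey=???
             
            To speed up queries on columns other than the row key, you can build a secondary index for HBase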
 
    (*) Empty a table: truncate
        Log:
            hbase(main):005:0> truncate 'student'
            Truncating 'student' table (it may take a while):
             - Disabling table...
             - Truncating table...
            0 row(s) in 3.9740 seconds
         
        truncate in older HBase versions (drops and recreates the table):
        Log:
            hbase(main):005:0> truncate 'student'
            Truncating 'student' table (it may take a while):
             - Disabling table...
             - Dropping table...
             - Creating table
            0 row(s) in 3.9740 seconds  

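        Side note: what is the difference between delete and truncate? (using Oracle as the example)
            1. delete is a DML (Data Manipulation Language) statement; DML can be rolled back
               truncate is a DDL (Data Definition Language) statement; DDL cannot be rolled back
            2. delete leaves fragmentation; truncate does not
            3. delete does not release space; truncate does
            4. delete can be undone with flashback; truncate cannot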
         
    (*) Drop a table: drop (the table must be disabled first)
            hbase(main):007:0> disable 'student'
            0 row(s) in 2.2980 seconds

            hbase(main):008:0> drop 'student'
            0 row(s) in 1.3770 seconds
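To make the shell commands above concrete, here is a toy in-memory model of a table: a sorted map from row key to a sorted map of "family:qualifier" columns. ToyTable and its methods are hypothetical, not the HBase API, and the model does not enforce column families. It does reproduce one visible property of scan: rows come back sorted lexicographically by row key, so "s10" sorts before "s2". The model compares Java Strings, which matches HBase's byte-wise ordering only for ASCII keys (an assumption of this sketch).

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (not the HBase client API): row key -> ("family:qualifier" -> value),
// both levels kept sorted like HBase keeps rows and columns sorted.
class ToyTable {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Like: put 'student','<row>','<family:qualifier>','<value>'
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // Like: get 'student','<row>'  (all columns of one row, empty if the row is absent)
    Map<String, String> get(String row) {
        return rows.getOrDefault(row, new TreeMap<>());
    }

    // Like: scan 'student'  (row keys in sorted order)
    Iterable<String> scanRowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        ToyTable student = new ToyTable();
        student.put("s2", "info:name", "Mary");
        student.put("s10", "info:name", "Tom");
        student.put("s1", "grade:math", "90");
        System.out.println(student.get("s1"));     // {grade:math=90}
        System.out.println(student.scanRowKeys()); // [s1, s10, s2]
    }
}
```

Zero-padding numeric row keys (s01, s02, ..., s10) is the usual way to make the lexicographic order agree with the numeric one.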

Some commonly used Java APIs (HBase 1.x client)

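// Imports used by the snippets below (HBase 1.x client)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;

//init
    
    // Configure the ZooKeeper address (HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml)
        configuration = HBaseConfiguration.create();

        configuration.set("hbase.zookeeper.quorum","192.168.23.111");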


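// Create a table

    // Get an HBase admin client (new HBaseAdmin(conf) is deprecated since HBase 1.0; the newer form is ConnectionFactory.createConnection(conf).getAdmin())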
        HBaseAdmin hBaseAdmin = new HBaseAdmin(configuration);

        HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("firstTable"));

        // Specify the column families
        hTableDescriptor.addFamily(new HColumnDescriptor("info"));
        hTableDescriptor.addFamily(new HColumnDescriptor("grade"));

        // Create the table
        hBaseAdmin.createTable(hTableDescriptor);
        // Close the client
        hBaseAdmin.close();
      
      
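// Insert one row

    // Get a table client (new HTable(conf, name) is deprecated since HBase 1.0; the newer form is connection.getTable(TableName.valueOf(name)))
        HTable hTable = new HTable(configuration,"firstTable");

        // Build a Put object for one row (the constructor argument is the row key)
        Put put = new Put(Bytes.toBytes("id001"));
        // Set a column value
        /*
        put.addColumn(family,     column family name
                      qualifier,  column name
                      value)      value
        */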
        put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes("michael"));

        hTable.put(put);

        hTable.close();
      
        
// Get one row

    // Get a table client
        HTable hTable = new HTable(configuration,"firstTable");

        // Build a Get object
        Get get = new Get(Bytes.toBytes("id001"));

        // Run the query: equivalent to  select * from mytable where rowkey=???
        Result result = hTable.get(get);

        // Output. Note: HBase has no data types; every value is stored as raw bytes
        String name = Bytes.toString(result.getValue(Bytes.toBytes("info"),Bytes.toBytes("name")));
        System.out.println("---name---"+name);

        hTable.close();
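Because HBase stores only bytes, the encoding chosen when writing decides what a later read or comparison sees. This stdlib-only sketch mirrors the two encodings that Bytes.toBytes uses for an int (4 bytes, big-endian) and for a String (UTF-8); the class ByteEncodingDemo is hypothetical. The same number written both ways yields different bytes, so one will never compare equal to the other.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stdlib-only sketch (assumption: mirrors Bytes.toBytes(int) = 4-byte big-endian
// and Bytes.toBytes(String) = UTF-8). HBase itself stores only the raw bytes.
class ByteEncodingDemo {
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array(); // ByteBuffer is big-endian by default
    }

    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intToBytes(3000)));      // [0, 0, 11, -72]
        System.out.println(Arrays.toString(stringToBytes("3000"))); // [51, 48, 48, 48]
        // Same "value", different bytes:
        System.out.println(Arrays.equals(intToBytes(3000), stringToBytes("3000"))); // false
    }
}
```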
        
        
// Plain scan (no filter)

    // Get a table client
        HTable hTable = new HTable(configuration,"firstTable");

        // Define a scanner
        Scan scan = new Scan();
        // To filter the results: scan.setFilter(filter)

        ResultScanner resultScanner = hTable.getScanner(scan);

        for (Result r:resultScanner) {
            // Output. Note: HBase has no data types; every value is stored as raw bytes
            String name = Bytes.toString(r.getValue(Bytes.toBytes("info"),Bytes.toBytes("name")));
            System.out.println("---name---"+name);
        }
        // Release the scanner before closing the table
        resultScanner.close();
        hTable.close();
        
        
        
        
        
        
// Delete a table
    
    
        String tableName = "firstTable";

        // Get an HBase admin client (no table descriptor is needed to delete a table)
        HBaseAdmin hBaseAdmin = new HBaseAdmin(configuration);

        // Disable the table (a table must be disabled before it can be deleted)
        hBaseAdmin.disableTable(TableName.valueOf(tableName));
        // Alternative: empty the table but keep its schema (second argument: preserveSplits)
//        hBaseAdmin.truncateTable(TableName.valueOf(tableName),false);
        // Delete the table
        hBaseAdmin.deleteTable(TableName.valueOf(tableName));

        // Close the client
        hBaseAdmin.close();
        
        
        
        
// More advanced queries (filters) follow

// init

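    // Configure the ZooKeeper address (HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml)
        configuration = HBaseConfiguration.create();

        configuration.set("hbase.zookeeper.quorum","192.168.23.111");

        // Get a client for the emp table
        hTable = new HTable(configuration,"emp");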
        
        
        
        
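// Query by column value: employees whose salary (sal) equals 3000

    // Define a column value filter
        SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
                Bytes.toBytes("empinfo"),      // column family
                Bytes.toBytes("sal"),          // column qualifier
                CompareFilter.CompareOp.EQUAL, // comparison operator
                // value: must use the same encoding as the stored data. Assuming sal was written
                // as a string (the other examples read it with Bytes.toString), Bytes.toBytes(3000),
                // a 4-byte int, would never match
                Bytes.toBytes("3000"));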

        Scan scan = new Scan();
        scan.setFilter(singleColumnValueFilter);

        ResultScanner results = hTable.getScanner(scan);

        for (Result r:results) {
            String name = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
            System.out.println(name);
        }

        results.close();
        hTable.close();
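By default SingleColumnValueFilter also lets through rows that have no sal column at all; calling setFilterIfMissing(true) on the filter drops them. The per-row decision can be modeled as below. This is a hypothetical simplification (ColumnValueFilterModel is a made-up name) for CompareOp.EQUAL on string values, and the sample rows are made up.

```java
import java.util.Map;

// Toy model (not HBase code) of SingleColumnValueFilter's row decision with EQUAL:
// a matching value passes; a missing column passes unless filterIfMissing is true.
class ColumnValueFilterModel {
    static boolean keepRow(Map<String, String> row, String column, String expected,
                           boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing; // HBase's default for filterIfMissing is false
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> withSal = Map.of("empinfo:ename", "SCOTT", "empinfo:sal", "3000");
        Map<String, String> noSal = Map.of("empinfo:ename", "KING");
        System.out.println(keepRow(withSal, "empinfo:sal", "3000", false)); // true
        System.out.println(keepRow(noSal, "empinfo:sal", "3000", false));   // true (column missing)
        System.out.println(keepRow(noSal, "empinfo:sal", "3000", true));    // false
    }
}
```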
        
        
// Column prefix filter
 
    // Define a column-name prefix filter
        ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("ename"));

        Scan scan = new Scan();
        scan.setFilter(columnPrefixFilter);

        ResultScanner results = hTable.getScanner(scan);

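        // The results contain only the employees' names (columns whose name starts with "ename")

        for (Result r:results) {
            // Get name and salary (sal prints as null, because the filter removed that column)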
            String name = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
            String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
            System.out.println(name+"\t"+sal);
        }

        results.close();
        hTable.close();
        
        
        
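// Multiple column prefix filter

    // A two-dimensional byte array: one entry per column-name prefix
        byte[][] prefixes = {Bytes.toBytes("ename"),Bytes.toBytes("sal")};

        // Define a multiple column prefix filter: query name and salary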
        MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);

        Scan scan = new Scan();
        scan.setFilter(multipleColumnPrefixFilter);

        ResultScanner results = hTable.getScanner(scan);

        // The results contain only the name and salary columns

        for (Result r:results) {
            // Get name and salary
            String name = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
            String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
            System.out.println(name+"\t"+sal);
        }

        results.close();
        hTable.close();
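ColumnPrefixFilter and MultipleColumnPrefixFilter decide per column whether to keep it, based only on the column qualifier. Below is a toy model of that rule; PrefixFilterModel is hypothetical, and the real filters compare raw bytes, which behaves like String.startsWith for ASCII column names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HBase code): a qualifier survives if it starts with the prefix
// (single-prefix filter) or with any of the prefixes (multiple-prefix filter).
class PrefixFilterModel {
    static List<String> keepColumns(List<String> qualifiers, String... prefixes) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            for (String p : prefixes) {
                if (q.startsWith(p)) {
                    kept.add(q);
                    break; // matched one prefix; no need to test the others
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical qualifiers in the empinfo family
        List<String> cols = List.of("comm", "ename", "job", "sal");
        System.out.println(keepColumns(cols, "ename"));        // [ename]
        System.out.println(keepColumns(cols, "ename", "sal")); // [ename, sal]
    }
}
```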
        
        
        
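// Row filter
 
    // Define a RowFilter
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, // comparison operator
                new RegexStringComparator("7839")); // row key matched by a regular expression; the match uses find(), so any row key containing 7839 passes ("^7839$" for an exact match)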

        Scan scan = new Scan();
        scan.setFilter(rowFilter);

        ResultScanner results = hTable.getScanner(scan);


        for (Result r:results) {
            // Get name and salary
            String name = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
            String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
            System.out.println(name+"\t"+sal);
        }

        results.close();
        hTable.close();
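RegexStringComparator tests each row key with Java's regular expressions using find(), so the pattern only needs to match part of the key. This stdlib sketch (RowRegexDemo is a hypothetical name) shows why "7839" also matches a longer key such as "17839", and how anchors make the match exact.

```java
import java.util.regex.Pattern;

// Stdlib sketch of the matching rule: Pattern + find(), i.e. a match anywhere in the key.
class RowRegexDemo {
    static boolean matches(String regex, String rowKey) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("7839", "7839"));    // true
        System.out.println(matches("7839", "17839"));   // true: substring match
        System.out.println(matches("^7839$", "17839")); // false: anchored pattern
    }
}
```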
        
        
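// Filter list (combining filters)

    // Combine two filters
        // Example: find the names of employees whose salary equals 3000
        /*
         * First filter: column value filter
         * Second filter: column-name prefix filter
         */

        // Define a column value filter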
        SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
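                Bytes.toBytes("empinfo"),      // column family
                Bytes.toBytes("sal"),          // column qualifier
                CompareFilter.CompareOp.EQUAL, // comparison operator
                // value: same encoding as the stored data (sal is read back below with Bytes.toString,
                // so it is assumed to be a string; Bytes.toBytes(3000) would be a 4-byte int)
                Bytes.toBytes("3000"));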


        // Define a column-name prefix filter
        ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("ename"));
        // Create a FilterList
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);// and
//        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);// or

        filterList.addFilter(singleColumnValueFilter);
        filterList.addFilter(columnPrefixFilter);

        Scan scan = new Scan();
        scan.setFilter(filterList);

        ResultScanner results = hTable.getScanner(scan);


        for (Result r:results) {
            // Get name and salary (sal prints as null, because the prefix filter removed that column)
            String name = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
            String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
            System.out.println(name+"\t"+sal);
        }

        results.close();
        hTable.close();
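Conceptually, MUST_PASS_ALL combines the member filters with AND and MUST_PASS_ONE combines them with OR. The sketch below models that with java.util.function.Predicate. It is a simplification (the real FilterList applies its filters cell by cell and row by row), and FilterListModel is a hypothetical name.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model (not HBase code): MUST_PASS_ALL ~ logical AND, MUST_PASS_ONE ~ logical OR.
class FilterListModel {
    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return x -> filters.stream().allMatch(f -> f.test(x));
    }

    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return x -> filters.stream().anyMatch(f -> f.test(x));
    }

    public static void main(String[] args) {
        Predicate<Integer> salIs3000 = sal -> sal == 3000;
        Predicate<Integer> salAbove2000 = sal -> sal > 2000;
        List<Predicate<Integer>> filters = List.of(salIs3000, salAbove2000);
        System.out.println(mustPassAll(filters).test(2500)); // false: only one filter passes
        System.out.println(mustPassOne(filters).test(2500)); // true
    }
}
```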

Notes

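When the data grows too large for one Region and has to be spread over several Regions, the Region splits. The split itself is cheap (the daughter Regions initially just reference the parent's files), but the compactions that follow rewrite the data, which means a lot of copying. In a fully distributed cluster, consider whether the network bandwidth can take this load: the daughter Regions are often not served from the node that holds the data (for example after the balancer moves them), so the data has to travel over the network. Running the HDFS balancer causes the same problem.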
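A toy model of the split decision described in the note: once a Region exceeds a size limit, it is divided at a middle row key, and that key becomes the start key of the second daughter Region. RegionSplitModel is hypothetical and counts rows for simplicity; real HBase decides by store file size (configured through hbase.hregion.max.filesize and the table's split policy) and chooses the split point from its store files.

```java
import java.util.List;

// Toy model (not HBase's split policy): split at the middle row key once
// a region holds more than maxRows rows.
class RegionSplitModel {
    // Returns the split key, or null when no split is needed.
    static String splitKey(List<String> sortedRowKeys, int maxRows) {
        if (sortedRowKeys.size() <= maxRows) {
            return null;
        }
        return sortedRowKeys.get(sortedRowKeys.size() / 2);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(splitKey(keys, 10)); // null: region is small enough
        System.out.println(splitKey(keys, 4));  // r3: daughters cover [start, r3) and [r3, end)
    }
}
```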

To run MapReduce jobs that access HBase, add the HBase jars to Hadoop's classpath:
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$CLASSPATH
