HBase Quick Start

Author: sean-zou

Category: Big Data    Tags: HBase

Original link: https://blog.csdn.net/a19881029/article/details/70148978

Linux distribution: Ubuntu 14.04.4

HBase: 1.3.0

JDK: 1.7.0_80

I. Download and install HBase

The official site lists the Tsinghua University mirror, http://mirrors.tuna.tsinghua.edu.cn/apache/hbase/; downloads from the other mirrors were very slow.

Download version 1.3.0 (the latest release at the time of writing) and extract it:

root@ubuntu:/home/sean# tar -xzf hbase-1.3.0-bin.tar.gz
root@ubuntu:/home/sean# cd hbase-1.3.0

II. Configure HBase

root@ubuntu:/home/sean/hbase-1.3.0# cd conf

1. Set the JDK used by HBase

root@sean:/home/sean/hbase-1.3.0/conf# vi hbase-env.sh

Add the following line:

export JAVA_HOME=/home/sean/jdk1.7.0_80

2. Edit the HBase configuration file (standalone mode)

root@sean:/home/sean/hbase-1.3.0/conf# vi hbase-site.xml

Add the following properties, which set where HBase and ZooKeeper keep their data files:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/sean/hbaseSingle/hbaseData</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/sean/hbaseSingle/zookeeperData</value>
  </property>
</configuration>
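When a setting seems to be ignored, it can help to confirm what the file actually contains. The sketch below is an illustration, not something HBase ships: it reads the property entries of an hbase-site.xml-style file into a name-to-value map using only the JDK's XML parser. The class name HbaseSiteReader and the default path conf/hbase-site.xml are assumptions, and Hadoop features such as final properties and ${var} substitution are deliberately ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the <property> name/value pairs of an hbase-site.xml-style file. */
final class HbaseSiteReader {

    /** Parses the XML text; values are trimmed, order of appearance is kept. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a <property> without <name>/<value>
            throw new IllegalArgumentException("not a valid hbase-site.xml document", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "conf/hbase-site.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : readProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Run against the file above, it should list hbase.rootdir and hbase.zookeeper.property.dataDir with the two /home/sean/hbaseSingle paths.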

III. Modify the system configuration

root@sean:/home/sean/hbase-1.3.0/conf# hostname
sean

Edit the hosts file

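This step is essential. Without it, the Java client hangs: it neither reports an error nor makes progress, and eventually times out. Presumably this is because the HBase code performs DNS forward and reverse lookups in various places.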

Before the change, the file looks like this (note that on Ubuntu the hostname maps to 127.0.1.1):

root@sean:/home/sean/hbase-1.3.0/conf# cat /etc/hosts
127.0.0.1	localhost
127.0.1.1	sean

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

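After the change (the hostname now maps to the machine's network address, 192.168.137.128, and the IPv6 entries are commented out), the file reads: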

127.0.0.1	localhost
192.168.137.128       sean

# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
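A small Java check can show whether the hostname still resolves to a loopback address, which is exactly what the hosts edit above is meant to prevent. One plausible explanation for the hang, not confirmed in this article, is that HBase advertises the address its hostname resolves to, so with 127.0.1.1 a remote client ends up connecting to its own loopback interface. The class HostnameCheck is hypothetical and uses only java.net.InetAddress.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical diagnostic (not part of HBase): prints what a host name resolves to
 * and whether any of those addresses is a loopback address (127.0.0.0/8 or ::1).
 */
final class HostnameCheck {

    /** True if host resolves to at least one loopback address; false otherwise or if unresolvable. */
    static boolean resolvesToLoopback(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr.isLoopbackAddress()) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false; // the name does not resolve at all
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Defaults to this machine's own host name ("sean" in this article).
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            // Forward lookup result, followed by an attempted reverse lookup of that address.
            System.out.println(host + " -> " + addr.getHostAddress()
                    + " -> " + addr.getCanonicalHostName());
        }
        System.out.println("resolves to loopback: " + resolvesToLoopback(host));
    }
}
```

Run on the HBase machine, it should report 127.0.1.1 for sean before the edit and 192.168.137.128 afterwards.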

IV. Start HBase

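Note in particular that line 51 of start-hbase.sh breaks when the script is launched with sh, as is done below: on Ubuntu, /bin/sh is dash, whose [ test accepts only the POSIX string operator = and rejects == with an "unexpected operator" error. The original line is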

if [ "$distMode" == 'false' ]

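It should be changed to the following (alternatively, launching the script with bash, e.g. bash start-hbase.sh, should avoid the error without editing it):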

if [ "$distMode" = 'false' ]

Starting HBase:

root@sean:/home/sean/hbase-1.3.0/conf# cd ../bin
root@sean:/home/sean/hbase-1.3.0/bin# sh start-hbase.sh 
starting master, logging to /home/sean/hbase-1.3.0/bin/../logs/hbase-root-master-sean.out

V. Verify that HBase is running

First, list the Java processes:

root@ubuntu:/home/sean/hbase-1.3.0/bin# jps
2498 Jps
2111 HMaster

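Because HBase runs in standalone mode, there is only one Java process: in this mode the HMaster, a RegionServer and the embedded ZooKeeper all run inside a single JVM.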

root@ubuntu:/home/sean/hbase-1.3.0/bin# netstat -anp | grep java | grep LISTEN
tcp6       0      0 :::42303                :::*                    LISTEN      2111/java       
tcp6       0      0 192.168.239.129:42563   :::*                    LISTEN      2111/java       
tcp6       0      0 192.168.239.129:46053   :::*                    LISTEN      2111/java       
tcp6       0      0 :::2181                 :::*                    LISTEN      2111/java       
tcp6       0      0 :::16010                :::*                    LISTEN      2111/java

Port 2181 is the client port of HBase's embedded ZooKeeper.

Port 16010 serves the web UI; the status of HBase can be viewed at http://192.168.137.128:16010
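netstat only shows that the ports are open on the server itself. To check from the client machine that they are actually reachable, a plain TCP connect is enough. This is a sketch with a hypothetical class name; the host and the two ports are the ones used in this article, and a successful connect only shows that something is listening, not that HBase is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: tests whether a TCP port on the HBase host accepts connections. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host name not resolvable
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.137.128";
        int[] ports = {2181, 16010}; // embedded ZooKeeper, web UI
        for (int port : ports) {
            System.out.println(host + ":" + port + " open = " + isOpen(host, port, 2000));
        }
    }
}
```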

Use the HBase shell to create a table named test with one column family, cf:

root@ubuntu:/home/sean/hbase-1.3.0/bin# ./hbase shell
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0, re359c76e8d9fd0d67396456f92bcbad9ecd7a710, Tue Jan  3 05:31:38 MSK 2017

hbase(main):001:0> list
TABLE                                                                           
0 row(s) in 0.3770 seconds

=> []
hbase(main):002:0> create 'test','cf'
0 row(s) in 1.4130 seconds

=> Hbase::Table - test
hbase(main):003:0> list
TABLE                                                                           
test                                                                            
1 row(s) in 0.0320 seconds

=> ["test"]

VI. HBase Java client

The hosts file on the client machine must also be edited before the client is used.

The local machine runs Windows 10, so the hosts file is located at C:\Windows\System32\drivers\etc\hosts

Add this entry:

192.168.137.128 sean

Add the following dependency to the POM:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.3.0</version>
</dependency>

1. Create a table

package com.sean;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;

public class Client
{
    public static void main( String[] args )
    {
        // Configuration: in standalone mode only the ZooKeeper address is needed
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.137.128");

        // try-with-resources closes the Admin and the Connection when the block exits
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName tableName = TableName.valueOf("testtable");

            // If the table already exists, delete the old one first
            if(admin.tableExists(tableName)){
                // First disable the table (shell: disable 'testtable')
                admin.disableTable(tableName);
                // Then delete it (shell: drop 'testtable')
                admin.deleteTable(tableName);
                System.out.println("table exists, delete first!");
            }

            // Create the new table with column families cf1 and cf2
            HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
            hTableDescriptor.addFamily(new HColumnDescriptor("cf1"));
            hTableDescriptor.addFamily(new HColumnDescriptor("cf2"));
            admin.createTable(hTableDescriptor);

        } catch (Exception e){
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
        System.out.println("operation is over");
    }
}

Result:

hbase(main):006:0> list
TABLE                                                                           
testtable                                                                       
1 row(s) in 0.0120 seconds

=> ["testtable"]

2. Insert data

package com.sean;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class Insert
{
    public static void main( String[] args )
    {
        // Configuration: in standalone mode only the ZooKeeper address is needed
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.137.128");

        HTable table = null;
        try {
            TableName tableName = TableName.valueOf("testtable");
            // new HTable(...) is deprecated since HBase 1.0; Connection.getTable(tableName) replaces it
            table = new HTable(conf,tableName);

            // Single-row insert
            Put put1 = new Put(Bytes.toBytes("row1"));
            put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val1"));
            put1.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("qual"), Bytes.toBytes("val1"));
            table.put(put1);

            // Multi-row insert
            List<Put> putList = new LinkedList<Put>();

            Put put2 = new Put(Bytes.toBytes("row2"));
            put2.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val2"));
            putList.add(put2);

            Put put3 = new Put(Bytes.toBytes("row3"));
            put3.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val3"));
            putList.add(put3);

            table.put(putList);

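            // Multi-row insert through the client-side write buffer:
            // autoFlush off, and clear the buffer if a flush fails (clearBufferOnFail = true)
            table.setAutoFlush(false,true);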
            putList = new LinkedList<Put>();

            Put put4 = new Put(Bytes.toBytes("row4"));
            put4.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val4"));
            putList.add(put4);

            Put put5 = new Put(Bytes.toBytes("row5"));
            put5.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val5"));
            putList.add(put5);

            table.put(putList);
            table.flushCommits(); // send the buffered puts to the server now

            // Batch insert via batch(): it is synchronous and does not go through the client-side write buffer
            putList = new LinkedList<Put>();

            Put put6 = new Put(Bytes.toBytes("row6"));
            put6.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val6"));
            putList.add(put6);

            Put put7 = new Put(Bytes.toBytes("row7"));
            put7.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val7"));
            putList.add(put7);

            Object[] results = new Object[putList.size()];
            table.batch(putList,results);
            for(Object obj : results)
                System.out.println(obj);

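            // Check-and-put: put8 is applied only if row8 has no cf1:qual cell yet (expected value null).
            // The checked row and the Put's row must be the same, because the atomic check-then-write locks a single row.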
            Put put8 = new Put(Bytes.toBytes("row8"));
            put8.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual"), Bytes.toBytes("val8"));
            Boolean result = table.checkAndPut(Bytes.toBytes("row8"),Bytes.toBytes("cf1"),
                    Bytes.toBytes("qual"),null,put8);
            System.out.println(result);
        } catch (Exception e){
            System.out.println(e.getMessage());
            e.printStackTrace();
        } finally {
            if(table != null)
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            System.out.println("operation is over");
        }
    }
}

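The console output is below. Note that with the batch interface, a successful Put or Delete returns an empty Result (printed as keyvalues=NONE):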

keyvalues=NONE
keyvalues=NONE
true
operation is over
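checkAndPut(row, family, qualifier, value, put) compares the current value of one cell with value and applies put only on a match; passing null, as Insert does for row8, means "only if cf1:qual does not exist yet". The sketch below is an in-memory simulation of that contract written purely for illustration: the class and method names are hypothetical, it does not talk to HBase, and a synchronized method stands in for HBase's per-row lock.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory simulation of the check-and-put contract (NOT the HBase client):
 * the comparison and the write happen atomically, and an expected value of
 * null means "the cell must not exist yet".
 */
final class CheckAndPutSim {

    // row key -> (column "family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> rows = new HashMap<>();

    /**
     * Writes value to row/column only if the current value equals expected.
     * The method-level lock is a coarse stand-in for HBase's per-row lock.
     */
    synchronized boolean checkAndPut(String row, String column, byte[] expected, byte[] value) {
        Map<String, byte[]> cells = rows.get(row);
        byte[] current = (cells == null) ? null : cells.get(column);
        boolean matches = (expected == null) ? current == null : Arrays.equals(expected, current);
        if (!matches) {
            return false;
        }
        if (cells == null) {
            cells = new HashMap<>();
            rows.put(row, cells);
        }
        cells.put(column, value);
        return true;
    }

    public static void main(String[] args) {
        CheckAndPutSim table = new CheckAndPutSim();
        byte[] val8 = "val8".getBytes(StandardCharsets.UTF_8);
        // First run: row8 has no cf1:qual cell, so the put is applied.
        System.out.println(table.checkAndPut("row8", "cf1:qual", null, val8)); // true
        // Second run: the cell now exists, so the identical call is rejected.
        System.out.println(table.checkAndPut("row8", "cf1:qual", null, val8)); // false
    }
}
```

Under this contract, running Insert a second time should print false for row8. That would also be consistent with row8 keeping an older timestamp than the other rows in the scans that follow.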

The table contents afterwards:

hbase(main):031:0> scan 'testtable'
ROW                   COLUMN+CELL                                               
 row1                 column=cf1:qual, timestamp=1492921227642, value=val1      
 row1                 column=cf2:qual, timestamp=1492921227642, value=val1      
 row2                 column=cf1:qual, timestamp=1492921227658, value=val2      
 row3                 column=cf1:qual, timestamp=1492921227658, value=val3      
 row4                 column=cf1:qual, timestamp=1492921227661, value=val4      
 row5                 column=cf1:qual, timestamp=1492921227661, value=val5      
 row6                 column=cf1:qual, timestamp=1492921227663, value=val6      
 row7                 column=cf1:qual, timestamp=1492921227663, value=val7      
 row8                 column=cf1:qual, timestamp=1492917063146, value=val8      
8 row(s) in 0.0280 seconds
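With setAutoFlush(false, true), HTable keeps Puts in a client-side write buffer and sends them only when flushCommits() or close() is called, or when the buffer grows past its limit (hbase.client.write.buffer, 2 MB by default). In the 1.x client these HTable methods are deprecated in favour of BufferedMutator. The following simulation illustrates only the buffering idea: names are hypothetical, sizes are supplied by the caller, and each flush counts as one RPC, whereas the real client groups mutations per RegionServer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified simulation (NOT the HBase client) of a client-side write buffer:
 * mutations are queued locally and only "sent" when the buffered size reaches
 * a threshold or when flush() is called explicitly.
 */
final class WriteBufferSim {

    private final long maxBufferBytes;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int rpcCount = 0; // number of simulated round trips to the server

    WriteBufferSim(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    /** Queues one mutation; flushes automatically once the buffer is full. */
    void put(String rowKey, int sizeBytes) {
        buffer.add(rowKey);
        bufferedBytes += sizeBytes;
        if (bufferedBytes >= maxBufferBytes) {
            flush();
        }
    }

    /** Sends everything buffered as one simulated RPC; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;
        System.out.println("RPC #" + rpcCount + " sends " + buffer);
        buffer.clear();
        bufferedBytes = 0;
    }

    int rpcCount() { return rpcCount; }

    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        WriteBufferSim table = new WriteBufferSim(2 * 1024 * 1024); // 2 MB threshold
        table.put("row4", 100);
        table.put("row5", 100);
        System.out.println("pending before flush: " + table.pending()); // 2
        table.flush(); // like flushCommits(): one RPC carries both puts
        System.out.println("RPCs: " + table.rpcCount()); // 1
    }
}
```

In main, row4 and row5 travel together in one simulated RPC, mirroring the table.put(putList) followed by table.flushCommits() in Insert.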

3. Query data

package com.sean;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class Query
{
    public static void main( String[] args )
    {
        // Configuration: in standalone mode only the ZooKeeper address is needed
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.137.128");

        HTable table = null;
        try {
            TableName tableName = TableName.valueOf("testtable");
            table = new HTable(conf,tableName);

            // Single-row get
            Get get1 = new Get(Bytes.toBytes("row1"));
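//            The number of versions to fetch can be set, but the column family must first keep multiple versions (only 1 by default)
//            describe 'testtable'    // shows the VERSIONS setting of each column family
//            alter 'testtable', {NAME => 'cf1', VERSIONS => 10}    // keep up to 10 versions in cf1
//            get1.setMaxVersions(10);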

            Result result1 = table.get(get1);
            System.out.println(result1.toString());

            // Multi-row get
            System.out.println("----");
            List<Get> list = new LinkedList<Get>();

            Get get2 = new Get(Bytes.toBytes("row1"))
                    .addFamily(Bytes.toBytes("cf1"));
            list.add(get2);

            Get get3 = new Get(Bytes.toBytes("row1"))
                    .addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("qual"));
            list.add(get3);

            Get get4 = new Get(Bytes.toBytes("row99"));
            list.add(get4);

            Result[] result2 = table.get(list);
            for(Result result : result2)
                System.out.println(result);

            // Batch get via batch()
            System.out.println("----");
            list = new LinkedList<Get>();

            Get get5 = new Get(Bytes.toBytes("row4"));
            list.add(get5);

            Get get6 = new Get(Bytes.toBytes("row5"))
                    .addFamily(Bytes.toBytes("cf_not_exist"));
            list.add(get6);

            Object[] result3 = new Object[list.size()];
            try {
                table.batch(list, result3);
            } catch (Exception e){
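                // batch() throws if any action fails (here, the missing column family);
                // the per-action results, a Result or the exception, are still filled into result3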
            }
            for(Object obj : result3) {
                if(obj instanceof Exception)
                    System.out.println(obj.getClass().getName());
                else
                    System.out.println(obj.toString());
            }

            // Get the given row, or the nearest row sorting before it; null if there is neither
            System.out.println("----");

            Result result4 = table.getRowOrBefore(Bytes.toBytes("row3"),Bytes.toBytes("cf1"));
            System.out.println(result4.toString());

            Result result5 = table.getRowOrBefore(Bytes.toBytes("row99"),Bytes.toBytes("cf1"));
            System.out.println(result5.toString());

            Result result6 = table.getRowOrBefore(Bytes.toBytes("a"),Bytes.toBytes("cf1"));
            if(result6 != null)
                System.out.println(result6.toString());
            else
                System.out.println("is null");

            // Scan the table
            System.out.println("----");
//            Table-level scanner caching
//            table.setScannerCaching(4);

            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("cf1"));
            // The start row is included in the results, the stop row is not
            scan.setStartRow(Bytes.toBytes("row1"));
            scan.setStopRow(Bytes.toBytes("row8"));
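            // Scan-level scanner caching, which takes precedence over the table-level setting:
            // the number of rows the scanner fetches from the server per RPC.
            // With a caching value of 1 (the default in older HBase releases), every row costs its own RPC.
            scan.setCaching(4); // 7 rows in total, so about 2 fetch RPCs return all of them
            // Maximum number of columns (cells) per Result
            scan.setBatch(1);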

            ResultScanner scanner = table.getScanner(scan);
            for(Result result : scanner){
                System.out.println(result);
            }
            scanner.close();
        } catch (Exception e){
            System.out.println(e.getMessage());
            e.printStackTrace();
        } finally {
            if(table != null)
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            System.out.println("operation is over");
        }
    }
}

Console output:

keyvalues={row1/cf1:qual/1492944856216/Put/vlen=4/seqid=0, row1/cf2:qual/1492944856216/Put/vlen=4/seqid=0}
----
keyvalues={row1/cf1:qual/1492944856216/Put/vlen=4/seqid=0}
keyvalues={row1/cf2:qual/1492944856216/Put/vlen=4/seqid=0}
keyvalues=NONE
----
keyvalues={row4/cf1:qual/1492944856238/Put/vlen=4/seqid=0}
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException
----
keyvalues={row3/cf1:qual/1492944856236/Put/vlen=4/seqid=0}
keyvalues={row8/cf1:qual/1492917063146/Put/vlen=4/seqid=0}
is null
----
keyvalues={row1/cf1:qual/1492944856216/Put/vlen=4/seqid=0}
keyvalues={row2/cf1:qual/1492944856236/Put/vlen=4/seqid=0}
keyvalues={row3/cf1:qual/1492944856236/Put/vlen=4/seqid=0}
keyvalues={row4/cf1:qual/1492944856238/Put/vlen=4/seqid=0}
keyvalues={row5/cf1:qual/1492944856238/Put/vlen=4/seqid=0}
keyvalues={row6/cf1:qual/1492944856239/Put/vlen=4/seqid=0}
keyvalues={row7/cf1:qual/1492944856239/Put/vlen=4/seqid=0}
operation is over
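HBase sorts row keys as unsigned byte strings, and that ordering explains the getRowOrBefore output above: "row99" sorts after "row8" (its fourth byte, '9', is greater than '8'), so the nearest row at or before it is row8, while nothing sorts before "a", hence null. The same ordering defines the scan range [row1, row8). The simulation below uses String keys encoded as UTF-8 and a TreeSet in place of the table; the names are hypothetical, and grouping 7 rows by caching = 4 only approximates the real RPC count, which also depends on scanner open/close calls and hbase.client.scanner.max.result.size. (HTable.getRowOrBefore itself is deprecated in the 1.x client, with a reversed Scan as the suggested replacement.)

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/**
 * Simulation (NOT the HBase client) of row-key ordering: keys compare as
 * unsigned bytes, getRowOrBefore acts as a "floor" lookup, and a Scan covers
 * the half-open range [startRow, stopRow).
 */
final class RowKeyOrderSim {

    /** Unsigned lexicographic comparison of the keys' UTF-8 bytes. */
    static final Comparator<String> BYTE_ORDER = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            byte[] x = a.getBytes(StandardCharsets.UTF_8);
            byte[] y = b.getBytes(StandardCharsets.UTF_8);
            int n = Math.min(x.length, y.length);
            for (int i = 0; i < n; i++) {
                int diff = (x[i] & 0xff) - (y[i] & 0xff);
                if (diff != 0) {
                    return diff;
                }
            }
            return x.length - y.length; // a prefix sorts before the longer key
        }
    };

    /** Like getRowOrBefore: the greatest key <= row, or null if there is none. */
    static String rowOrBefore(TreeSet<String> keys, String row) {
        return keys.floor(row);
    }

    /** Keys in [start, stop), grouped into batches of `caching` rows per simulated RPC. */
    static List<List<String>> scan(TreeSet<String> keys, String start, String stop, int caching) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : keys.subSet(start, true, stop, false)) {
            current.add(key);
            if (current.size() == caching) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(BYTE_ORDER);
        keys.addAll(Arrays.asList("row1", "row2", "row3", "row4", "row5", "row6", "row7", "row8"));
        System.out.println(rowOrBefore(keys, "row3"));  // row3
        System.out.println(rowOrBefore(keys, "row99")); // row8
        System.out.println(rowOrBefore(keys, "a"));     // null
        System.out.println(scan(keys, "row1", "row8", 4)); // [[row1, row2, row3, row4], [row5, row6, row7]]
    }
}
```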

4. Delete data

package com.sean;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class Del
{
    public static void main( String[] args )
    {
        // Configuration: in standalone mode only the ZooKeeper address is needed
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.137.128");

        HTable table = null;
        try {
            TableName tableName = TableName.valueOf("testtable");
            table = new HTable(conf,tableName);

            // Single-row delete: remove column family cf2 from row1
            Delete delete1 = new Delete(Bytes.toBytes("row1"));
            delete1.addFamily(Bytes.toBytes("cf2"));
            table.delete(delete1);

            // Multi-row delete
            List<Delete> delList = new LinkedList<Delete>();

            Delete delete2 = new Delete(Bytes.toBytes("row2"));
            delList.add(delete2);

            Delete delete3 = new Delete(Bytes.toBytes("row3"));
            delList.add(delete3);

            table.delete(delList);

            // Batch delete via batch()
            delList = new LinkedList<Delete>();

            Delete delete4 = new Delete(Bytes.toBytes("row4"));
            delList.add(delete4);

            Delete delete5 = new Delete(Bytes.toBytes("row99"));
            delList.add(delete5);

            Object[] results = new Object[delList.size()];
            table.batch(delList,results);
            for(Object obj : results)
                System.out.println(obj);

            // Check-and-delete: delete row6 only if cf1:qual still equals val6
            Delete delete6 = new Delete(Bytes.toBytes("row6"));
            table.checkAndDelete(Bytes.toBytes("row6"),Bytes.toBytes("cf1")
                ,Bytes.toBytes("qual"),Bytes.toBytes("val6"),delete6);
        } catch (Exception e){
            System.out.println(e.getMessage());
            e.printStackTrace();
        } finally {
            if(table != null)
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            System.out.println("operation is over");
        }
    }
}

The table contents afterwards:

hbase(main):055:0> scan 'testtable'
ROW                   COLUMN+CELL                                               
 row1                 column=cf1:qual, timestamp=1492946751365, value=val1      
 row5                 column=cf1:qual, timestamp=1492946751378, value=val5      
 row7                 column=cf1:qual, timestamp=1492946751380, value=val7      
 row8                 column=cf1:qual, timestamp=1492917063146, value=val8      
4 row(s) in 0.0230 seconds