HBase Pseudo-Distributed Deployment and Java API Operations

china_cqcone

Article link: https://blog.csdn.net/cqconelin/article/details/73500713

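This post is a basic introduction to HBase. HBase is the NoSQL database of the Hadoop ecosystem and is most commonly deployed on top of Hadoop's HDFS, so Hadoop must be installed before HBase (this post is based on Linux Mint 18):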

- Hadoop installation

Downloading Hadoop
Hadoop can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/. The latest stable release is usually the right choice; I downloaded hadoop-2.8.0.tar.gz.

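Configuring Hadoop for pseudo-distributed deployment
Extract Hadoop to a path the current user has permissions on; mine is /home/linmint/programs/hadoop-2.8.0. From here on I use "Hadoop" as shorthand for that extraction path. Hadoop's configuration files live in Hadoop/etc/hadoop, and a pseudo-distributed deployment requires changes to two of them:
1. core-site.xml: add the following properties inside the configuration element: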

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/home/linmint/tmp/hadoop</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

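Here file:/home/linmint/tmp/hadoop is a local temporary storage path. Note: if the client API will connect from another machine, write the server's IP address instead of localhost; otherwise the client resolves localhost to itself and fails with a connection exception.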
2. hdfs-site.xml: add the following properties inside the configuration element:

        <configuration>
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
            <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/linmint/tmp/hadoop/dfs/name</value>
            </property>
            <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/linmint/tmp/hadoop/dfs/data</value>
            </property>
        </configuration>

Both file:/home/linmint/tmp/hadoop/dfs/name and file:/home/linmint/tmp/hadoop/dfs/data are local paths.
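The localhost caveat for fs.defaultFS in core-site.xml can be checked mechanically before a remote client is pointed at the cluster. The following is a minimal sketch using only the JDK; the class name DefaultFsCheck and its method are hypothetical helpers for illustration, not part of Hadoop.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Hypothetical helper (not a Hadoop class): flags fs.defaultFS values that only work locally. */
final class DefaultFsCheck {

    /** Returns true when the host part of the URI is a loopback address (localhost, 127.x.x.x, ::1). */
    static boolean isLoopbackOnly(String defaultFs) {
        String host = URI.create(defaultFs).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in " + defaultFs);
        }
        try {
            // IP literals are parsed directly; names go through the local resolver.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            // An unresolvable name is a different problem, but it is not a loopback address.
            return false;
        }
    }

    public static void main(String[] args) {
        // Values taken from the configuration in this post.
        System.out.println("localhost:  " + isLoopbackOnly("hdfs://localhost:9000"));
        System.out.println("10.0.0.110: " + isLoopbackOnly("hdfs://10.0.0.110:9000"));
    }
}
```

If the check reports a loopback address, a client on another machine would end up connecting to itself, which is the situation the note on core-site.xml warns about.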

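Formatting the NameNode
The Hadoop directory contains two directories full of commands, bin and sbin. bin holds the user-facing commands, such as hdfs for operating on the HDFS file system; sbin holds the administrative scripts that start and stop the Hadoop daemons.
Format the NameNode with Hadoop/bin/hdfs namenode -format. On success the console output contains the words "successfully formatted". If you instead see the error "Error: JAVA_HOME is not set and could not be found.", JAVA_HOME is not configured correctly: add export JAVA_HOME=/usr/local/jdk1.7.80 (substitute your own JDK path) to ~/.bashrc and run source ~/.bashrc so that the variable takes effect immediately. The format command initializes the NameNode storage under the hadoop.tmp.dir directory configured in core-site.xml as a fresh HDFS file system.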
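Starting the NameNode and DataNode daemons
Start Hadoop with Hadoop/sbin/start-dfs.sh. As each node starts you are prompted for a password, several times in total; configuring an SSH key pair avoids the repeated prompts:
1. Make sure SSH is installed. On Debian-family systems, sudo apt-get install ssh installs it if it is missing.
2. Create a new SSH key with an empty passphrase for passwordless login: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa. Once the key is generated, run cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys.
3. Test with ssh localhost; if the setup worked, no password is requested.
After startup, jps lists the running Hadoop processes (NameNode, DataNode and SecondaryNameNode), and http://localhost:50070 in a browser shows the NameNode and DataNode information.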
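Besides jps and the browser, a quick way to confirm that the daemons are actually listening is to try a TCP connection to the ports used in this post (9000 from fs.defaultFS and 50070 for the web UI). This is a rough sketch with the plain JDK; PortProbe is a hypothetical name, and an open port only shows that something is listening, not that HDFS is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: reports whether a TCP port accepts connections. */
final class PortProbe {

    /** Tries to connect within timeoutMillis; returns false on refusal, timeout or any I/O error. */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports as configured in this post; adjust them if your setup differs.
        System.out.println("NameNode RPC (9000):     " + isOpen("localhost", 9000, 1000));
        System.out.println("NameNode web UI (50070): " + isOpen("localhost", 50070, 1000));
    }
}
```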
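Running a pseudo-distributed Hadoop example
1. HDFS is operated through shell commands of the form Hadoop/bin/hdfs dfs. First create the user directory:
Hadoop/bin/hdfs dfs -mkdir -p /user/hadoop
This creates a directory in the HDFS file system (-p also creates the missing parent /user); list it with:
Hadoop/bin/hdfs dfs -ls /
2. Copy all the XML configuration files in Hadoop/etc/hadoop into the /user/hadoop/input directory in HDFS:
Hadoop/bin/hdfs dfs -mkdir /user/hadoop/input
Hadoop/bin/hdfs dfs -put Hadoop/etc/hadoop/*.xml /user/hadoop/input
3. Restart Hadoop: stop the running instance with Hadoop/sbin/stop-dfs.sh, start it again with Hadoop/sbin/start-dfs.sh, then use jps to check that it started.
With that, the pseudo-distributed Hadoop is up and running.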

- HBase installation

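Before installing HBase in pseudo-distributed mode, make sure your Hadoop environment is installed and working, because HBase uses HDFS as its underlying storage. The startup order follows from this: start Hadoop first, then HBase; shut down in reverse, HBase first, then Hadoop.
- Downloading HBase
HBase can be downloaded from https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/; I downloaded hbase-1.1.0-bin.tar.gz.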

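- Extracting and configuring pseudo-distributed mode
Extract HBase to a path the current user has full permissions on; this post uses "HBase" as shorthand for the extraction path, which in my case is /home/linmint/programs/hbase-1.1.0. Then edit two files under HBase/conf:
1. hbase-env.sh
Below the comment # The java implementation to use. Java 1.7+ required. add two lines:
export JAVA_HOME=/home/linmint/programs/jdk1.8.0_131 # use your own JDK path
export HBASE_CLASSPATH=/home/linmint/programs/hbase-1.1.0/lib # HBase classpath; use your own
Below the comment # Tell HBase whether it should manage it's own instance of Zookeeper or not. add one line:
export HBASE_MANAGES_ZK=true
This lets HBase control the startup and shutdown of ZooKeeper, using the ZooKeeper instance bundled with HBase.
2. hbase-site.xml
Add the following properties inside the configuration element: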

    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://localhost:9000/hbase</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>10.0.0.110</value>
        </property>
        <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/home/linmint/tmp/zookeeper</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>hbase.regionserver.dns.nameserver</name>
            <value>10.0.0.110</value>
        </property>
    </configuration>

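Here hbase.rootdir is the HDFS location in which HBase stores its data; its host and port must match the fs.defaultFS configured for Hadoop. hbase.zookeeper.quorum is the ZooKeeper address (the default port is 2181), and hbase.regionserver.dns.nameserver is normally the local address as well.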
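Since hbase.rootdir has to agree with fs.defaultFS, it can help to compare the two files programmatically rather than by eye. The sketch below parses *-site.xml content with the JDK's DOM parser and compares scheme, host and port. SiteConfigCheck and its methods are hypothetical names, and the comparison is purely textual, so localhost and 10.0.0.110 count as different hosts even when they are the same machine.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Hadoop or HBase) for sanity-checking *-site.xml content. */
final class SiteConfigCheck {

    /** Reads every property name/value pair of a *-site.xml document into a map. */
    static Map<String, String> readProperties(String xml) {
        try {
            NodeList props = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("property");
            Map<String, String> result = new HashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element property = (Element) props.item(i);
                result.put(text(property, "name"), text(property, "value"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid *-site.xml document", e);
        }
    }

    private static String text(Element property, String tag) {
        return property.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** True when hbase.rootdir has the same scheme, host and port as fs.defaultFS. */
    static boolean rootDirOnDefaultFs(Map<String, String> coreSite, Map<String, String> hbaseSite) {
        String fsValue = coreSite.get("fs.defaultFS");
        String rootValue = hbaseSite.get("hbase.rootdir");
        if (fsValue == null || rootValue == null) {
            return false;
        }
        URI fs = URI.create(fsValue);
        URI root = URI.create(rootValue);
        return fs.getScheme() != null && fs.getScheme().equalsIgnoreCase(root.getScheme())
                && fs.getHost() != null && fs.getHost().equalsIgnoreCase(root.getHost())
                && fs.getPort() == root.getPort();
    }

    public static void main(String[] args) {
        // Inline copies of the relevant properties from this post.
        String coreSite = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://localhost:9000</value></property></configuration>";
        String hbaseSite = "<configuration><property><name>hbase.rootdir</name>"
                + "<value>hdfs://localhost:9000/hbase</value></property></configuration>";
        System.out.println(rootDirOnDefaultFs(readProperties(coreSite), readProperties(hbaseSite)));
    }
}
```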
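Once configured, start Hadoop first and then HBase, using the following commands:
HBase/bin/start-hbase.sh starts HBase,
HBase/bin/stop-hbase.sh stops HBase.
Use jps to check whether startup succeeded.
HBase/bin/hbase shell opens the HBase shell client; in the shell, list shows the tables and scan 'table_name' shows a table's data.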
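Because HBase manages ZooKeeper here, it can also be useful to confirm that ZooKeeper itself answers on port 2181. ZooKeeper 3.4.x (the line used with this setup) replies "imok" to the four-letter command "ruok"; newer ZooKeeper releases may require that command to be explicitly enabled, so treat the sketch below as an assumption-laden probe rather than a definitive health check. ZkProbe is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: sends ZooKeeper's "ruok" command and expects "imok" back. */
final class ZkProbe {

    static boolean isServing(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // also bound the time spent waiting for the reply
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to four bytes; the server closes the connection after replying.
            byte[] reply = new byte[4];
            InputStream in = socket.getInputStream();
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == reply.length
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Address and port taken from hbase.zookeeper.quorum in this post.
        System.out.println("ZooKeeper serving: " + isServing("10.0.0.110", 2181, 2000));
    }
}
```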

- Java Client API

Create a new Maven project and add the HBase Java client API to pom.xml (spring-data-hadoop is included as well, to try Spring's HbaseTemplate against HBase):

    <dependencies>
    <!-- Add the spring-core package -->
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>4.1.4.RELEASE</version>
    </dependency>
    <!-- Add the spring-context package -->
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-context</artifactId>
        <version>4.1.4.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.96.2-hadoop2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.0.2.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.4.6</version>
    </dependency>
</dependencies>

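Next, here is the code for creating an HBase table and for inserting, deleting, updating and querying data; see the code comments for usage details: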

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class HBaseClient {
private static Configuration conf = null;
static {
conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "10.0.0.110");
}

/*
 * Create a table.
 *
 * @tableName table name
 *
 * @family list of column families
 */
public static void creatTable(String tableName, String[] family)
        throws Exception {
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(tableName);
    for (int i = 0; i < family.length; i++) {
        desc.addFamily(new HColumnDescriptor(family[i]));
    }
    if (admin.tableExists(tableName)) {
        System.out.println("table Exists!");
        System.exit(0);
    } else {
        admin.createTable(desc);
        System.out.println("create table Success!");
    }
}

/*
 * Add data to a table (suited to a fixed table whose column families are known).
 *
 * @rowKey row key
 *
 * @tableName table name
 *
 * @column1 column qualifiers in the first column family (article)
 *
 * @value1 values for the columns in column1
 *
 * @column2 column qualifiers in the second column family (author)
 *
 * @value2 values for the columns in column2
 */
public static void addData(String rowKey, String tableName,
        String[] column1, String[] value1, String[] column2, String[] value2)
        throws IOException {
    Put put = new Put(Bytes.toBytes(rowKey)); // set the row key
    // HTable handles record-level operations (insert, delete, update, query)
    HTable table = new HTable(conf, Bytes.toBytes(tableName)); // get the table
    HColumnDescriptor[] columnFamilies = table.getTableDescriptor() // get all column families
            .getColumnFamilies();

    for (int i = 0; i < columnFamilies.length; i++) {
        String familyName = columnFamilies[i].getNameAsString(); // get the column family name
        if (familyName.equals("article")) { // put data into the article column family
            for (int j = 0; j < column1.length; j++) {
                put.add(Bytes.toBytes(familyName),
                        Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));
            }
        }
        if (familyName.equals("author")) { // put data into the author column family
            for (int j = 0; j < column2.length; j++) {
                put.add(Bytes.toBytes(familyName),
                        Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));
            }
        }
    }
    table.put(put);
    System.out.println("add data Success!");
}

/*
 * Look up a row by row key.
 *
 * @rowKey row key
 *
 * @tableName table name
 */
public static Result getResult(String tableName, String rowKey)
        throws IOException {
    Get get = new Get(Bytes.toBytes(rowKey));
    HTable table = new HTable(conf, Bytes.toBytes(tableName)); // get the table
    Result result = table.get(get);
    for (KeyValue kv : result.list()) {
        System.out.println("family:" + Bytes.toString(kv.getFamily()));
        System.out
                .println("qualifier:" + Bytes.toString(kv.getQualifier()));
        System.out.println("value:" + Bytes.toString(kv.getValue()));
        System.out.println("Timestamp:" + kv.getTimestamp());
        System.out.println("-------------------------------------------");
    }
    return result;
}

/*
 * Scan the whole HBase table.
 *
 * @tableName table name
 */
public static void getResultScann(String tableName) throws IOException {
    Scan scan = new Scan();
    ResultScanner rs = null;
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    try {
        rs = table.getScanner(scan);
        for (Result r : rs) {
            for (KeyValue kv : r.list()) {
                System.out.println("row:" + Bytes.toString(kv.getRow()));
                System.out.println("family:"
                        + Bytes.toString(kv.getFamily()));
                System.out.println("qualifier:"
                        + Bytes.toString(kv.getQualifier()));
                System.out
                        .println("value:" + Bytes.toString(kv.getValue()));
                System.out.println("timestamp:" + kv.getTimestamp());
                System.out
                        .println("-------------------------------------------");
            }
        }
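    } finally {
        // rs stays null if getScanner() threw, so guard against a NullPointerException
        if (rs != null) {
            rs.close();
        }
    }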
}

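/*
 * Scan the HBase table over a row-key range.
 *
 * @tableName table name
 *
 * @start_rowkey first row key of the range (inclusive)
 *
 * @stop_rowkey end of the range (exclusive)
 */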
public static void getResultScann(String tableName, String start_rowkey,
        String stop_rowkey) throws IOException {
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(start_rowkey));
    scan.setStopRow(Bytes.toBytes(stop_rowkey));
    ResultScanner rs = null;
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    try {
        rs = table.getScanner(scan);
        for (Result r : rs) {
            for (KeyValue kv : r.list()) {
                System.out.println("row:" + Bytes.toString(kv.getRow()));
                System.out.println("family:"
                        + Bytes.toString(kv.getFamily()));
                System.out.println("qualifier:"
                        + Bytes.toString(kv.getQualifier()));
                System.out
                        .println("value:" + Bytes.toString(kv.getValue()));
                System.out.println("timestamp:" + kv.getTimestamp());
                System.out
                        .println("-------------------------------------------");
            }
        }
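    } finally {
        // rs stays null if getScanner() threw, so guard against a NullPointerException
        if (rs != null) {
            rs.close();
        }
    }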
}

/*
 * Query a single column of a row.
 *
 * @tableName table name
 *
 * @rowKey row key
 *
 * @familyName column family name
 *
 * @columnName column name
 */
public static void getResultByColumn(String tableName, String rowKey,
        String familyName, String columnName) throws IOException {
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    Get get = new Get(Bytes.toBytes(rowKey));
    get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // fetch the column identified by this column family and qualifier
    Result result = table.get(get);
    for (KeyValue kv : result.list()) {
        System.out.println("family:" + Bytes.toString(kv.getFamily()));
        System.out
                .println("qualifier:" + Bytes.toString(kv.getQualifier()));
        System.out.println("value:" + Bytes.toString(kv.getValue()));
        System.out.println("Timestamp:" + kv.getTimestamp());
        System.out.println("-------------------------------------------");
    }
}

/*
 * Update a single column of a row.
 *
 * @tableName table name
 *
 * @rowKey row key
 *
 * @familyName column family name
 *
 * @columnName column name
 *
 * @value the new value
 */
public static void updateTable(String tableName, String rowKey,
        String familyName, String columnName, String value)
        throws IOException {
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),
            Bytes.toBytes(value));
    table.put(put);
    System.out.println("update table Success!");
}

/*
 * Query multiple versions of a column's data.
 *
 * @tableName table name
 *
 * @rowKey row key
 *
 * @familyName column family name
 *
 * @columnName column name
 */
public static void getResultByVersion(String tableName, String rowKey,
        String familyName, String columnName) throws IOException {
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    Get get = new Get(Bytes.toBytes(rowKey));
    get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));
    get.setMaxVersions(5);
    Result result = table.get(get);
    for (KeyValue kv : result.list()) {
        System.out.println("family:" + Bytes.toString(kv.getFamily()));
        System.out
                .println("qualifier:" + Bytes.toString(kv.getQualifier()));
        System.out.println("value:" + Bytes.toString(kv.getValue()));
        System.out.println("Timestamp:" + kv.getTimestamp());
        System.out.println("-------------------------------------------");
    }
    /*
     * List<?> results = table.get(get).list(); Iterator<?> it =
     * results.iterator(); while (it.hasNext()) {
     * System.out.println(it.next().toString()); }
     */
}

/*
 * Delete the specified column.
 *
 * @tableName table name
 *
 * @rowKey row key
 *
 * @familyName column family name
 *
 * @columnName column name
 */
public static void deleteColumn(String tableName, String rowKey,
        String familyName, String columnName) throws IOException {
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));
    deleteColumn.deleteColumns(Bytes.toBytes(familyName),
            Bytes.toBytes(columnName));
    table.delete(deleteColumn);
    System.out.println(familyName + ":" + columnName + " is deleted!");
}

/*
 * Delete all columns of a row (that is, the whole row).
 *
 * @tableName table name
 *
 * @rowKey row key
 */
public static void deleteAllColumn(String tableName, String rowKey)
        throws IOException {
    HTable table = new HTable(conf, Bytes.toBytes(tableName));
    Delete deleteAll = new Delete(Bytes.toBytes(rowKey));
    table.delete(deleteAll);
    System.out.println("all columns are deleted!");
}

/*
 * Delete a table.
 *
 * @tableName table name
 */
public static void deleteTable(String tableName) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.disableTable(tableName);
    admin.deleteTable(tableName);
    System.out.println(tableName + " is deleted!");
}

public static void main(String[] args) throws Exception {
    // Create the table
    String tableName = "blog2";
    String[] family = { "article", "author" };
    creatTable(tableName, family);

    // Add data to the table

    String[] column1 = { "title", "content", "tag" };
    String[] value1 = {
            "Head First HBase",
            "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.",
            "Hadoop,HBase,NoSQL" };
    String[] column2 = { "name", "nickname" };
    String[] value2 = { "nicholas", "lee" };
    addData("rowkey1", "blog2", column1, value1, column2, value2);
    addData("rowkey2", "blog2", column1, value1, column2, value2);
    addData("rowkey3", "blog2", column1, value1, column2, value2);

    // Scan the whole table
    getResultScann("blog2");
    // Scan a row-key range (start row inclusive, stop row exclusive)
    getResultScann("blog2", "rowkey4", "rowkey5");

    // Look up a single row
    getResult("blog2", "rowkey1");

    // Query the value of one column
    getResultByColumn("blog2", "rowkey1", "author", "name");

    // Update a column
    updateTable("blog2", "rowkey1", "author", "name", "bin");

    // Query the value of one column again
    getResultByColumn("blog2", "rowkey1", "author", "name");

    // Query multiple versions of a column
    getResultByVersion("blog2", "rowkey1", "author", "name");

    // Delete one column
    deleteColumn("blog2", "rowkey1", "author", "nickname");

    // Delete all columns of the row
    deleteAllColumn("blog2", "rowkey1");

    // Delete the table
    deleteTable("blog2");
}
}
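The range variant of getResultScann depends on two properties of HBase scans: rows are ordered by the raw bytes of their row keys, and the start row is inclusive while the stop row is exclusive. The sketch below imitates that behaviour on an in-memory list with the plain JDK, so it runs without a cluster; RowRangeDemo and its methods are hypothetical and do not use the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone imitation of row-key ordering and range selection; not HBase code. */
final class RowRangeDemo {

    /** Compares byte arrays as unsigned bytes, left to right (lexicographic order). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix sorts first
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the keys in [startRow, stopRow), in byte order. */
    static List<String> scan(List<String> rowKeys, String startRow, String stopRow) {
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort((x, y) -> compareBytes(bytes(x), bytes(y)));
        List<String> hits = new ArrayList<>();
        for (String key : sorted) {
            boolean atOrAfterStart = compareBytes(bytes(key), bytes(startRow)) >= 0;
            boolean beforeStop = compareBytes(bytes(key), bytes(stopRow)) < 0;
            if (atOrAfterStart && beforeStop) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("rowkey1", "rowkey2", "rowkey3", "rowkey10");
        System.out.println(scan(keys, "rowkey1", "rowkey3"));
        System.out.println(scan(keys, "rowkey4", "rowkey5"));
    }
}
```

With these sample keys, the range from rowkey1 to rowkey3 yields rowkey1, rowkey10 and rowkey2: rowkey10 sorts before rowkey2 because comparison is byte by byte, and rowkey3 is left out because the stop row is exclusive. The range from rowkey4 to rowkey5, as used in main, matches nothing when only rowkey1 to rowkey3 exist.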

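Note: the HBase server nodes register with ZooKeeper under their hostname rather than their IP address, so the calling side must be able to resolve (ping) that hostname. My HBase runs in a VirtualBox virtual machine and the client runs on the Windows 10 host, so to reach the Linux VM by hostname I edited the Windows 10 hosts file at C:\Windows\System32\drivers\etc\hosts and appended the line 10.0.0.110 linmint-VirtualBox, with the IP on the left and the hostname returned by ZooKeeper on the right. Saving may fail with a permissions error; in that case right-click the file, open its security properties and give the Users group write permission. Once saved, the hostname can be pinged.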
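To double-check that the hosts entry is in place, the file's content can be searched for the hostname. The sketch below works on hosts-file text passed in as a string so that it runs anywhere; HostsLookup is a hypothetical helper, and reading the real file (for example with java.nio.file.Files) is left to the caller.

```java
import java.util.Optional;

/** Hypothetical helper: finds the IP address a hosts-file text maps a hostname to. */
final class HostsLookup {

    static Optional<String> findIp(String hostsText, String hostName) {
        for (String line : hostsText.split("\\R")) {
            // Drop trailing comments, then skip blank lines.
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue;
            }
            // Format: IP address first, then one or more names separated by whitespace.
            String[] fields = entry.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostName)) { // hostnames are case-insensitive
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sample content with the line added in this post.
        String hosts = "127.0.0.1 localhost\n10.0.0.110 linmint-VirtualBox\n";
        System.out.println(findIp(hosts, "linmint-VirtualBox").orElse("not mapped"));
    }
}
```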

Next is access through Spring's HbaseTemplate, which works much like jdbcTemplate and mongoTemplate. First configure the template in an XML file:

<!-- HBase connection setup -->
<!-- HDFS configuration -->
<hdp:configuration id="hadoopConfiguration">fs.default.name=hdfs://10.0.0.110:9000</hdp:configuration>
<!-- HBase connection configuration -->
<hdp:hbase-configuration id="hbaseConfiguration" zk-quorum="10.0.0.110" zk-port="2181"/>
<!-- HbaseTemplate bean configuration -->
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
    <property name="configuration" ref="hbaseConfiguration"></property>
</bean>

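Because of editor limitations, the Spring header of the XML file could not be pasted here, so the original post showed it as a screenshot:
[Screenshot of the Spring XML header; image not available]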

Next is one example of operating on HBase through hbaseTemplate; for the other operations, see the documentation:

ApplicationContext context = new ClassPathXmlApplicationContext("spring-hbase.xml");
HbaseTemplate hbTemplate = (HbaseTemplate) context.getBean("hbaseTemplate");
// Insert data
hbTemplate.execute(TABLE_NAME, new TableCallback<Boolean>() {
    @Override
    public Boolean doInTable(HTableInterface table) throws Throwable {
        boolean flag = false;
        try {
            byte[] rowkey = Bytes.toBytes(ROW_KEY);
            Put put = new Put(rowkey);
            put.add(Bytes.toBytes(COLUMN_FAMILY), Bytes.toBytes(QUALIFIER), Bytes.toBytes("林"));
            table.put(put);
            flag = true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return flag;
    }
});

This code only runs correctly if the table with the corresponding column family has already been created in HBase.

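The whole process ran fine for me, though I may have left something out while writing it up; if you run into problems, leave a comment.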