http://hbase.apache.org/
Getting to know HBase
Positioning:
- Hadoop database
- a distributed, scalable, big data store: distributed, horizontally scalable storage
- random realtime read/write: if you need random or real-time reads and writes, HBase is the better fit
- very large tables, billions of rows X millions of columns: the target is big, wide tables with billions of rows and millions of columns, although in production we usually only end up using a few hundred columns
So HBase is a database, but it has no SQL; the HBase Shell is rarely used because it is inconvenient, so we go through Phoenix instead, and the underlying storage lands on HDFS.
That raises a deployment question: Hadoop has a NameNode and DataNodes, so is HBase deployed on the same node as the NameNode, and do we deploy as many HBase nodes as there are DataNodes? Since HBase ultimately persists its data to HDFS, we deploy one HBase (RegionServer) node per DataNode.
Features
- Linear and modular scalability: scales out linearly
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables: tables are split into regions and distributed automatically
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access. In production we generally do not use the Java API directly, because we handle things through Phoenix; from there Spark / Spark SQL is enough.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters: like predicate pushdown in Hive, the filter is evaluated on the server side during the scan (a sketch follows this list)
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible jruby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
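As a rough sketch of how predicate push down looks with the 1.x Java client: a Scan with a SingleColumnValueFilter, so the comparison runs inside the RegionServer and non-matching rows never cross the network. The table and column names (t4, cf1:q1, val1) are just placeholders borrowed from the demo further below.
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterScanDemo {
    // Scan t4 and keep only rows whose cf1:q1 equals "val1"; the filter is
    // evaluated server side (predicate push down), the client only sees matches.
    public static void scanWithFilter(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("t4"))) {
            Scan scan = new Scan();
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf1"), Bytes.toBytes("q1"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("val1")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}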
Comparison with relational databases
- 1. Dynamic columns
- 2. Form of the stored data (unstructured)
- 3. Transactions (not a big concern for big data); operations are single-table, and once Spark has loaded the wide table you can still do aggregations and joins there
- 4. Distributed: scalable
- 5. Supports very many columns
- 6. Key-value storage
- 7. Low coupling between tables
- 8. Data redundancy exists (HBase keeps multiple versions of a cell; to keep three versions, just set VERSIONS to 3; see the sketch after this list)
- 9. High query performance on large data volumes (only the columns you need are loaded)
- 10. Storage is also more efficient than in a relational database: nulls are simply not stored, thanks to HBase's dynamic columns
- 11. No built-in secondary indexes (access paths: rowkey get, full-table scan, range scan); you can add them yourself later, typically with Phoenix
- 12. Triggers are not supported
- 13. Fields support only one format (byte arrays)
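A minimal sketch of point 8 (cell versions), assuming the same HBase 1.x Java client and reusing the admin / connection objects from HBaseApiDemo further below; the table name t_ver is made up for illustration.
// Column family that keeps up to three versions per cell (VERSIONS => '3').
HColumnDescriptor cf = new HColumnDescriptor("cf1");
cf.setMaxVersions(3);
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t_ver"));
desc.addFamily(cf);
admin.createTable(desc);

try (Table table = connection.getTable(TableName.valueOf("t_ver"))) {
    for (int i = 1; i <= 3; i++) {            // write the same cell three times
        Put put = new Put(Bytes.toBytes("rw1"));
        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("v" + i));
        table.put(put);
    }
    Get get = new Get(Bytes.toBytes("rw1"));
    get.setMaxVersions(3);                    // ask for all kept versions, newest first
    for (Cell cell : table.get(get).getColumnCells(Bytes.toBytes("cf1"), Bytes.toBytes("q1"))) {
        System.out.println(cell.getTimestamp() + " => " + Bytes.toString(CellUtil.cloneValue(cell)));
    }
}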
HBase Shell
help
create
scan/get (e.g. hbase> get 'ns1:t1', 'r1') ...
put 'ns1:t123123', 'r1', 'f1:c1', '123'
// table name (namespace:table), rowkey, column (column family:column), value
describe 'ns1:t123123'
// a table is either enabled or disabled; while it is enabled it cannot be dropped
TTL controls how long data lives, so tables can also hold temporary data; in production we compress with snappy (see the sketch after these shell commands).
You must first disable 'ns1:t123123' before you can drop 'ns1:t123123'.
scan 'ns1:t123123' (scan the table)
// the table must be enabled before you can read it
get 't1', 'r1' (get one row of a table)
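The same ideas via the Java API, as a rough sketch: it reuses the admin object from HBaseApiDemo further below, assumes the ns1 namespace already exists and that the snappy native codec is available on the RegionServers, and Compression comes from org.apache.hadoop.hbase.io.compress.
HColumnDescriptor f1 = new HColumnDescriptor("f1");
f1.setTimeToLive(7 * 24 * 3600);                       // TTL: cells expire after 7 days
f1.setCompressionType(Compression.Algorithm.SNAPPY);   // COMPRESSION => 'SNAPPY'
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("ns1:t123123"));
desc.addFamily(f1);
admin.createTable(desc);

// A table must be disabled before it can be dropped.
admin.disableTable(TableName.valueOf("ns1:t123123"));
admin.deleteTable(TableName.valueOf("ns1:t123123"));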
HBase Conf
1. Files present after installation:
- regionservers
- hbase-site.xml…
2. Configuration that goes into those files after installation
Static vs. dynamic configuration (changes to static configuration only take effect after a restart)
**** Parameters we have tuned in production ****
hbase.client.operation.timeout
phoenix.query.timeoutMs
phoenix.query.keepAliveMs
hbase.client.ipc.pool.size
dfs.socket.timeout
dfs.client.socket-timeout
hbase.master.namespace.init.timeout
hbase.regionserver.executor.openregion.threads
index.builder.threads.max..
zk retry / timeout, ulimit, flush, grant, compact, split ...
WAL files (write-ahead log)
Dynamic configuration: update_config / update_all_config (reload supported settings on one RegionServer / on all servers without a restart)
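Static settings normally live in hbase-site.xml, but client-side ones can also be overridden in code. A minimal sketch (the values are placeholders, not tuning advice):
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "hadoop00");
conf.setInt("hbase.client.operation.timeout", 60000);   // ms budget for one client operation
conf.setInt("hbase.rpc.timeout", 60000);                 // ms budget for a single RPC
Connection conn = ConnectionFactory.createConnection(conf);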
HBase basic terminology
HBase retrieves data by Row Key, which plays roughly the role of the primary key in MySQL.
Multiple timestamps on a cell mean multiple versions of it exist.
contents:html = “…?” here contents:html is column family:column name
Table => N * Row(RK), where N is determined by the rowkeys
Row => rowkey (RK) [the unique identifier of a record]
column family (cf): the columns of a row are grouped by cf
table => N * cf, cf => n * col
A table can have many column families, and a column family can in turn contain many columns (rows do not all have to have the same columns: dynamic columns), so different rows can have different sets of cols.
A namespace can be understood as a DB (database) in a relational system.
How a cell is located: rowkey, cf:col, version
TTL: time to live, how long the data is kept; the default is FOREVER
e.g. row => rw1, column=cf1:q1, timestamp=1550351947916, value=val1
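A minimal sketch of addressing exactly that cell from Java, reusing the connection from HBaseApiDemo below; the table name t4 is an assumption, and the timestamp is the one from the example line above.
try (Table table = connection.getTable(TableName.valueOf("t4"))) {
    Get get = new Get(Bytes.toBytes("rw1"));                   // rowkey
    get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"));  // cf:col
    get.setTimeStamp(1550351947916L);                          // one specific version
    Result result = table.get(get);
    System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("q1"))));
}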
Column family: in production keep it to no more than three; they can be used to separate business concerns.
cf => basic attributes:
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1',
IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
master: assigns regions to RegionServers and handles DDL (create/drop table)
regionserver: serves reads and writes for the regions it hosts
region: a contiguous range of rows of a table, the unit of distribution
wal: write-ahead log; writes are logged here first so they can be replayed on failure
memstore: the in-memory write buffer (one per column family per region)
lrucache: the LRU BlockCache that serves reads
compact: merges small HFiles into fewer, larger ones
split: a region that grows too large is split into two
flush: the memstore is written out to an HFile on HDFS
HBase setup + Demo (Java API / REST API)
Watch out for version conflicts: hadoop-2.6.0-cdh5.12.0 || hbase-1.2.0-cdh5.12.0 are compatible (with Apache releases, check compatibility yourself)
http://hbase.apache.org/book.html#_rest REST API
Sample output of the Java API demo below:
RowName: rw1
Column Family: cf1
Column Name: q1
Cell Value: val1
Timestamp: 1550351947916
HBaseApiDemo
package com.jh;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class HBaseApiDemo {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args) throws IOException {
createTable("t4", new String[]{"cf1", "cf2"});
// insertRow("t4", "rw1", "cf1", "q1", "val1");
getData("t4", "rw1", "cf1", "q1");
// scanData("t3", "", "");
// deleteRow("t3", "rw1", "cf1", "q1");
// deleteTable("t3");
}
// Initialize the connection
public static void init() {
configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "hadoop00");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.rootdir", "hdfs://hadoop00:8020/hbase");
try {
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
} catch (IOException e) {
e.printStackTrace();
}
}
// Create a table
public static void createTable(String tableName, String[] cols) throws IOException {
init();
TableName tn = TableName.valueOf(tableName);
if (admin.tableExists(tn)) {
System.out.println("table already exists!");
} else {
HTableDescriptor hTableDescriptor = new HTableDescriptor(tn);
for (String col : cols) {
HColumnDescriptor hColumnDescriptor = new HColumnDescriptor(col);
hTableDescriptor.addFamily(hColumnDescriptor);
}
admin.createTable(hTableDescriptor);
}
close();
}
// Delete a table
public static void deleteTable(String tableName) throws IOException {
init();
TableName tn = TableName.valueOf(tableName);
if (admin.tableExists(tn)) {
admin.disableTable(tn);
admin.deleteTable(tn);
}
close();
}
// List existing tables
public static void listTables() throws IOException {
init();
HTableDescriptor hTableDescriptors[] = admin.listTables();
for (HTableDescriptor hTableDescriptor : hTableDescriptors) {
System.out.println(hTableDescriptor.getNameAsString());
}
close();
}
// Insert data
public static void insertRow(String tableName, String rowkey, String colFamily, String col, String val) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(Bytes.toBytes(rowkey));
put.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col), Bytes.toBytes(val));
table.put(put);
// Batch insert
/*
List<Put> putList = new ArrayList<Put>();
putList.add(put);
table.put(putList);*/
table.close();
close();
}
// Delete data
public static void deleteRow(String tableName, String rowkey, String colFamily, String col) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Delete delete = new Delete(Bytes.toBytes(rowkey));
// Delete a whole column family
// delete.addFamily(Bytes.toBytes(colFamily));
// Delete a specific column
// delete.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
table.delete(delete);
// Batch delete
/* List<Delete> deleteList = new ArrayList<Delete>();
deleteList.add(delete);
table.delete(deleteList);*/
table.close();
close();
}
// Get data by rowkey
public static void getData(String tableName, String rowkey, String colFamily, String col) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(Bytes.toBytes(rowkey));
// Get data for a specific column family
// get.addFamily(Bytes.toBytes(colFamily));
// Get data for a specific column
// get.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
Result result = table.get(get);
showCell(result);
table.close();
close();
}
// Pretty-print a Result
public static void showCell(Result result) {
Cell[] cells = result.rawCells();
for (Cell cell : cells) {
System.out.println("RowName: " + new String(CellUtil.cloneRow(cell)) + " ");
System.out.println("Column Family: " + new String(CellUtil.cloneFamily(cell)) + " ");
System.out.println("Column Name: " + new String(CellUtil.cloneQualifier(cell)) + " ");
System.out.println("Cell Value: " + new String(CellUtil.cloneValue(cell)) + " ");
System.out.println("Timestamp: " + cell.getTimestamp() + " ");
}
}
// Scan a range of rows
public static void scanData(String tableName, String startRow, String stopRow) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
// scan.setStartRow(Bytes.toBytes(startRow));
// scan.setStopRow(Bytes.toBytes(stopRow));
ResultScanner resultScanner = table.getScanner(scan);
for (Result result : resultScanner) {
showCell(result);
}
table.close();
close();
}
// Close the connection
public static void close() {
try {
if (null != admin) {
admin.close();
}
if (null != connection) {
connection.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
HBaseRestApiDemo
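This demo assumes the HBase REST gateway is already running on hadoop00, port 9999 (the stock default is 8080); it is typically started with something like hbase rest start -p 9999, or via hbase-daemon.sh start rest. A GET on / returns the table list and a GET on /<table>/schema returns the table schema, encoded according to the Accept header (text/xml or application/json).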
package com.jh;
import com.cloudera.org.apache.http.HttpEntity;
import com.cloudera.org.apache.http.HttpResponse;
import com.cloudera.org.apache.http.client.methods.HttpGet;
import com.cloudera.org.apache.http.impl.client.DefaultHttpClient;
import com.cloudera.org.apache.http.util.EntityUtils;
public class HBaseRestApiDemo {
public static void main(String[] args) {
// text/xml || application/json
String headerInfo = "application/json";
String result = "";
result = getTableList(headerInfo);
// result = getSchemaInfo("t2", headerInfo);
System.out.println("=== " + result);
}
// List all tables
public static String getTableList(String acceptInfo) {
String uriAPI = "http://hadoop00:9999/";
String result = "";
HttpGet requst = new HttpGet(uriAPI);
try {
requst.setHeader("Content-Type", "text/xml");
requst.setHeader("accept", acceptInfo);
HttpResponse response = new DefaultHttpClient().execute(requst); // execute the request
// HttpGet is a subclass of HttpUriRequest
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode == 200 || statusCode == 403) {
HttpEntity httpEntity = response.getEntity();
result = EntityUtils.toString(httpEntity); // read the response body as a string
// Usually you also want to strip stray characters, e.g. remove "\r" from the
// result, otherwise a small box shows up at the end of the string:
// result = result.replaceAll("\r", "");
} else {
requst.abort();
result = "异常的返回码:" + statusCode;
}
} catch (Exception e) {
e.printStackTrace();
result = e.getMessage();
}
return result;
}
// Get a table's schema
public static String getSchemaInfo(String tableName, String acceptInfo) {
String uriAPI = "http://hadoop00:9999/" + tableName + "/schema";
String result = "";
HttpGet requst = new HttpGet(uriAPI);
try {
requst.setHeader("Content-Type", "text/xml");
requst.setHeader("accept", acceptInfo);
HttpResponse response = new DefaultHttpClient().execute(requst);
// HttpGet is a subclass of HttpUriRequest
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode == 200 || statusCode == 403) {
HttpEntity httpEntity = response.getEntity();
result = EntityUtils.toString(httpEntity); // read the response body as a string
} else {
requst.abort();
result = "异常的返回码:" + statusCode;
}
} catch (Exception e) {
e.printStackTrace();
result = e.getMessage();
}
return result;
}
}