HBase Hands-On Introduction and a Walkthrough of the Official Site

http://hbase.apache.org/

Understanding HBase

Positioning:

  • Hadoop database
  • a distributed, scalable, big data store
  • random realtime read/write — HBase is the better fit when you need random or realtime reads and writes
  • very large tables billions of rows X millions of columns — it targets very large, wide tables; in practice a production table usually only uses a few hundred columns

So HBase is a database, but it has no SQL; the HBase Shell is rarely used in practice because it is inconvenient, and Phoenix is used instead. The underlying storage lives on HDFS.

This is worth considering at deployment time: Hadoop has a NameNode and DataNodes, so should HBase be co-located with the NameNode? In practice, since HBase ultimately persists its data to HDFS, you deploy as many HBase (RegionServer) nodes as you have DataNodes.

Features

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables; automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • Easy to use Java API for client access. In production the raw Java API is generally not used; access goes through Phoenix, so queries can be handled by Spark / Spark SQL.
  • Block cache and Bloom Filters for real-time queries.
  • Query predicate push down via server side Filters. Like Hive, HBase pushes filtering down to the server side (see the sketch after this list).
  • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
  • Extensible jruby-based (JIRB) shell.
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.
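
To make server-side predicate push-down concrete, here is a minimal sketch (not from the original notes) that applies a SingleColumnValueFilter to a Scan; the table 'ns1:t1', family 'cf1', qualifier 'q1', value 'val1', and ZooKeeper host 'hadoop00' are placeholder names reused from elsewhere in these notes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterScanDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop00");          // placeholder host
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("ns1:t1"))) {
            Scan scan = new Scan();
            // Only rows where cf1:q1 == 'val1' come back; the comparison runs
            // on the RegionServers, so non-matching rows never leave the server.
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf1"), Bytes.toBytes("q1"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("val1")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(result);
                }
            }
        }
    }
}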

Comparison with Relational Databases

  • 1. Dynamic columns
  • 2. Storage format: data can be unstructured
  • 3. Transactions: not a major concern in big data; operations are single-table, and a wide table loaded into Spark can still be aggregated and joined
  • 4. Distributed and scalable
  • 5. Supports a very large number of columns
  • 6. Key-value storage
  • 7. Low coupling between tables
  • 8. Data redundancy through versioning: HBase keeps multiple versions of a cell; to keep three versions, just set VERSIONS to 3 (see the sketch after this list)
  • 9. High query performance on large data volumes: only the columns you need are loaded
  • 10. Storage is also more efficient than in a relational database: nulls are simply not stored, thanks to dynamic columns
  • 11. No built-in secondary indexes (access paths: rowkey get; scan: full table or range); they can be built later, typically with Phoenix
  • 12. Triggers are not supported
  • 13. Only one field format is supported: byte arrays
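
A minimal sketch of the versioning behaviour from item 8, using the same Java client API as the demo further below; the table name 't_versions' is made up for illustration, and 'hadoop00' is the ZooKeeper host used elsewhere in these notes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop00");           // placeholder host
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Keep up to 3 versions per cell in column family cf1.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t_versions"));
            desc.addFamily(new HColumnDescriptor("cf1").setMaxVersions(3));
            admin.createTable(desc);

            try (Table table = connection.getTable(TableName.valueOf("t_versions"))) {
                // Three puts to the same cell (explicit timestamps 1..3) become three versions;
                // a plain get would only return the newest one.
                for (int i = 1; i <= 3; i++) {
                    table.put(new Put(Bytes.toBytes("rw1"))
                            .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), i, Bytes.toBytes("v" + i)));
                }
                // Ask for all stored versions of the cell.
                Get get = new Get(Bytes.toBytes("rw1")).setMaxVersions(3);
                for (Cell cell : table.get(get).rawCells()) {
                    System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }
}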

HBase Shell

help
create
scan / get  (e.g.: hbase> get 'ns1:t1', 'r1') ......

put 'ns1:t123123', 'r1', 'f1:c1', '123'
// table name (namespace:table)   row key   column (column family:qualifier)   value

describe  'ns1:t123123'
// shows whether the table is enabled or disabled; an enabled table cannot be dropped

TTL controls how long data lives, so a table can also hold temporary data; in production, Snappy is used for compression.

You must disable  'ns1:t123123'  first, before you can drop  'ns1:t123123'.

scan 'ns1:t123123'  // scan the table

// the table must be in the enabled state to read it

get 't1', 'r1'  // fetch one row of a table

HBase Conf

1. Files present after installation:

  • regionservers
  • hbase-site.xml…

2. Parameters configured in those files

Static vs. dynamic configuration (a change to a static configuration item only takes effect after a restart)

**** Parameters that have been tuned in production ****
	hbase.client.operation.timeout
	phoenix.query.timeoutMs
	phoenix.query.keepAliveMs
	hbase.client.ipc.pool.size
	dfs.socket.timeout
	dfs.client.socket-timeout
	hbase.master.namespace.init.timeout
	hbase.regionserver.executor.openregion.threads
	index.builder.threads.max..

	zk retry timeout ulimit flush grant compact split...

WAL files (write-ahead log)

update_config , update_all_config
dynamic configuration (takes effect at runtime, without a restart)
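
As a minimal sketch, some of the client-side parameters listed above can also be overridden programmatically on the client Configuration instead of (or in addition to) hbase-site.xml; the values below are placeholders, not recommended production settings:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientConfDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();      // loads hbase-site.xml from the classpath
        conf.set("hbase.zookeeper.quorum", "hadoop00");        // placeholder host
        // Client-side overrides; the values are placeholders, not recommendations.
        conf.set("hbase.client.operation.timeout", "60000");   // ms
        conf.set("hbase.client.ipc.pool.size", "10");
        conf.set("dfs.client.socket-timeout", "120000");       // ms
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("connected with overridden client settings");
        }
    }
}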

HBase Basic Terminology

HBase is looked up by Row Key, which is comparable to a primary key in MySQL.

Multiple timestamps on a cell mean multiple versions.

contents:html = “…?”  — here “contents” is the column family and “html” the column qualifier.

Table => N * Row(RK); N is determined by the row keys

Row => rowkey (RK) [the unique identifier of a record]

column family (cf): within a row, each column belongs to exactly one cf

table => N * cf ,  cf => n * col

A table can have many column families, and a column family can contain many columns (the set of columns does not have to be the same on every row: dynamic columns). Different rows can have different columns.
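
A minimal sketch of what dynamic columns look like through the Java API, assuming a pre-created table 't_dyn' with a single family 'cf1' (both names are made up for illustration): two rows in the same family carry different qualifiers, and no schema change is needed.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DynamicColumnsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop00");   // placeholder host
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("t_dyn"))) {
            // Row rw1 gets columns cf1:name and cf1:age ...
            table.put(new Put(Bytes.toBytes("rw1"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("alice"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("age"), Bytes.toBytes("30")));
            // ... while row rw2 gets a completely different column, cf1:city.
            table.put(new Put(Bytes.toBytes("rw2"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("city"), Bytes.toBytes("beijing")));
        }
    }
}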

A namespace can be thought of as a database/schema in an RDBMS.

How a cell is addressed: cell = (rowkey, cf:col, version)

TTL: time to live, i.e. how long data is kept; the default is FOREVER.

e.g.: row => rw1  column=cf1:q1, timestamp=1550351947916, value=val1

Column family (cf): in production do not use more than three per table; they can be used to separate different kinds of business data.

  cf => default attributes of a column family:
			{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', 
			 IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', 
			 DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
			 COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
			 BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
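
The attributes shown above can also be set from the Java API via HColumnDescriptor; a minimal sketch (the 7-day TTL and Snappy compression are illustrative choices, and Snappy requires the native codec to be available on the cluster):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class FamilyAttributesDemo {
    public static void main(String[] args) {
        HColumnDescriptor cf = new HColumnDescriptor("cf1");
        cf.setBloomFilterType(BloomType.ROW);                      // BLOOMFILTER => 'ROW'
        cf.setMaxVersions(1);                                      // VERSIONS => '1'
        cf.setTimeToLive(7 * 24 * 3600);                           // TTL in seconds (placeholder: 7 days)
        cf.setCompressionType(Compression.Algorithm.SNAPPY);       // COMPRESSION => 'SNAPPY'
        cf.setBlockCacheEnabled(true);                             // BLOCKCACHE => 'true'

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("ns1:t1"));
        table.addFamily(cf);
        // pass `table` to Admin.createTable(...) as in the demo below
        System.out.println(table);
    }
}
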
	master 
	regionserver
		region
		wal
		memstore
		lrucache
		compact
		split
		flush

HBase Setup + Demo (Java API / REST API)

Version compatibility (watch out for conflicts): hadoop-2.6.0-cdh5.12.0 and hbase-1.2.0-cdh5.12.0 are compatible (with Apache releases, check compatibility yourself).

http://hbase.apache.org/book.html#_rest REST API

Sample output of the Java API demo (produced by showCell below):
		RowName: rw1 
		Column Family: cf1 
		Column Name: q1 
		Cell Value: val1 
		Timestamp: 1550351947916 

HBaseApiDemo

package com.jh;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class HBaseApiDemo {

    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void main(String[] args) throws IOException {
        createTable("t4", new String[]{"cf1", "cf2"});
        // insertRow("t4", "rw1", "cf1", "q1", "val1");
        getData("t4", "rw1", "cf1", "q1");
        // scanData("t3", "", "");
        // deleteRow("t3", "rw1", "cf1", "q1");
        // deleteTable("t3");
    }

    // initialize the connection
    public static void init() {
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "hadoop00");
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        configuration.set("hbase.rootdir", "hdfs://hadoop00:8020/hbase");

        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // create a table
    public static void createTable(String tableNameStr, String[] cols) throws IOException {
        init();

        TableName tableName = TableName.valueOf(tableNameStr);

        if (admin.tableExists(tableName)) {
            System.out.println("talbe is exists!");
        } else {
            HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
            for (String col : cols) {
                HColumnDescriptor hColumnDescriptor = new HColumnDescriptor(col);

                hTableDescriptor.addFamily(hColumnDescriptor);
            }
            admin.createTable(hTableDescriptor);
        }
        close();
    }

    // delete a table
    public static void deleteTable(String tableName) throws IOException {
        init();
        TableName tn = TableName.valueOf(tableName);
        if (admin.tableExists(tn)) {
            admin.disableTable(tn);
            admin.deleteTable(tn);
        }
        close();
    }

    // list existing tables
    public static void listTables() throws IOException {
        init();
        HTableDescriptor[] hTableDescriptors = admin.listTables();
        for (HTableDescriptor hTableDescriptor : hTableDescriptors) {
            System.out.println(hTableDescriptor.getNameAsString());
        }
        close();
    }

    // insert a row
    public static void insertRow(String tableName, String rowkey, String colFamily, String col, String val) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Put put = new Put(Bytes.toBytes(rowkey));
        put.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col), Bytes.toBytes(val));
        table.put(put);

        // batch insert (needs java.util.List / java.util.ArrayList imports if uncommented)
        /*
        List<Put> putList = new ArrayList<Put>();
        putList.add(put);
        table.put(putList);*/
        table.close();
        close();
    }

    // delete a row
    public static void deleteRow(String tableName, String rowkey, String colFamily, String col) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Delete delete = new Delete(Bytes.toBytes(rowkey));
        // delete an entire column family
        // delete.addFamily(Bytes.toBytes(colFamily));
        // delete a specific column
        // delete.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
        table.delete(delete);
        // batch delete (needs java.util.List / java.util.ArrayList imports if uncommented)
        /* List<Delete> deleteList = new ArrayList<Delete>();
        deleteList.add(delete);
        table.delete(deleteList);*/
        table.close();
        close();
    }

    // look up data by rowkey
    public static void getData(String tableName, String rowkey, String colFamily, String col) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Get get = new Get(Bytes.toBytes(rowkey));
        // fetch only a specific column family
        // get.addFamily(Bytes.toBytes(colFamily));
        // fetch only a specific column
        // get.addColumn(Bytes.toBytes(colFamily),Bytes.toBytes(col));
        Result result = table.get(get);

        showCell(result);
        table.close();
        close();
    }

    // pretty-print a Result
    public static void showCell(Result result) {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println("RowName: " + new String(CellUtil.cloneRow(cell)) + " ");
            System.out.println("Column Family: " + new String(CellUtil.cloneFamily(cell)) + " ");
            System.out.println("Column Name: " + new String(CellUtil.cloneQualifier(cell)) + " ");
            System.out.println("Cell Value: " + new String(CellUtil.cloneValue(cell)) + " ");
            System.out.println("Timestamp: " + cell.getTimestamp() + " ");
        }
    }

    // scan rows (optionally within a row-key range)
    public static void scanData(String tableName, String startRow, String stopRow) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Scan scan = new Scan();
        // scan.setStartRow(Bytes.toBytes(startRow));
        // scan.setStopRow(Bytes.toBytes(stopRow));
        ResultScanner resultScanner = table.getScanner(scan);
        for (Result result : resultScanner) {
            showCell(result);
        }

        table.close();
        close();
    }

    // close the admin and connection
    public static void close() {
        try {
            if (null != admin) {
                admin.close();
            }
            if (null != connection) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

HBaseRestApiDemo

package com.jh;

import com.cloudera.org.apache.http.HttpEntity;
import com.cloudera.org.apache.http.HttpResponse;
import com.cloudera.org.apache.http.client.methods.HttpGet;
import com.cloudera.org.apache.http.impl.client.DefaultHttpClient;
import com.cloudera.org.apache.http.util.EntityUtils;

public class HBaseRestApiDemo {
    public static void main(String[] args) {
        // text/xml || application/json
        String headerInfo = "application/json";
        String result = "";
        result = getTableList(headerInfo);
        // result = getSchemaInfo("t2", headerInfo);

        System.out.println("=== " + result);
    }

    // list all tables
    public static String getTableList(String acceptInfo) {
        String uriAPI = "http://hadoop00:9999/";
        String result = "";
        HttpGet request = new HttpGet(uriAPI);
        try {
            request.setHeader("Content-Type", "text/xml");
            request.setHeader("accept", acceptInfo);
            HttpResponse response = new DefaultHttpClient().execute(request); // execute the request

            // HttpGet is a subclass of HttpUriRequest
            int statusCode = response.getStatusLine().getStatusCode();
            if (statusCode == 200 || statusCode == 403) {
                HttpEntity httpEntity = response.getEntity();
                result = EntityUtils.toString(httpEntity); // read the response body as a string
                // usually strip stray characters here, e.g. remove "\r" from the result,
                // otherwise a small box character shows up at the end of the string:
                // result = result.replaceAll("\r", "");
            } else {
                request.abort();
                result = "Unexpected status code: " + statusCode;
            }
        } catch (Exception e) {
            e.printStackTrace();
            result = e.getMessage();
        }
        return result;
    }

    // get a table's schema info
    public static String getSchemaInfo(String tableName, String acceptInfo) {
        String uriAPI = "http://hadoop00:9999/" + tableName + "/schema";
        String result = "";
        HttpGet request = new HttpGet(uriAPI);
        try {
            request.setHeader("Content-Type", "text/xml");
            request.setHeader("accept", acceptInfo);
            HttpResponse response = new DefaultHttpClient().execute(request);

            // HttpGet is a subclass of HttpUriRequest
            int statusCode = response.getStatusLine().getStatusCode();
            if (statusCode == 200 || statusCode == 403) {
                HttpEntity httpEntity = response.getEntity();
                result = EntityUtils.toString(httpEntity); // read the response body as a string
            } else {
                request.abort();
                result = "Unexpected status code: " + statusCode;
            }

        } catch (Exception e) {
            e.printStackTrace();
            result = e.getMessage();
        }

        return result;
    }

}
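
Note: the REST demo assumes an HBase REST server is running on hadoop00 and listening on port 9999. The REST server's default port is 8080, so a non-default port has to be chosen when starting it (for example, hbase rest start -p 9999, depending on the distribution).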
