Learning HBase Data Operations on Hadoop

艾伦蓝

Article link: https://blog.csdn.net/lan12334321234/article/details/70049915

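HBase started out as a Hadoop subproject (it is now a top-level Apache project). It follows the sparse, column-oriented storage design described for Google's BigTable and is built on top of Hadoop's HDFS. It therefore draws on the high reliability and scalability of HDFS on the one hand, and on BigTable's efficient data organization on the other.
HBase can fairly be described as a solid open-source solution for real-time responses over massive data sets.
Reportedly, one telecom operator used a BigTable-like technology (my personal guess is that it was HBase) to find a single billing record in 2 TB of data within two seconds, something the operator's previous Oracle database could not do.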

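HBase provides a shell similar to those of relational databases such as MySQL. Through this shell we can manage and manipulate HBase tables and their column families. The shell's help command lists the supported commands in considerable detail.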

Note: HBase identifies every row of data by a name of its own, the row key. This is essential to everything that follows.

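Here we use a student score table as an example to explain the basic operations and concepts of HBase.

This is the students' score table:

name    grade   course:math   course:art
Tom     1       87            97
Jerry   2       100           80

Logically, grade is a single column of the table, while course is a column family made up of two columns, math and art. We can add more columns to the course family as needed, for example computer or physics. (In the HBase table created below, grade is also declared as a column family; its only column has an empty qualifier.)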
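To make the layout above more concrete, here is a minimal Java sketch that models the table as a sorted map from row key to a sorted map of "family:qualifier" columns. It only illustrates the sparse, column-oriented idea; the class name SparseTable and its methods are hypothetical and are not part of any HBase API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use or mirror the HBase client API.
public class SparseTable {
    // row key -> ("family:qualifier" -> value); both levels are kept in sorted order
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Store one cell. Only cells that are actually written occupy space.
    public void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // Return the value of one cell, or null if that cell was never written.
    public String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    // Number of cells stored for a row (0 if the row does not exist).
    public int cellCount(String row) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? 0 : cells.size();
    }

    public static void main(String[] args) {
        SparseTable scores = new SparseTable();
        scores.put("Tom", "grade:", "1");
        scores.put("Tom", "course:math", "87");
        scores.put("Tom", "course:art", "97");
        scores.put("Jerry", "grade:", "2");
        scores.put("Jerry", "course:math", "100");
        scores.put("Jerry", "course:art", "80");

        System.out.println(scores.get("Tom", "course:math"));
        // A column that was never written is simply absent, not stored as an empty slot.
        System.out.println(scores.get("Tom", "course:physics"));
    }
}
```

Adding a new column such as course:physics to a single row needs no schema change in this model, which is the property the column-family design relies on.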

With this design in mind, we can create the corresponding table in HBase.

1. Create a table named scores with two column families, grade and course

hbase(main):002:0> create 'scores', 'grade', 'course' 
0 row(s) in 4.1610 seconds

2. List the tables that currently exist in HBase

hbase(main):003:0> list 
scores 
1 row(s) in 0.0210 seconds

3. Show the structure of the table

hbase(main):004:0> describe 'scores' 
{NAME => 'scores', IS_ROOT => 'false', IS_META => 'false', FAMILIES => [{NAME => 'course', BLOOMFILTER => 'false', IN_MEMORY => 'false', LENGTH => '2147483647', BLOCKCACHE => 'false', VERSIONS => '3', TTL => '-1', COMPRESSION => 'NONE'}, {NAME => 'grade', BLOOMFILTER => 'false', IN_MEMORY => 'false', LENGTH => '2147483647', BLOCKCACHE => 'false', VERSIONS => '3', TTL => '-1', COMPRESSION => 'NONE'}]} 
1 row(s) in 0.0130 seconds
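
The describe output above reports VERSIONS => '3' for both column families, meaning that up to three timestamped versions of each cell are kept. The following Java sketch illustrates that idea under simplifying assumptions: the class VersionedCell is hypothetical, the timestamps and values in main are made up for illustration, and the sketch prunes old versions eagerly on every write, which is a simplification and not a description of how HBase schedules its cleanup.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration of a cell that keeps only its newest N versions.
public class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    public VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Write a value at the given timestamp and drop the oldest versions beyond the limit.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = smallest timestamp = oldest version
        }
    }

    // Value with the newest timestamp, or null if nothing was written.
    public String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Value stored at exactly this timestamp, or null if it was pruned or never written.
    public String at(long timestamp) {
        return versions.get(timestamp);
    }

    public int size() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCell math = new VersionedCell(3); // mirrors VERSIONS => '3'
        math.put(1L, "80"); // illustrative timestamps and values only
        math.put(2L, "85");
        math.put(3L, "87");
        math.put(4L, "90");
        System.out.println(math.latest());
        System.out.println(math.size());
    }
}
```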

4. Add a row with row key Tom; in the grade column family the column qualifier is empty ("") and the value is 1

hbase(main):005:0> put 'scores', 'Tom', 'grade:', '1' 
0 row(s) in 0.0070 seconds

5. Add a column <math, 87> to the course column family of row Tom

hbase(main):006:0> put 'scores', 'Tom', 'course:math', '87' 
0 row(s) in 0.0040 seconds

6. Add a column <art, 97> to the course column family of row Tom

hbase(main):007:0> put 'scores', 'Tom', 'course:art', '97' 
0 row(s) in 0.0030 seconds

7. Add a row with row key Jerry; in the grade column family the column qualifier is empty ("") and the value is 2

hbase(main):008:0> put 'scores', 'Jerry', 'grade:', '2' 
0 row(s) in 0.0040 seconds

8. Add a column <math, 100> to the course column family of row Jerry

hbase(main):009:0> put 'scores', 'Jerry', 'course:math', '100' 
0 row(s) in 0.0030 seconds

9. Add a column <art, 80> to the course column family of row Jerry

hbase(main):010:0> put 'scores', 'Jerry', 'course:art', '80' 
0 row(s) in 0.0050 seconds

10. View the data for Tom in the scores table

hbase(main):011:0> get 'scores', 'Tom' 
COLUMN                      CELL 
course:art                  timestamp=1224726394286, value=97 
course:math                 timestamp=1224726377027, value=87 
grade:                      timestamp=1224726360727, value=1 
3 row(s) in 0.0070 seconds

11. View all data in the scores table

hbase(main):012:0> scan 'scores' 
ROW                         COLUMN+CELL 
Tom                         column=course:art, timestamp=1224726394286, value=97 
Tom                         column=course:math, timestamp=1224726377027, value=87 
Tom                         column=grade:, timestamp=1224726360727, value=1 
Jerry                       column=course:art, timestamp=1224726424967, value=80 
Jerry                       column=course:math, timestamp=1224726416145, value=100 
Jerry                       column=grade:, timestamp=1224726404965, value=2 
6 row(s) in 0.0410 seconds

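12. View all data in the course column family of the scores table (newer shell versions write this as scan 'scores', {COLUMNS => 'course'})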

hbase(main):013:0> scan 'scores', ['course:'] 
ROW                         COLUMN+CELL 
Tom                         column=course:art, timestamp=1224726394286, value=97 
Tom                         column=course:math, timestamp=1224726377027, value=87 
Jerry                       column=course:art, timestamp=1224726424967, value=80 
Jerry                       column=course:math, timestamp=1224726416145, value=100 
4 row(s) in 0.0200 seconds
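
The family-restricted scan above can be pictured as walking the rows in sorted order and keeping only the cells whose column name starts with the family prefix. Below is a minimal Java sketch of that idea over plain in-memory maps; the class FamilyScan and its helper sampleTable are hypothetical names for illustration, not HBase APIs. Because the sketch uses a sorted map, it returns Jerry's cells before Tom's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of scanning one column family; not the HBase client API.
public class FamilyScan {

    // Collect "row column=value" for every cell whose column belongs to the given family.
    public static List<String> scan(Map<String, Map<String, String>> table, String family) {
        String prefix = family.endsWith(":") ? family : family + ":";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getKey().startsWith(prefix)) {
                    out.add(row.getKey() + " " + cell.getKey() + "=" + cell.getValue());
                }
            }
        }
        return out;
    }

    // Build the scores table from the shell session as nested sorted maps.
    public static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        Map<String, String> tom = new TreeMap<>();
        tom.put("grade:", "1");
        tom.put("course:math", "87");
        tom.put("course:art", "97");
        Map<String, String> jerry = new TreeMap<>();
        jerry.put("grade:", "2");
        jerry.put("course:math", "100");
        jerry.put("course:art", "80");
        table.put("Tom", tom);
        table.put("Jerry", jerry);
        return table;
    }

    public static void main(String[] args) {
        for (String line : scan(sampleTable(), "course")) {
            System.out.println(line);
        }
    }
}
```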

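The above is an example of basic HBase shell operations. As you can see, the HBase shell is fairly simple and easy to use. You can also see that it lacks many operations familiar from traditional SQL, such as LIKE. Then again, HBase is an open-source implementation of BigTable, and BigTable was designed to support Google's own workloads, so many SQL features may simply not be needed.

Of course, we can also operate on HBase from a program. The following program performs the same operations as the shell session above: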


// Note: this program uses the early HBase client API (BatchUpdate, RowResult, Cell),
// which later HBase releases replaced with Put, Get, and Scan.
import java.util.Map; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.HTableDescriptor; 
import org.apache.hadoop.hbase.HColumnDescriptor; 
import org.apache.hadoop.hbase.client.HBaseAdmin; 
import org.apache.hadoop.hbase.client.HTable; 
import org.apache.hadoop.hbase.io.BatchUpdate; 
import org.apache.hadoop.hbase.io.RowResult; 
import org.apache.hadoop.hbase.io.Cell; 
import org.apache.hadoop.hbase.util.Writables; 

public class HBaseBasic { 

    public static void main(String[] args) throws Exception { 
        HBaseConfiguration config = new HBaseConfiguration(); 
        HBaseAdmin admin = new HBaseAdmin(config); 

        if (admin.tableExists("scores")) { 
            System.out.println("drop table"); 
            admin.disableTable("scores"); 
            admin.deleteTable("scores"); 
        } 

        System.out.println("create table"); 
        // Declare the table with its two column families, then create it.
        HTableDescriptor tableDescriptor = new HTableDescriptor("scores".getBytes()); 
        tableDescriptor.addFamily(new HColumnDescriptor("grade:")); 
        tableDescriptor.addFamily(new HColumnDescriptor("course:")); 
        admin.createTable(tableDescriptor); 

        HTable table = new HTable(config, "scores"); 

        System.out.println("add Tom's data"); 
        // Collect all of Tom's cells in one batch; values are serialized IntWritables.
        BatchUpdate tomUpdate = new BatchUpdate("Tom"); 
        tomUpdate.put("grade:", Writables.getBytes(new IntWritable(1))); 
        tomUpdate.put("course:math", Writables.getBytes(new IntWritable(87))); 
        tomUpdate.put("course:art", Writables.getBytes(new IntWritable(97))); 
        table.commit(tomUpdate); 

        System.out.println("add Jerry's data"); 
        BatchUpdate jerryUpdate = new BatchUpdate("Jerry"); 
        jerryUpdate.put("grade:", Writables.getBytes(new IntWritable(2))); 
        jerryUpdate.put("course:math", Writables.getBytes(new IntWritable(100))); 
        jerryUpdate.put("course:art", Writables.getBytes(new IntWritable(80))); 
        table.commit(jerryUpdate); 

        // Scan every row, restricted to the course column family.
        for (RowResult row : table.getScanner(new String[] {"course:" })) { 
            System.out.format("ROW\t%s\n", new String(row.getRow())); 
            for (Map.Entry<byte[], Cell> entry : row.entrySet()) { 
                String column = new String(entry.getKey()); 
                Cell cell = entry.getValue(); 
                // Decode the bytes written via Writables.getBytes back into an int.
                IntWritable value = new IntWritable(); 
                Writables.copyWritable(cell.getValue(), value); 
                System.out.format("  COLUMN\t%s\t%d\n", column, value.get()); 
            } 
        } 
    } 
}

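HBaseAdmin: administers HBase (create and drop tables, list and alter tables).
HTable: table access.
Put: inserts data. Create a Put instance and call HTable.put(Put).
Delete: deletes data via HTable.delete(Delete).
Get: reads a single row via HTable.get(Get), which returns a Result object. A Result is a list of KeyValue entries (List<KeyValue>).
Scan: reads multiple rows via HTable.getScanner(Scan). Access is cursor-like: the call returns a ResultScanner, and each call to next() returns one row as a Result.
Put, Get, and Delete lock the row they operate on.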


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseMainClient {
	public static void main(String[] args) throws Exception {
		Configuration config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "192.168.1.103");
		config.set("hbase.zookeeper.property.clientPort", "2181");
		HBaseAdmin admin = new HBaseAdmin(config);
		String tableName = "TestTable";
		if (admin.tableExists(tableName)) {
			System.out.println("table   Exists:" + tableName);
		} else {
			HTableDescriptor tableDesc = new HTableDescriptor(tableName);
			tableDesc.addFamily(new HColumnDescriptor("TestFamily"));
			admin.createTable(tableDesc);
			System.out.println("create table ok .");
		}
		// Open the table and write one cell: row TestRow, column TestFamily:someQualifier.
		HTable table = new HTable(config, tableName);
		Put p = new Put(Bytes.toBytes("TestRow"));
		p.add(
			Bytes.toBytes("TestFamily"),
			Bytes.toBytes("someQualifier"),
			Bytes.toBytes("Some Value"));
		table.put(p);
		Get g = new Get(Bytes.toBytes("TestRow"));
		Result r = table.get(g);
		byte[] value = r
			.getValue(Bytes.toBytes("TestFamily"), Bytes.toBytes("someQualifier"));
		String valueStr = Bytes.toString(value);
		System.out.println("GET: " + valueStr);
		// Scan the whole table, returning only TestFamily:someQualifier.
		Scan s = new Scan();
		s.addColumn(Bytes.toBytes("TestFamily"), Bytes.toBytes("someQualifier"));
		ResultScanner scanner = table.getScanner(s);
		try {
			for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
				System.out.println("Found row: " + rr);
			}
		} finally {
			// Release the scanner and the table first, then drop the test table.
			scanner.close();
			table.close();
			admin.disableTable(tableName);
			admin.deleteTable(tableName);
		}
	}
}

Reposted from: http://szjian.iteye.com/blog/1221141