(1) 创建hbase识别的表
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
hbase.table.name 定义在hbase的table名称,多列时,data:1,data:2;多列族时,data1:1,data2:1;
hbase.columns.mapping 定义在hbase的列族,里面的:key 是固定值而且要保证在表pokes中的foo字段是唯一值
创建有分区的表
CREATE TABLE hbase_table_1(key int, value string) partitioned by (day string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
不支持表的修改
会提示不能修改非本地表。
hive> ALTER TABLE hbase_table_1 ADD PARTITION (day = '2012-09-22');
FAILED: Error in metadata: Cannot use ALTER TABLE on a non-native table FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
(2) 使用sql导入数据
新建hive的数据表
create table pokes(foo int,bar string)row format delimited fields terminated by ',';
批量导入数据
load data local inpath '/home/1.txt' overwrite into table pokes;
1.txt文件的内容为
1,hello
2,pear
3,world
使用sql导入hbase_table_1
<span style="color: rgb(255, 0, 0);">SET hive.hbase.bulk=<span class="code-keyword" style="background-color: inherit;">true</span>;</span>
insert overwrite table hbase_table_1 select * from pokes;
导入有分区的表
insert overwrite table hbase_table_1 partition (day='2012-01-01') select * from pokes;
(3) 查看数据
hive> select * from hbase_table_1;
OK
1 hello
2 pear
3 world
(注:与hbase整合的有分区的表存在个问题 select * from table查询不到数据,select key,value from table可以查到数据)
(4)登录Hbase去查看数据
hbase shell
hbase(main):002:0> describe 'xyz'
DESCRIPTION ENABLED {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_S true
COPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSI
ZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0830 seconds
hbase(main):003:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1331002501432, value=hello
2 column=cf1:val, timestamp=1331002501432, value=pear
3 column=cf1:val, timestamp=1331002501432, value=world
这时在Hbase中可以看到刚才在hive中插入的数据了。
7 对于在hbase已经存在的表,在hive中使用CREATE EXTERNAL TABLE来建立
例如hbase中的表名称为test1,字段为 a: , b: ,c: 在hive中建表语句为
create external table hive_test (key int,gid map<string,string>,sid map<string,string>,uid map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="a:,b:,c:") TBLPROPERTIES ("hbase.table.name" = "test1");
在hive中建立好表后,查询hbase中test1表内容
Select * from hive_test;
OK
1 {"":"qqq"} {"":"aaa"} {"":"bbb"}
2 {"":"qqq"} {} {"":"bbb"}
查询gid字段中value值的方法为
select gid[''] from hbase2;
得到查询结果
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201203052222_0017, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203052222_0017
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201203052222_0017
2012-03-06 14:38:29,141 Stage-1 map = 0%, reduce = 0%
2012-03-06 14:38:33,171 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203052222_0017
OK
qqq
qqq
如果hbase表test1中的字段为user:gid,user:sid,info:uid,info:level,在hive中建表语句为
create external table hive_test(key int,user map<string,string>,info map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="user:,info:") TBLPROPERTIES ("hbase.table.name" = "test1");
查询hbase表的方法为
select user['gid'] from hbase2;
注:hive连接hbase优化,将HADOOP_HOME/conf中的hbase-site.xml文件中增加配置
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
或者在执行hive语句之前执行hive>set hbase.client.scanner.caching=10000;