Prerequisites
Hadoop, Hive, and HBase are already installed and running.
Versions used
hadoop-1.0.3
hive-0.9.0
hbase-0.94.2
zookeeper-3.3.5
Steps
Configure hive-site.xml with the jars that bridge Hive and HBase:
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive-0.9.0/lib/hbase-0.94.2.jar,file:///usr/local/hive-0.9.0/lib/zookeeper-3.3.5.jar</value>
</property>
Copy hbase-0.94.2.jar and zookeeper-3.3.5.jar to $HADOOP_HOME/lib and $HIVE_HOME/lib on every node. Note that $HIVE_HOME/lib already ships with an HBase jar of its own, which must be deleted first.
Copy hbase-site.xml to $HADOOP_HOME/conf on every node.
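The two copy steps above have to be repeated on every node, which is easy to script. A minimal sketch: the node list and paths below are placeholders for this example (substitute your cluster's values), and it only prints the commands so you can review them before piping the output to sh:

```shell
#!/bin/sh
# Sketch of the jar-distribution step. NODES and the paths below are
# placeholders -- substitute your own cluster's values.
NODES="node1 node2 node3"
JARS="/usr/local/hbase-0.94.2/hbase-0.94.2.jar /usr/local/zookeeper-3.3.5/zookeeper-3.3.5.jar"
HADOOP_LIB=/usr/local/hadoop-1.0.3/lib
HIVE_LIB=/usr/local/hive-0.9.0/lib

# Print the cleanup/copy commands for one node (a dry run; pipe the
# output to sh to actually execute it).
commands_for_node() {
    node=$1
    # delete the HBase jar that ships with Hive before copying the new one
    echo "ssh $node rm -f $HIVE_LIB/hbase-*.jar"
    for jar in $JARS; do
        echo "scp $jar $node:$HADOOP_LIB/"
        echo "scp $jar $node:$HIVE_LIB/"
    done
}

for node in $NODES; do
    commands_for_node "$node"
done
```

Note the deletion of the stale HBase jar runs before the copy, matching the order described above.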
Test: start Hive and create a table:
CREATE TABLE qq(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:value")
TBLPROPERTIES ("hbase.table.name" = "xx");
-- source data table
create table test(a string, b string);
-- containing rows:
-- 1 a
-- 2 b
-- 3 c
If table creation fails, check the Hive log; an exception like the following may appear:
2012-11-29 12:33:02,191 FATAL ExecMapper: java.lang.NoClassDefFoundError: com/google/protobuf/Message
at org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:263)
at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:638)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1001)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy8.getProtocolVersion(Unknown Source)
......
2012-11-29 12:33:02,282 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/io/HbaseObjectWritable
at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:638)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1001)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy8.getProtocolVersion(Unknown Source)
This means HBase's protobuf-java-2.4.0a.jar is not on the cluster's classpath. Checking the job's syslog ("conf classpath=...") confirms the jar is absent. Copy protobuf-java-2.4.0a.jar to $HADOOP_HOME/lib on every node and restart the cluster (without a restart the jar will not be loaded onto the classpath).
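Before restarting, it is worth confirming whether the jar is actually in place. A small sketch of such a check; the directory argument is whatever lib directory you want to inspect (typically $HADOOP_HOME/lib on each node):

```shell
#!/bin/sh
# Quick check for the protobuf jar before restarting the cluster.
check_protobuf() {
    lib_dir=$1
    if ls "$lib_dir"/protobuf-java-*.jar >/dev/null 2>&1; then
        echo "protobuf jar present in $lib_dir"
    else
        echo "protobuf jar MISSING in $lib_dir - copy it and restart the cluster"
    fi
}

# Example: check the local Hadoop lib directory if HADOOP_HOME is set.
if [ -n "$HADOOP_HOME" ]; then
    check_protobuf "$HADOOP_HOME/lib"
fi
```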
Once the qq table is created successfully, open an hbase shell and run list; the xx table now exists.
Insert data from the Hive side:
insert into table qq select a as key, b as value from test;
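The check described below (querying both sides) boils down to two commands. A sketch that prints them for review; it assumes hive and hbase are on PATH and uses the qq/xx names from the example above:

```shell
#!/bin/sh
# Print the commands that verify the same rows are visible from both
# the Hive side (table qq) and the HBase side (table xx).
emit_checks() {
    cat <<'EOF'
hive -e 'select * from qq;'
echo "scan 'xx'" | hbase shell
EOF
}

emit_checks
```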
Query the qq table in Hive and the xx table in HBase: both return the rows, which confirms the integration works.
Many references, including the official wiki, suggest that insert into table qq select * from test is enough, but in practice that failed here, possibly due to the HBase and Hive versions used.
Note: in the SQL above, the column types of test must match those of qq; otherwise an exception like the following is thrown:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":"a","b":"1"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: HBase row key cannot be NULL
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:604)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
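As the trace shows, the first selected column becomes the HBase row key and must not be NULL. One defensive variant of the INSERT casts both columns to string and filters NULL keys; a sketch that emits this statement for the hive CLI (table and column names come from the qq/test example above, the CAST/WHERE guard is my addition):

```shell
#!/bin/sh
# Emit an INSERT that casts both columns to string and filters NULL keys,
# so the row key handed to HBase is never NULL.
emit_insert() {
    cat <<'EOF'
INSERT INTO TABLE qq
SELECT CAST(a AS STRING) AS key, CAST(b AS STRING) AS value
FROM test
WHERE a IS NOT NULL;
EOF
}

emit_insert
```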
Creating a Hive table over an already-existing HBase table:
CREATE EXTERNAL TABLE te(key string, value1 string, value2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name,info:age")
TBLPROPERTIES ("hbase.table.name" = "test");
Here info is the column family, and name and age are two columns within it. If the HBase table also has an info:class column but the Hive table is created with the statement above, select * from te returns only key, value1, and value2 (the rowkey, name, and age); the class values are not visible.
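To make info:class visible from Hive as well, it must be listed in hbase.columns.mapping when the external table is (re)created. A sketch that emits such a DDL; the table name te2, the value3 column, and the explicit :key entry are my assumptions, not from the original setup:

```shell
#!/bin/sh
# Emit a CREATE EXTERNAL TABLE whose mapping also covers info:class, so
# the class values become visible from Hive.
emit_ddl() {
    cat <<'EOF'
CREATE EXTERNAL TABLE te2(key string, value1 string, value2 string, value3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age,info:class")
TBLPROPERTIES ("hbase.table.name" = "test");
EOF
}

emit_ddl
```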