楔子
hbase 与 hive 交互
需要操作hive的同时对hbase也产生影响,所以HIve需要持有操作HBase的jar,拷贝或者使用软链接的方式进行。作为自己学习笔记。
1 配置
hive中操作hbase的jar
1.1 软链接jar
export HBASE_HOME=/opt/module/hbase0986
export HIVE_HOME=/opt/module/hive013
ln -s $HBASE_HOME/lib/hbase-common-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-common-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-server-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/hbase-client-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-client-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/hbase-it-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-it-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/htrace-core-2.04.jar
ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.6.jar $HIVE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.6.jar
ln -s $HBASE_HOME/lib/high-scale-lib-1.1.1.jar $HIVE_HOME/lib/high-scale-lib-1.1.1.jar
1.2 hive-site.xml修改zookeeper属性值
<property>
<name>hive.zookeeper.quorum</name>
<value>hadoop</value>
<description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
2操作案例
2.1 案例一
建立hive表,关联hbase表,插入数据到hive表的同是能影响hbase
2.1.1创建hive表同时关联hbase
键值一一对应
CREATE TABLE hive_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
执行上述命令在hive和hbase都分别创建了表
hive表中查看
hive> show tables;
OK
hive_hbase_emp_table
Time taken: 0.114 seconds, Fetched: 1 row(s)
hive>
hbase中表查看
hbase(main):002:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings……
fruit
fruits
fruits_hdfs
fruits_mr
hbase_emp_table
person
stu
7 row(s) in 21.8300 seconds
=> ["fruit", "fruits", "fruits_hdfs", "fruits_mr", "hbase_emp_table", "person", "stu"]
2.1.2 在hive中创建临时表 (用于load文件中的数据) 并向中间表加载数据
不能直接load进hive所关联hbase的表中
CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
row format delimited fields terminated by '\t';
加载数据
hive> load data local inpath '/opt/module/data/emp.txt' into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=657, rawDataSize=0]
OK
Time taken: 2.199 seconds
hive>
表数据如下
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 10.556 seconds, Fetched: 14 row(s)
2.1.3 中间表数据导入关联表中
通过insert 将中间表数据导入关联表
hive> insert into table hive_hbase_emp_table select * from emp;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
hive查看数据
Time taken: 158.308 seconds
hive> select * from hive_hbase_emp_table;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 0.239 seconds, Fetched: 14 row(s)
hive>
hbase中查看数据(数据太多此处只显示一部分)
hbase(main):003:0> scan 'hbase_emp_table'
ROW COLUMN+CELL
7369 column=info:deptno, timestamp=1540771708143, value=20
7369 column=info:ename, timestamp=1540771708143, value=SMITH
7369 column=info:hiredate, timestamp=1540771708143, value=1980-12-17
7369 column=info:job, timestamp=1540771708143, value=CLERK
7369 column=info:mgr, timestamp=1540771708143, value=7902
7369 column=info:sal, timestamp=1540771708143, value=800.0
7499 column=info:comm, timestamp=1540771708143, value=300.0
7499 column=info:deptno, timestamp=1540771708143, value=30
7499 column=info:ename, timestamp=1540771708143, value=ALLEN
7499 column=info:hiredate, timestamp=1540771708143, value=1981-2-20
7499 column=info:job, timestamp=1540771708143, value=SALESMAN
7499 column=info:mgr, timestamp=1540771708143, value=7698
7499 column=info:sal, timestamp=1540771708143, value=1600.0
7521 column=info:comm, timestamp=1540771708143, value=500.0
7521 column=info:deptno, timestamp=1540771708143, value=30
7521 column=info:ename, timestamp=1540771708143, value=WARD
7521 column=info:hiredate, timestamp=1540771708143, value=1981-2-22
7521 column=info:job, timestamp=1540771708143, value=SALESMAN
在hbase清空表,hive查询数据显示为空
########################## hbase ##########################
hbase(main):005:0> truncate "hbase_emp_table"
Truncating 'hbase_emp_table' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 2.5110 seconds
hbase(main):006:0> scan "hbase_emp_table"
ROW COLUMN+CELL
0 row(s) in 0.0370 seconds
hbase(main):007:0>
########################## hive ##########################
hive (default)> select * from hive_hbase_emp_table
> ;
OK
hive_hbase_emp_table.empno hive_hbase_emp_table.ename hive_hbase_emp_table.job hive_hbase_emp_table.mgr hive_hbase_emp_table.hiredate hive_hbase_emp_table.sal hive_hbase_emp_table.comm hive_hbase_emp_table.deptno
Time taken: 0.256 seconds
hive (default)>
hive关联表不能直接导入数据
直接导入数据到关联表,会报错
hive (default)> load data local inpath '/opt/module/data/emp.txt' into table hive_hbase_emp_table;
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
hive (default)>
2.2 案例二
hbase中已经存在表hbase_emp_table,在hive中创建一个外部表来关联hbase中的这张表,使之可以借助hive来分析hbase这张表
hbase表仍使用案例一中的hbase_emp_table
1 在hive创建外部表
CREATE EXTERNAL TABLE relevance_hbase_emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
下图是案例一 与案例二 hive表的对比
2 关联后就可以使用hive函数分析操作
hive (default)> select * from relevance_hbase_emp;
OK
relevance_hbase_emp.empno relevance_hbase_emp.ename relevance_hbase_emp.job relevance_hbase_emp.mgr relevance_hbase_emp.hiredate relevance_hbase_emp.sal relevance_hbase_emp.comm relevance_hbase_emp.deptno
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 0.159 seconds, Fetched: 14 row(s)