Problem Description
1. Prepare the data
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,\N,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,\N,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,\N,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,\N,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,1500,\N,20
7839,KING,PRESIDENT,\N,1981-11-17 00:00:00,5000,\N,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,\N,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,\N,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,\N,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,\N,10
2. Create the t_emp table in Hive
CREATE TABLE t_emp(
empno INT,
ename STRING,
job STRING,
mgr INT,
hiredate TIMESTAMP,
sal DECIMAL(7,2),
comm DECIMAL(7,2),
deptno INT)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;
3. Load the data into t_emp
0: jdbc:hive2://CentOS:10000> load data local inpath '/root/hivedata/t_emp' overwrite into table t_emp;
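LOAD DATA LOCAL ... OVERWRITE only copies the file into the table's storage directory, replacing any existing files. It does not parse or validate the rows. The text is interpreted against the t_emp column types when the table is read, and in a delimited text table a value that cannot be parsed as its declared type comes back as NULL, as does the \N marker. The Python sketch below approximates that read-time behaviour for the t_emp schema. The helper names and the exact edge-case rules (rounding, overflow) are simplifying assumptions for illustration, not Hive's SerDe code.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

NULL_MARKER = r"\N"  # default NULL representation in Hive text files


def to_int(text):
    try:
        return int(text)
    except ValueError:
        return None  # unparsable numbers are read as NULL


def to_decimal_7_2(text):
    """Approximate DECIMAL(7,2): round to 2 places, NULL if more than 5 integer digits."""
    try:
        value = Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return value if abs(value) < Decimal("100000") else None
    except InvalidOperation:
        return None


def to_timestamp(text):
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


# Column order and types follow the CREATE TABLE t_emp statement.
SCHEMA = [
    ("empno", to_int), ("ename", str), ("job", str), ("mgr", to_int),
    ("hiredate", to_timestamp), ("sal", to_decimal_7_2),
    ("comm", to_decimal_7_2), ("deptno", to_int),
]


def read_row(line):
    """Parse one line of the t_emp file roughly the way a query would see it."""
    fields = line.rstrip("\n").split(",")
    row = {}
    for index, (name, convert) in enumerate(SCHEMA):
        if index >= len(fields) or fields[index] == NULL_MARKER:
            row[name] = None  # missing trailing fields and \N both read as NULL
        else:
            row[name] = convert(fields[index])
    return row  # any extra fields beyond the schema are ignored


print(read_row(r"7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30"))
```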
4. Connect to HBase and create the table in HBase
hbase(main):005:0> create_namespace 'jiangzz'
0 row(s) in 0.3920 seconds
hbase(main):006:0> create 'jiangzz:t_employee','cf1','cf2'
0 row(s) in 2.6110 seconds
=> Hbase::Table - jiangzz:t_employee
hbase(main):007:0>
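HBase addresses a table inside a namespace as namespace:table, and a name without a colon belongs to the built-in default namespace. This is why the Hive mapping table must reference the full name jiangzz:t_employee rather than just t_employee. A tiny Python illustration (split_table_name is a hypothetical helper, not an HBase API):

```python
def split_table_name(name):
    """Split an HBase table name into (namespace, qualifier).

    Names without a colon live in HBase's "default" namespace.
    """
    namespace, sep, qualifier = name.partition(":")
    if not sep:
        return ("default", name)
    return (namespace, qualifier)


print(split_table_name("jiangzz:t_employee"))  # ('jiangzz', 't_employee')
print(split_table_name("t_employee"))          # ('default', 't_employee')
```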
5. Connect to Hive and create an external table mapped to the HBase table
create external table t_employee(empno INT,
ename STRING,
job STRING,
mgr INT,
hiredate TIMESTAMP,
sal DECIMAL(7,2),
comm DECIMAL(7,2),
deptno INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf1:name,cf1:job,cf1:mgr,cf1:hiredate,cf1:sal,cf1:comm,cf1:deptno")
TBLPROPERTIES("hbase.table.name" = "jiangzz:t_employee");
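hbase.columns.mapping is matched to the Hive columns by position: the first entry, :key, makes empno the HBase row key, and each following family:qualifier entry names the cell for the next column (so ename is stored under cf1:name). The number of entries must equal the number of Hive columns. The Python sketch below pairs them up, runs a few sanity checks, and models how one Hive row becomes the cells of a single HBase Put. It is illustrative only: the helper names are hypothetical, it assumes NULL columns are simply left out of the Put, and it passes values through as text, whereas the exact string Hive writes for TIMESTAMP or DECIMAL values may differ.

```python
# Illustrative model of the t_employee mapping; helper names are hypothetical.
HIVE_COLUMNS = ["empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "deptno"]
MAPPING = ":key,cf1:name,cf1:job,cf1:mgr,cf1:hiredate,cf1:sal,cf1:comm,cf1:deptno"
TABLE_FAMILIES = {"cf1", "cf2"}  # from: create 'jiangzz:t_employee','cf1','cf2'


def pair_columns(columns, mapping, families):
    """Pair Hive columns with mapping entries by position and sanity-check them."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError(f"{len(columns)} Hive columns but {len(entries)} mapping entries")
    if entries.count(":key") != 1:
        raise ValueError("exactly one mapping entry must be :key")
    for entry in entries:
        family = entry.split(":", 1)[0]
        if entry != ":key" and family not in families:
            raise ValueError(f"column family {family!r} is not in the HBase table")
    return dict(zip(columns, entries))


def to_cells(row, pairs):
    """Turn one Hive row (column -> text, None for NULL) into (row_key, cells)."""
    row_key, cells = None, {}
    for column, target in pairs.items():
        value = row.get(column)
        if target == ":key":
            if value is None:
                raise ValueError("the row key column must not be NULL")
            row_key = value.encode("utf-8")
        elif value is not None:  # assumption: NULL columns produce no cell
            cells[target] = value.encode("utf-8")
    return row_key, cells


pairs = pair_columns(HIVE_COLUMNS, MAPPING, TABLE_FAMILIES)
smith = {"empno": "7369", "ename": "SMITH", "job": "CLERK", "mgr": "7902",
         "hiredate": "1980-12-17 00:00:00", "sal": "800", "comm": None, "deptno": "20"}
key, cells = to_cells(smith, pairs)
print(key, sorted(cells))
```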
6. Run SQL to migrate the data from t_emp into HBase
init.sql
use jiangzz;
insert overwrite table t_employee select empno,ename,job,mgr,hiredate,sal,comm,deptno from t_emp;
[root@CentOS ~]# hive -f init.sql
Logging initialized using configuration in jar:file:/usr/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 0.702 seconds
Query ID = root_20200109213735_d4feac54-be58-46f9-aad1-24a7847d0f42
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1578563483750_0008, Tracking URL = http://CentOS:8088/proxy/application_1578563483750_0008/
Kill Command = /usr/hadoop-2.9.2/bin/hadoop job -kill job_1578563483750_0008
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2020-01-09 21:37:52,914 Stage-0 map = 0%, reduce = 0%
2020-01-09 21:38:26,838 Stage-0 map = 100%, reduce = 0%
Ended Job = job_1578563483750_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1578563483750_0008_m_000000 (and more) from job job_1578563483750_0008
Task with the most failures(4):
-----
Task ID:
task_1578563483750_0008_m_000000
URL:
http://CentOS:8088/taskdetails.jsp?jobid=job_1578563483750_0008&tipid=task_1578563483750_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.setDurability(Lorg/apache/hadoop/hbase/client/Durability;)V
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.setDurability(Lorg/apache/hadoop/hbase/client/Durability;)V
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:142)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:117)
at org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.write(HivePassThroughRecordWriter.java:40)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
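The job fails because the Hive and HBase versions are incompatible. The NoSuchMethodError shows that hive-hbase-handler-1.2.2.jar calls Put.setDurability(Durability) through the descriptor (Lorg/apache/hadoop/hbase/client/Durability;)V, that is, a version of the method that returns void, and the HBase client on the classpath (1.2.4 in this setup, matching the pom below) has no method with that exact signature. Hive 1.2.x is compiled against the HBase 0.98 API, so the hbase-handler module has to be recompiled from source against the HBase version actually deployed.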
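The descriptor in the error message is what turns a harmless-looking API change into a run-time failure. In a JVM method descriptor the part in parentheses lists the parameter types and the character after the closing parenthesis is the return type, with V meaning void. Because the return type is part of the descriptor, bytecode compiled against a setDurability that returns void cannot link to one that returns something else, even though the Java source calling it is identical. The Python sketch below decodes such descriptors; parse_method_descriptor is a hypothetical helper written for illustration, not part of any JDK tool.

```python
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type starting at index i; return (java_type, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<binary class name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # single-letter primitive type (or V for void)
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc):
    """Decode '(params)return' into ([parameter types], return type)."""
    if not desc.startswith("("):
        raise ValueError("method descriptors start with '('")
    params, i = [], 1
    while desc[i] != ")":
        param, i = _parse_type(desc, i)
        params.append(param)
    ret, i = _parse_type(desc, i + 1)
    if i != len(desc):
        raise ValueError("trailing characters after the return type")
    return params, ret


params, ret = parse_method_descriptor("(Lorg/apache/hadoop/hbase/client/Durability;)V")
print(params, ret)  # ['org.apache.hadoop.hbase.client.Durability'] void
```

A variant of the same method that returned Put would instead be encoded with the suffix Lorg/apache/hadoop/hbase/client/Put; after the parentheses, so the JVM treats the two as different methods.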
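Solution
- Create a Maven project with the following pom.xml, which declares the Hive 1.2.2, Hadoop 2.9.2 and HBase 1.2.4 dependencies: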
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.jiangzz</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.2</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>6</source>
<target>6</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-serde</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-common</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.9.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
</dependencies>
</project>
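- Copy the hbase-handler source code (the hbase-handler/src/java directory of the Hive 1.2.2 source release) into the project's source directory, src/main/java by Maven's default.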
- Run mvn package to build target/hive-hbase-handler-1.2.2.jar, then use it to replace hive-hbase-handler-1.2.2.jar under HIVE_HOME/lib.
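Before swapping jars it can help to confirm that the rebuilt jar really contains the recompiled handler classes, and to keep a copy of the original so the change is easy to roll back. A minimal Python sketch under those assumptions: replace_handler_jar and the .bak suffix are hypothetical choices, and the lib path in the usage comment is taken from the Hive installation shown in the logs above (/usr/apache-hive-1.2.2-bin), so adjust it to your own HIVE_HOME.

```python
import shutil
import zipfile
from pathlib import Path

# Class whose write() method failed in the stack trace.
REQUIRED_CLASS = "org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.class"


def replace_handler_jar(new_jar, hive_lib, name="hive-hbase-handler-1.2.2.jar"):
    """Check new_jar, back up hive_lib/name as name + '.bak', then copy new_jar over it."""
    new_jar, hive_lib = Path(new_jar), Path(hive_lib)
    with zipfile.ZipFile(new_jar) as jar:  # a jar file is a zip archive
        if REQUIRED_CLASS not in jar.namelist():
            raise ValueError(f"{new_jar} does not contain {REQUIRED_CLASS}")
    target = hive_lib / name
    if target.exists():
        # The backup name does not end in .jar, so it is not picked up as a jar in lib.
        shutil.copy2(target, target.with_name(name + ".bak"))
    shutil.copy2(new_jar, target)
    return target


# Example (paths are assumptions for this setup):
# replace_handler_jar("target/hive-hbase-handler-1.2.2.jar", "/usr/apache-hive-1.2.2-bin/lib")
```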
Test case
init.sql
use baizhi;
insert overwrite table t_employee select empno,ename,job,mgr,hiredate,sal,comm,deptno from t_emp;
INSERT OVERWRITE LOCAL DIRECTORY '/employee' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE select empno,ename,max(sal) over(partition by deptno) as max,min(sal) over(partition by deptno) as min,avg(sal) over(partition by deptno) as avg,dense_rank() over(partition by deptno order by sal desc) as rank from t_employee;
[root@CentOS ~]# hive -f init.sql
Logging initialized using configuration in jar:file:/usr/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 0.87 seconds
Query ID = root_20200109214936_f78e52f0-ad7e-456f-86c9-c275f14f2817
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1578563483750_0009, Tracking URL = http://CentOS:8088/proxy/application_1578563483750_0009/
Kill Command = /usr/hadoop-2.9.2/bin/hadoop job -kill job_1578563483750_0009
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2020-01-09 21:49:56,074 Stage-0 map = 0%, reduce = 0%
2020-01-09 21:50:05,901 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.55 sec
MapReduce Total cumulative CPU time: 2 seconds 550 msec
Ended Job = job_1578563483750_0009
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 2.55 sec HDFS Read: 12196 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 550 msec
OK
Time taken: 31.612 seconds
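For reference, the window functions in the second statement of init.sql compute, for every employee, the highest, lowest and average salary of that employee's department, plus a dense rank. The Python sketch below recomputes these values for the sample data, assuming the rank is meant to order salaries from high to low within each department (dense_rank gives tied salaries the same rank and leaves no gaps afterwards). It is an illustrative recomputation, not output captured from Hive, and department_windows is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# (empno, ename, sal, deptno) taken from the sample data in step 1
EMPLOYEES = [
    (7369, "SMITH", 800, 20), (7499, "ALLEN", 1600, 30), (7521, "WARD", 1250, 30),
    (7566, "JONES", 2975, 20), (7654, "MARTIN", 1250, 30), (7698, "BLAKE", 2850, 30),
    (7782, "CLARK", 2450, 10), (7788, "SCOTT", 1500, 20), (7839, "KING", 5000, 10),
    (7844, "TURNER", 1500, 30), (7876, "ADAMS", 1100, 20), (7900, "JAMES", 950, 30),
    (7902, "FORD", 3000, 20), (7934, "MILLER", 1300, 10),
]


def department_windows(rows):
    """Return {empno: (max, min, avg, dense_rank)} over PARTITION BY deptno windows."""
    by_dept = defaultdict(list)
    for empno, _, sal, deptno in rows:
        by_dept[deptno].append((empno, sal))
    result = {}
    for members in by_dept.values():
        sals = [sal for _, sal in members]
        # dense_rank: distinct salaries ordered high to low, numbered 1, 2, 3, ... without gaps
        rank_of = {sal: r for r, sal in enumerate(sorted(set(sals), reverse=True), start=1)}
        for empno, sal in members:
            result[empno] = (max(sals), min(sals), mean(sals), rank_of[sal])
    return result


windows = department_windows(EMPLOYEES)
print(windows[7521])  # WARD, department 30
```

WARD and MARTIN share a salary of 1250 in department 30, so both get the same rank and JAMES, the next lower salary, follows directly after them.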