Hive and ES Integration: Error Log

I. Basic preparation

1. Check the ES version and download the jar with the matching version number; the test environment runs ES 7.8.0.

Download: https://jar-download.com/artifacts/org.elasticsearch/elasticsearch-hadoop-hive

2. Upload the downloaded elasticsearch-hadoop-hive-7.8.0.jar to HDFS, placing it in the directory that stores the Hive UDFs.
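
A minimal sketch of the upload from the shell, assuming the /user/hive/udf directory that the add jar command below points at:

# create the UDF directory if it does not exist yet
hdfs dfs -mkdir -p /user/hive/udf
# upload the connector jar
hdfs dfs -put elasticsearch-hadoop-hive-7.8.0.jar /user/hive/udf/
# verify the upload
hdfs dfs -ls /user/hive/udf/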

3. Add the jar in Hive.

First check which jars are already loaded:

list jars;

Then load elasticsearch-hadoop-hive-7.8.0.jar:

add jar hdfs://nameservice1/user/hive/udf/elasticsearch-hadoop-hive-7.8.0.jar;

Create the Hive table and load test data:

CREATE EXTERNAL TABLE test.dxt_test_to_es (
    order_zhu string,
    order_zi string,
    create_time string,
    jifei_status string,
    fuwufei string,
    num_id string,
    zhubo_id string,
    neirong_id string,
    zhubo_nick string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.resource' = 'dxt_test_to_es/info',
    'es.index.auto.create' = 'false',
    'es.nodes' = 'es.xxxx.com',
    'es.nodes.wan.only' = 'true',
    'es.read.metadata' = 'true',
    'es.port' = '80',
    'es.mapping.names' = 'order_zhu:order_zhu, order_zi:order_zi, create_time:create_time, jifei_status:jifei_status, fuwufei:fuwufei, num_id:num_id, zhubo_id:zhubo_id, neirong_id:neirong_id, zhubo_nick:zhubo_nick'
);

insert overwrite table test.dxt_test_to_es
select order_zhu, order_zi, create_time, jifei_status, fuwufei, num_id, zhubo_id, neirong_id, zhubo_nick
from v_tese_order;

Error 1

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Running Hive interactively from the Linux shell showed the connection to the ES service timing out and being refused: port 9200 sits behind an Nginx proxy and cannot be reached directly, while the domain name on port 80 goes through.
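
When es-hadoop reports "Cannot detect ES version", a quick sanity check is to hit the ES root endpoint (which returns the cluster version) from the Hive nodes. A sketch against the hostname used in the table definition above:

# direct access to 9200 hangs here, since the port sits behind Nginx
curl --max-time 5 http://es.xxxx.com:9200
# going through the proxy on port 80 returns the version JSON
curl --max-time 5 http://es.xxxx.com:80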

Continue with the insert overwrite.

Error 2

Task with the most failures(4): 
-----
Task ID:
  task_1602569057109_12652_m_000000

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1602569057109_12652&tipid=task_1602569057109_12652_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
    at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
    at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:759)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:147)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:487)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.Credentials
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 24 more



This suggests the required jar was missing when the job ran. The Hive installation ships /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar, so add it in the Hive CLI:

hive> add jar /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar;

After adding it, list jars; shows both dependency jars. Rerunning the insert overwrite into the test table now succeeds, and the index and its data are visible in ES.
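
A quick spot check from the shell, using the same domain and port as the table definition (index name from es.resource):

# document count in the new index
curl 'http://es.xxxx.com:80/dxt_test_to_es/_count?pretty'
# fetch one document to eyeball the field types ES inferred
curl 'http://es.xxxx.com:80/dxt_test_to_es/_search?size=1&pretty'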

Next, query the ES-mapped table from Hive.

Error 3

Data conversion exception. This happens because the Hive columns were all defined as string when the index was first auto-created, but the field types in ES ended up different (the TimestampWritable in the trace below points at a date-typed ES field, most likely create_time, which es-hadoop hands back to Hive as a timestamp):

20/12/03 15:29:02 ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:164)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2196)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:774)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:701)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:97)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
	at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:63)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:438)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:430)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
	... 13 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
	at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:271)
	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
	at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72)
	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
	at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
	at org.apache.hadoop.hive.serde2.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:67)
	at org.apache.hadoop.hive.serde2.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:36)
	at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:94)
	... 22 more

Fix: change the field data types so the mapped types on the two sides correspond.

Once the field types on both sides match, the data can be queried. Recommendation: set 'es.index.auto.create' = 'false' from the start, define the mapping manually in ES, and make the Hive table definition line up with it, as in the sketch below.
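
A minimal sketch of such a manual mapping, using the ES 7 typeless mapping API; mapping every field as keyword is an illustrative choice to match the all-string Hive schema, not the types from the original setup:

# pre-create the index with explicit field types before defining the Hive table
curl -X PUT 'http://es.xxxx.com:80/dxt_test_to_es' \
  -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "order_zhu":    { "type": "keyword" },
      "order_zi":     { "type": "keyword" },
      "create_time":  { "type": "keyword" },
      "jifei_status": { "type": "keyword" },
      "fuwufei":      { "type": "keyword" },
      "num_id":       { "type": "keyword" },
      "zhubo_id":     { "type": "keyword" },
      "neirong_id":   { "type": "keyword" },
      "zhubo_nick":   { "type": "keyword" }
    }
  }
}'

With create_time mapped as keyword on the ES side, the string column in Hive round-trips without the TimestampWritable cast error above.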
