一、基础准备
1,查看es对应版本,下载版本号一样的jar包, 测试环境用的es7.8.0
下载地址https://jar-download.com/artifacts/org.elasticsearch/elasticsearch-hadoop-hive
2,将下载好的jar包elasticsearch-hadoop-hive-7.8.0.jar上传到hdfs中
放在存储hive udf的目录下。
3,引入jar包
先查看已经引入的jar包有哪些
list jars;
引入jelasticsearch-hadoop-hive-7.8.0.jar
add jar hdfs://nameservice1/user/hive/udf/elasticsearch-hadoop-hive-7.8.0.jar
创建hive表并导入测试数据
CREATE EXTERNAL TABLE test.dxt_test_to_es (
order_zhu string,
order_zi string,
create_time string,
jifei_status string,
fuwufei string,
num_id string,
zhubo_id string,
neirong_id string,
zhubo_nick string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dxt_test_to_es/info',
'es.index.auto.create' = 'false',
'es.nodes' = 'es.xxxx.com',
'es.nodes.wan.only'='true',
'es.read.metadata'='true',
'es.port'='80',
'es.mapping.names'='order_zhu:order_zhu, order_zi:order_zi, create_time:create_time, jifei_status:jifei_status, fuwufei:fuwufei, num_id:num_id, zhubo_id:zhubo_id, neirong_id:neirong_id, zhubo_nick:zhubo_nick'
);
insert overwrite table test.dxt_test_to_es select order_zhu,order_zi,create_time,jifei_status,fuwufei,num_id,zhubo_id,neirong_id,zhubo_nick from v_tese_order
报错1
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
去linux中使用命令交互方式操作hive,发现是连接es服务超时被拒绝,因为9200被Nginx代理了,不能直接访问9200,通过域名+80端口的方式可以通过。
继续 insert overwrite
报错2
Task with the most failures(4):
-----
Task ID:
task_1602569057109_12652_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1602569057109_12652&tipid=task_1602569057109_12652_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:759)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:487)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.Credentials
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
20/12/03 13:38:55 ERROR exec.Task:
Task with the most failures(4):
-----
Task ID:
task_1602569057109_12652_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1602569057109_12652&tipid=task_1602569057109_12652_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/Credentials
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:759)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:487)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.Credentials
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
应该是没有对应的jar在计算的时候,找到hive安装包下的 /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar 在Hive交互界面操作
hive> add jar /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar ;
加入后查看有两个依赖jar包,继续执行insert overwrite 数据到测试表,发现可以导入进去,并且ES中可以看到索引和对应数据
查询hive中ES映射表
报错3
数据转化异常----这是因为初始自动创建时,定义的hive 字段都是string 但是到es中字段的类型发生变化
20/12/03 15:29:02 ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:164)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2196)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:774)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:97)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:438)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:430)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
... 13 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:271)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
at org.apache.hadoop.hive.serde2.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:67)
at org.apache.hadoop.hive.serde2.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:36)
at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:94)
... 22 more
修改字段数据类型,保持两边映射 字段类型 对应
两边字段类型对应可以查询到数据,建议开始时设置 'es.index.auto.create' = 'false', 然后手动在es中设置mapping,创建hive表时两边对应起来