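Apache Hive Integration
1. Document Overview
This document assumes that the ES and Hive clusters are already installed and working, and follows the official guide: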
https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/hive.html#hive
Version information:
HDP version : HDP-2.6.1.0
ES version : 5.6.0
HIVE version : 1.2.1000
Maven version : 3.5
2. Environment Configuration
2.1. Adding the jar temporarily (current session only)
hive> add jar /storage/software/elasticsearch-hadoop-5.6.0.jar;
2.2. Adding the jar permanently
2.2.1. Option 1
$ bin/hive --auxpath=/path/elasticsearch-hadoop.jar
2.2.2. Option 2
$ bin/hive -hiveconf hive.aux.jars.path=/path/elasticsearch-hadoop.jar
2.2.3. Option 3 (in hive-site.xml)
<property>
  <name>hive.aux.jars.path</name>
  <value>/path/elasticsearch-hadoop.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>
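Because hive.aux.jars.path takes a comma-separated list with no spaces, the value can be assembled from individual jar paths when more than one jar is needed. The following is a minimal shell sketch; the helper name join_jars and the second jar path are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# join_jars: join all arguments with commas and no spaces (hypothetical helper).
join_jars() {
    result=""
    for jar in "$@"; do
        if [ -z "$result" ]; then
            result="$jar"
        else
            result="$result,$jar"
        fi
    done
    printf '%s\n' "$result"
}

# Example paths; /path/other-udf.jar is a made-up placeholder.
AUX_JARS=$(join_jars /path/elasticsearch-hadoop.jar /path/other-udf.jar)
echo "hive.aux.jars.path=$AUX_JARS"
```

The resulting string can be pasted into the <value> element above or passed with -hiveconf as in Option 2.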
3. ES Data Operations
3.1. Adding data
curl -XPUT 'hdfs01.edcs.org:9200/twitter/tweet/3?pretty' -H 'Content-Type: application/json' -d' {"name" : "lili","message" : "trying out hive"}'
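To confirm that the document was indexed, it can be read back with a GET request against the same host, index, type and id. The sketch below assumes the node is reachable from the shell; the helper names doc_url and fetch_doc are hypothetical.

```shell
#!/bin/sh
# Host, index, type and id are taken from the PUT request above.
ES_HOST="hdfs01.edcs.org:9200"

# doc_url: build the URL of a single document (hypothetical helper).
doc_url() {
    printf '%s/%s/%s/%s?pretty\n' "$ES_HOST" "$1" "$2" "$3"
}

# fetch_doc: GET the document; requires a reachable Elasticsearch node.
fetch_doc() {
    curl -sS -XGET "$(doc_url "$1" "$2" "$3")"
}

# Print the URL that would be requested; call fetch_doc to actually query ES.
doc_url twitter tweet 3
```

Calling fetch_doc twitter tweet 3 on a machine that can reach the cluster should return the stored JSON document.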
4. Hive Configuration
4.1. Creating the Hive table
hive> create table hive_es_demo(name string,message string) row format delimited fields terminated by '\t' stored as textfile;
hive> load data local inpath '/storage/software/hive_es.txt' into table hive_es_demo;
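The contents of hive_es.txt are not shown in this document. Given the table definition (two string columns, fields terminated by a tab), a matching file could be produced as sketched below; the rows are made-up sample data, and the file is written to the current directory rather than to /storage/software.

```shell
#!/bin/sh
# Output file; the LOAD DATA statement above expects /storage/software/hive_es.txt.
DATA_FILE="${DATA_FILE:-hive_es.txt}"

# Two made-up rows: name<TAB>message, one record per line.
printf 'lili\ttrying out hive\n' > "$DATA_FILE"
printf 'lucy\tloading from a file\n' >> "$DATA_FILE"

# Check that every line has exactly two tab-separated fields.
awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$DATA_FILE" \
    && echo "ok: $(wc -l < "$DATA_FILE" | tr -d ' ') rows"
```

Copy the file to the path used in the load statement, or set DATA_FILE to that path before running the script.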
4.2. Creating the external table mapped to ES
hive> create external table hive_es(name string,message string) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='hand/edcs','es.mapping.names' = 'name:name1,message:message1','es.nodes'='10.211.55.241', 'es.port'='9200');
4.3. Hive's default execution engine is Tez; change it to mr
hive> set hive.execution.engine=mr;
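* With the mapping in place, the file data could not be loaded into ES. The following statement fails with an error:
insert overwrite table hive_es select name,message from hive_es_demo;
Cause: Hive's default execution engine is Tez, which leads to a jar conflict; switch to mr or spark: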
hive> set hive.execution.engine=mr;
Running hive-es through spark-sql did not succeed; the cause was a jar conflict.
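Since a set statement only affects the current Hive session, the engine switch and the insert from the notes above can be combined in one non-interactive run. This is a minimal sketch, assuming the hive CLI is on the PATH and the elasticsearch-hadoop jar has already been registered by one of the options in section 2.2; the variable name HQL is arbitrary.

```shell
#!/bin/sh
# Statements taken from this document: switch the engine, then run the insert.
HQL="set hive.execution.engine=mr;
insert overwrite table hive_es select name,message from hive_es_demo;"

# Run only when the hive CLI is on the PATH; otherwise just show the statements.
if command -v hive >/dev/null 2>&1; then
    hive -e "$HQL"
else
    echo "hive CLI not found; statements that would be run:"
    echo "$HQL"
fi
```

Keeping both statements in one invocation ensures the insert runs with the mr engine even if a new session would otherwise start with Tez.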