作用:
1.将Hive数据导入ES
2.Hive直接使用ES的数据
步骤
一、配置依赖jar
1.临时生效
启动HIVE CLI后,ADD JAR /path/elasticsearch-hadoop-xxx.jar;
或
bin/hive --auxpath=/path/elasticsearch-hadoop-xxx.jar
或
bin/hive -hiveconf hive.aux.jars.path=/path/elasticsearch-hadoop-xxx.jar
2.永久生效
hive-site.xml 添加配置
<property>
<name>hive.aux.jars.path</name>
<value>/path/elasticsearch-hadoop-xxx.jar</value>
<description>A comma separated list (with no spaces) of the jar files</description>
</property>
CDH版可直接配置,将jar放在配置目录,重启HiveServer2即可。参考:https://blog.csdn.net/qq_23146763/article/details/88897243
二、在Hive创建外部表
CREATE EXTERNAL TABLE tmp.artists (
id string,
user_name string,
age string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'radio/artists',
'es.index.auto.create' = 'true',
"es.mapping.id" = "id",
'es.mapping.names' = 'user_name:user_name_es, age:age_es',
'es.nodes' = 'IP1:9200,IP2:9200,IP3:9200',
"es.net.http.auth.user"="XXX",
"es.net.http.auth.pass"="XXX"
);
es.mapping.names为hive字段和es字段映射,hive只支持小写名,es字段大小写都支持
es.net.http.auth.user和es.net.http.auth.pass是xpack权限控制账号密码,没开启可以不配置。开启了不配置会报错如下:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
三、导入数据
用一张有数据的表导入
INSERT OVERWRITE TABLE tmp.artists
SELECT personal_user_id as id, personal_user_name as user_name ,personal_age as age FROM xxx limit 100;
四、查看数据
通过hive查看
查看es数据
官网:https://www.elastic.co/guide/en/elasticsearch/hadoop/5.5/hive.html