Reading Elasticsearch data from Hive

References:

  1. Reading and writing ES data from Hive: http://blog.csdn.net/u013063153/article/details/60757307
  2. Official documentation, Hive integration with ES: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html#hive-type-conversion
  3. Using the Hive complex data types array, map, and struct: http://blog.csdn.net/gamer_gyt/article/details/52169441

A few notes:

  1. On the mapping between ES data types and Hive types: Hive complex types such as map and array can be used to hold nested (two-level) JSON from ES.

Add elasticsearch-hadoop-hive-2.2.0-beta1.jar:

add jar file:///home/liuxiaowen/elasticsearch-hadoop-2.2.0-beta1/dist/elasticsearch-hadoop-hive-2.2.0-beta1.jar;

Create the external table:

-- es.read.metadata must be 'true' so that _metadata._id can be mapped to cookieid.
-- userInfo is a nested JSON object in ES and is read as a Hive map.
CREATE EXTERNAL TABLE lxw1234_es_tags (
  cookieid string,
  area string,
  media_view_tags string,
  interest string,
  userInfo map<string,string>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '172.16.212.17:9200,172.16.212.102:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'lxw1234/tags',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'cookieid:_metadata._id, area:area, media_view_tags:media_view_tags, interest:interest, userInfo:userInfo'
);
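
Once the table exists, the nested ES object stored in the map column can be read with ordinary Hive map syntax. The query below is a minimal sketch: the key 'age' is a hypothetical field inside userInfo and is not taken from the original index, so substitute a key that actually exists in your documents. It also assumes the elasticsearch-hadoop jar has already been added in the current session.

```sql
-- Sketch only: 'age' is a hypothetical key of the userInfo object.
-- A key that is missing from a document yields NULL for that row.
SELECT cookieid,
       area,
       userInfo['age'] AS user_age
FROM lxw1234_es_tags
LIMIT 10;
```

Because the column is declared as map<string,string>, every value comes back as a string and has to be cast in Hive if a numeric type is needed.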

ES settings in the CREATE TABLE statement (things to note)

(Reference: https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/configuration.html)

I. es.resource

Elasticsearch resource location, where data is read and written to. Requires the format <index>/<type> (relative to the Elasticsearch host/port (see below)).

es.resource = twitter/tweet # index 'twitter', type 'tweet'

es.resource.read (defaults to es.resource)

Elasticsearch resource used for reading (but not writing) data. Useful when reading and writing data to different Elasticsearch indices within the same job. Typically set automatically (except for the Map/Reduce module which requires manual configuration).

es.resource.write (defaults to es.resource)

Elasticsearch resource used for writing (but not reading) data. Used typically for dynamic resource writes or when writing and reading data to different Elasticsearch indices within the same job. Typically set automatically (except for the Map/Reduce module which requires manual configuration).

Note that multiple indices and/or types are allowed only for reading. Use _all/types to search types in all indices or index/ to search all types within index. Do note that reading multiple indices/types typically works only when they have the same structure and only with some libraries. Integrations that require a strongly typed mapping (such as a table like Hive or SparkSQL) are likely to fail.

Explanation

Search the given types across all indices:
'es.resource' = '_all/types'
Search all types within the index named index:
'es.resource' = 'index/'

II. es.query (default none)

Holds the query used for reading data from the specified es.resource. By default it is not set/empty, meaning the entire data under the specified index/type is returned. es.query can have three forms:

uri query

using the form ?uri_query, one can specify a query string. Notice the leading ?.

query dsl

using the form query_dsl. Note that the query dsl needs to start with { and end with }.

external resource

if neither of the two forms above matches, elasticsearch-hadoop will try to interpret the parameter as a path within the HDFS file-system. If that is not the case, it will try to load the resource from the classpath or, if that fails, from the Hadoop DistributedCache. The resource should contain either a uri query or a query dsl.

Explanation

1) URI query
es.query = ?q=98E5D2DE059F1D563D8565
2) Query DSL
es.query = { "query" : { "term" : { "user" : "costinl" } } }
The same setting written as a Hive TBLPROPERTIES entry:
'es.query'='{"query": {"match_all": { }}}',
3) External resource
es.query = org/mypackage/myquery.json
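
In a Hive table definition, es.query is passed in TBLPROPERTIES next to es.resource, so only the matching documents are exposed through the table. The following is a minimal sketch under stated assumptions: the table name lxw1234_es_tags_filtered and the filter value beijing are hypothetical, while the nodes, index, and type are reused from the table created earlier.

```sql
-- Sketch only: table name and filter value are hypothetical.
-- The leading '?' marks the value as a URI query.
CREATE EXTERNAL TABLE lxw1234_es_tags_filtered (
  area string,
  interest string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '172.16.212.17:9200,172.16.212.102:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'lxw1234/tags',
  'es.query' = '?q=area:beijing'
);
```

To use the DSL form instead, replace the es.query value with a JSON string that starts with { and ends with }, for example '{"query": {"term": {"area": "beijing"}}}'.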