Elasticsearch-Hive

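1 Download and install ES-Hadoop

  1. Download
  2. Upload the jar to the machine that runs the Hive client
    Unzip the downloaded ZIP package, go to elasticsearch-hadoop/dist, and copy elasticsearch-hadoop.jar to the machine that runs the Hive client.
  3. Install
    Open Hive Beeline and run:
ADD JAR /path/elasticsearch-hadoop.jar;

Alternatively, copy elasticsearch-hadoop.jar straight into the $HIVE_HOME/auxlib directory. (I could not get this approach to work.)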
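A minimal sketch of the install step, assuming the jar was copied to the placeholder path /path/elasticsearch-hadoop.jar used above. ADD JAR registers the jar for the current session only, so it has to be repeated in each new session; LIST JARS is one way to check that the jar was picked up.

```sql
-- Register the ES-Hadoop jar for this session (placeholder path from the step above)
ADD JAR /path/elasticsearch-hadoop.jar;

-- Show the jars added to the current session; the ES-Hadoop jar should be listed
LIST JARS;
```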

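2 Create the Hive external table

For all related settings, refer to the ES-Hadoop configuration documentation.

2.1 Creation example

Create an external table whose data is stored in ES but is accessed through Hive: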

CREATE EXTERNAL TABLE artists (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.index.auto.create' = 'false',
              'es.query' = '{ "query" : { "range" : { "time" : { "gte" : "now+8h-2d/d", "lt" : "now+8h-1d/d", "format" : "yyyy-MM-dd HH:mm:ss" } } } }',
              'es.read.metadata' = 'true',
              'es.mapping.names' = 'id:_metadata._id',
              'es.mapping.date.rich' = 'false',
              'es.nodes' = '192.168.10.1:9200,192.168.10.2:9200,192.168.10.3:9200');
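  • EsStorageHandler
    Use EsStorageHandler as the storage handler.
  • es.resource
    The ES index/type.
  • es.index.auto.create
    Whether to also create the ES index automatically.
  • es.query
    Empty by default, which reads all data in the index. It also accepts a URI query (?uri_query), a query DSL, or an external resource.
  • es.read.metadata
    Defaults to false. Set it to true to include the ES document metadata, such as id and version, in the results.
  • es.mapping.names
    The Hive-to-ES field mapping; see section 2.2. Here the id column of the Hive external table is mapped to the id field in the ES document metadata. Only the fields that need mapping have to be listed; fields whose names already match can be omitted.
  • es.mapping.date.rich
    Defaults to true, which creates rich Date types. Set it to false to get the primitive type (String or long) back instead. Date values that are not in standard ISO 8601 format can cause parse errors, and setting this to false avoids them. For ES-Hadoop date and field mapping, see Time/Date mapping.
  • es.nodes
    Defaults to localhost. List some of the ES node addresses; the other nodes in the cluster are discovered automatically.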
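The three forms that es.query accepts can be sketched as follows. The table names, column lists, the term query, and the resource path org/mypackage/myquery.json are all hypothetical and only illustrate the shape of each form.

```sql
-- 1) URI query: the value starts with '?'
CREATE EXTERNAL TABLE artists_uri (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.query' = '?q=me*');

-- 2) Query DSL: the value is a JSON document
CREATE EXTERNAL TABLE artists_dsl (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.query' = '{ "query" : { "term" : { "name" : "me" } } }');

-- 3) External resource: the value is a path to a file that holds the query
--    (hypothetical path; the file must be reachable by the job)
CREATE EXTERNAL TABLE artists_res (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.query' = 'org/mypackage/myquery.json');
```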

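2.2 Field mapping

By default, Elasticsearch-Hadoop uses the Hive table metadata (column names and types) to map data to ES. To change the mapping, set es.mapping.names in the following form:

Hive_field_name:Elasticsearch_field_name,Hive_field_name2:Elasticsearch_field_name2

For example: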

CREATE EXTERNAL TABLE artists (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
            'es.mapping.names' = 'date:@timestamp, url:url_123');

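Note: Hive is case-insensitive while ES is case-sensitive, which can lead to incorrect queries. To avoid this, Elasticsearch-Hadoop always converts Hive column names to lowercase. Use lowercase everywhere except in the Hive commands themselves.

Also, a field that is misspelled or does not exist is returned as NULL.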
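To make the mapping concrete, here is a sketch with a spelled-out column list. The table name artist_events and its columns are hypothetical: event_time reads the ES field @timestamp, url reads url_123, and name needs no mapping entry because the names match on both sides. All names are kept lowercase, in line with the note above.

```sql
-- Hypothetical table: two columns are renamed through es.mapping.names,
-- the third (name) maps to the ES field of the same name by default
CREATE EXTERNAL TABLE artist_events (
    event_time  TIMESTAMP,
    url         STRING,
    name        STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.mapping.names' = 'event_time:@timestamp,url:url_123');

-- Columns are addressed by their Hive names; a wrong mapping shows up as NULL values
SELECT event_time, url, name FROM artist_events LIMIT 10;
```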

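2.3 Type conversion

When automatic index creation is disabled, field types are converted as follows; when it is enabled, see:

3 Read data from the Hive external table (ES)

3.1 Create the Hive external table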

CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 
              'es.query' = '?q=me*');
  • es.query
    A URI query in this case: only documents matching me* are read.

3.2 Read the table

-- stream data from Elasticsearch
SELECT * FROM artists;
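Beyond SELECT *, individual columns can be read as well, including a field inside the links STRUCT declared in 3.1. This is only a sketch; the LIMIT is there to keep a test query small.

```sql
-- Read selected columns; links.url addresses one field of the STRUCT column
SELECT id, name, links.url
FROM artists
LIMIT 10;
```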

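3.3 Write data from the external table into your own Hive table

Do not use the external table directly. Load its data into a table of your own first and work with that table. One way to load it: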

insert overwrite table dw.dwd_artist_data_1d partition (ds='20200303')
select id,regexp_replace(name, '\n|\r', '') from artists;
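The insert above needs the target table to exist already. A possible definition is sketched below; the two columns and the ds partition follow the insert statement, while the column types and the ORC storage format are assumptions.

```sql
-- Hypothetical target table for the insert above: two data columns plus the ds partition
CREATE TABLE IF NOT EXISTS dw.dwd_artist_data_1d (
    id    BIGINT,
    name  STRING)
PARTITIONED BY (ds STRING)
STORED AS ORC;
```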

Common issues

References
