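I set aside some time over the May Day holiday to learn Elasticsearch (ES), and I am tidying up and publishing my notes here.
In Elasticsearch, a document belongs to a type, and types live inside an index. A simple side-by-side comparison with a traditional relational database makes the analogy clear: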
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
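An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type holds multiple documents (rows), and each document has multiple fields (columns). Note that this layout reflects older releases: mapping types were deprecated in Elasticsearch 6.x and removed in later versions.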
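As a small illustration of this hierarchy: each search hit in the responses in these notes carries its own _index, _type and _id, and in the older, type-based releases those three values together address a single document. The helper below is a hypothetical sketch (the name doc_path is mine, not part of any Elasticsearch client) that joins them into an /index/type/id path:

```python
def doc_path(hit):
    """Join a hit's _index, _type and _id into a REST-style document path."""
    return f"/{hit['_index']}/{hit['_type']}/{hit['_id']}"


# Metadata copied from one of the hits in the sample response.
hit = {"_index": "megacorp", "_type": "employee", "_id": "1", "_score": 1}
print(doc_path(hit))  # /megacorp/employee/1
```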
A few kinds of queries
Simple search
All documents: GET /megacorp/employee/_search
{
  "took": 6,
  "timed_out": false,
  "_shards": { },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "megacorp", "_type": "employee", "_id": "3", "_score": 1,
        "_source": { "first_name": "Douglas", "last_name": "Fir", "age": 35, "about": "I like to build cabinets", "interests": [ "forestry" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "1", "_score": 1,
        "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
      { "_index": "megacorp", "_type": "employee", "_id": "2", "_score": 1,
        "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
    ]
  }
}
Query-string search
GET /megacorp/employee/_search?q=last_name:Smith
{
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      { "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
      { "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
    ]
  }
}
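The query-string form puts the whole search into the URL as q=field:value. A minimal sketch of assembling such a path in Python follows; the function name search_path is hypothetical, and the host is deliberately left out:

```python
from urllib.parse import urlencode


def search_path(index, doc_type, field, value):
    """Build the path and query string for a q=field:value search."""
    # safe=":" keeps the colon readable, matching the request shown above;
    # everything else is percent-encoded as usual.
    query = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"/{index}/{doc_type}/_search?{query}"


print(search_path("megacorp", "employee", "last_name", "Smith"))
# /megacorp/employee/_search?q=last_name:Smith
```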
Query DSL
Query
GET /megacorp/employee/_search
{ "query" : { "match" : { "last_name" : "Smith" } } }
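With the query DSL, the search moves from the URL into a JSON request body. The sketch below prepares such a request using only the Python standard library, without sending it. The node address http://localhost:9200 and the helper name build_search_request are assumptions for illustration. POST is used because Elasticsearch also accepts POST for searches that carry a body, and not every HTTP client will send a body with GET.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed local node; adjust for your setup


def build_search_request(index, doc_type, body):
    """Prepare (but do not send) a _search request with a JSON body."""
    return Request(
        f"{ES_URL}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "megacorp", "employee", {"query": {"match": {"last_name": "Smith"}}}
)
# To run it against a live node: urllib.request.urlopen(request)
```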
Filter: restrict the last_name search to employees older than 30
{ "query": { "filtered": { "filter": { "range": { "age": { "gt": 30 } } }, "query": { "match": { "last_name": "smith" } } } } }
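The filtered query belongs to older releases: it was deprecated in Elasticsearch 2.0 and removed in 5.0, where the same intent is expressed as a bool query with a filter clause. The sketch below rewrites a request body of the shape shown above into that form. The function name filtered_to_bool is hypothetical and handles only this simple case:

```python
def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": ...}} into the equivalent bool form."""
    filtered = body["query"]["filtered"]
    bool_clause = {}
    if "query" in filtered:
        bool_clause["must"] = filtered["query"]    # scored, full-text part
    if "filter" in filtered:
        bool_clause["filter"] = filtered["filter"]  # yes/no part, not scored
    return {"query": {"bool": bool_clause}}


old_body = {
    "query": {
        "filtered": {
            "filter": {"range": {"age": {"gt": 30}}},
            "query": {"match": {"last_name": "smith"}},
        }
    }
}
new_body = filtered_to_bool(old_body)
```

Here new_body keeps the match clause under must and the range clause under filter.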
Full-text search
Phrase search + highlighting
{ "query": { "match_phrase": { "about": "rock climbing" } }, "highlight": { "fields": { "about": { } } } }
{
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      { "_score": 0.23013961,
        "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] },
        "highlight": { "about": [ "I love to go <em>rock</em> <em>climbing</em>" ] } }
    ]
  }
}
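To get a feel for what a phrase match with highlighting produces, here is a toy re-implementation in plain Python. It is only an illustration under simplifying assumptions (whitespace tokenization, lowercase comparison, first occurrence only); Elasticsearch itself works from analyzed terms and their positions, and highlight_phrase is a made-up name:

```python
def highlight_phrase(text, phrase):
    """Return text with the phrase's words wrapped in <em>, or None if absent."""
    words = text.split()
    lowered = [w.lower() for w in words]
    terms = phrase.lower().split()
    if not terms:
        return None
    n = len(terms)
    for start in range(len(words) - n + 1):
        # The terms must appear next to each other and in the same order.
        if lowered[start:start + n] == terms:
            marked = [
                f"<em>{w}</em>" if start <= i < start + n else w
                for i, w in enumerate(words)
            ]
            return " ".join(marked)
    return None


print(highlight_phrase("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```

Jane's "I like to collect rock albums" contains "rock" but not the full phrase, so the same call returns None for it, which mirrors why only one hit comes back.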
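Analytics
Finally, there is one more requirement to cover: letting managers run analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but more powerful.
For example, let's find the most popular interests among all employees: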
{ "aggs": { "all_interests": { "terms": { "field": "interests" } } } }
{
  "aggregations": {
    "all_interests": {
      "buckets": [
        { "key": "music", "doc_count": 2 },
        { "key": "forestry", "doc_count": 1 },
        { "key": "sports", "doc_count": 1 }
      ]
    }
  }
}
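The counting behind a terms aggregation can be sketched locally with collections.Counter over the three sample employees. This is a plain-Python analogy, not how Elasticsearch computes it. The ordering (highest count first, ties broken alphabetically) is my own choice, and it happens to reproduce the response above:

```python
from collections import Counter

# The interests of the three sample employees from these notes.
EMPLOYEES = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of a field."""
    counts = Counter()
    for doc in docs:
        # set() so a document listing a value twice is still counted once.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(EMPLOYEES, "interests")
```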
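We can see that two employees are interested in music, one in forestry, and one in sports. These numbers are not precomputed; they are generated on the fly from the documents that match the query. If we want to know the most popular interests of employees named "Smith", we only need to add the appropriate query clause: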
{ "query": { "match": { "last_name": "smith" } }, "aggs": { "all_interests": { "terms": { "field": "interests" } } } }
The all_interests aggregation now covers only the documents that match the query:
{
  "all_interests": {
    "buckets": [
      { "key": "music", "doc_count": 2 },
      { "key": "sports", "doc_count": 1 }
    ]
  }
}
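The effect of combining a query with an aggregation can be mimicked in plain Python by filtering first and counting second. This is a simplified sketch: the case-insensitive comparison stands in for the analyzed match query, and interests_for is a hypothetical name:

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def interests_for(docs, last_name):
    """Count interests only over the documents whose last_name matches."""
    matching = [d for d in docs if d["last_name"].lower() == last_name.lower()]
    counts = Counter(v for d in matching for v in set(d["interests"]))
    # Highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(interests_for(EMPLOYEES, "smith"))  # [('music', 2), ('sports', 1)]
```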
Aggregations also allow hierarchical rollups. For example, let's compute the average age of the employees under each interest:
{ "aggs": { "all_interests": { "terms": { "field": "interests" }, "aggs": { "avg_age": { "avg": { "field": "age" } } } } } }
{
  "all_interests": {
    "buckets": [
      { "key": "music", "doc_count": 2, "avg_age": { "value": 28.5 } },
      { "key": "forestry", "doc_count": 1, "avg_age": { "value": 35 } },
      { "key": "sports", "doc_count": 1, "avg_age": { "value": 25 } }
    ]
  }
}
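This result is richer than the previous one. We still get the list of interests with their counts (the number of employees sharing each interest), but each interest now has an additional avg_age field showing the average age of the employees who share it.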
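The arithmetic behind the nested avg_age values can be checked by hand: music is shared by John (25) and Jane (32), so the average is (25 + 32) / 2 = 28.5. A local sketch of the same two-level grouping, using a hypothetical helper name and the sample ages from these notes:

```python
from collections import defaultdict

EMPLOYEES = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Group ages by interest, then report the count and mean per group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    # Largest group first, ties broken alphabetically.
    ordered = sorted(ages.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(vals), "avg_age": {"value": sum(vals) / len(vals)}}
        for key, vals in ordered
    ]


result = avg_age_by_interest(EMPLOYEES)
```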
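Even if you don't understand the syntax yet, you can get a sense that this feature supports quite complex aggregations, and that it works over any kind of data.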
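Tutorial summary
Hopefully this short tutorial gives a good picture of what Elasticsearch can do. It only scratches the surface: to keep things brief, many features were left out, such as suggestions, geolocation, percolation, fuzzy matching, and partial matching. Still, it shows how easy it is to build advanced search functionality. No configuration was needed; just add data and start searching!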