大数据之检索elasticsearch快速入门

最新推荐文章于 2024-03-25 01:02:49 发布

江湖侠客

最新推荐文章于 2024-03-25 01:02:49 发布

阅读量405

点赞数

分类专栏：检索数据技术【初学】

本文链接：https://blog.csdn.net/weixin_39868387/article/details/102983169

版权

检索数据技术【初学】专栏收录该内容

6 篇文章 0 订阅

订阅专栏

一、检索

我们的应用经常需要添加检索功能，开源ElasticSearch
是目前全文搜索引擎的首选。他可以快速的存储、搜索和分析海量数据。Spring Boot通过整合Spring Data ElasticSearch为我们提供了非常便捷的检索功能支持；

Elasticsearch是一个分布式搜索服务，提供Restful
API，底层基于Lucene，采用多shard（分片）的方式保证数据安全，并且提供自动resharding的功能，github等大型的站点也是采用了ElasticSearch作为其搜索服务，**

二、概念

以 员工文档 的形式存储为例：一个文档代表一个员工数据。存储数据到 ElasticSearch 的行为叫做索引，但在索引一个文档之前，需要确定将文档存储在哪里。
一个 ElasticSearch 集群可以包含多个索引，相应的每个索引可以包含多个类型。这些不同的类型存储着多个文档，每个文档又有多个属性。
类似关系：
索引-数据库
类型-表文档-表中的记录
属性-列

在这里插入图片描述
2.1 做一个简单的例子索引，发送请求：
首先自己使用Postman工具测试
步骤：
编辑发送请求信息

{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

在这里插入图片描述
发送请求，接收显示的信息

接着，让我们增加更多的员工信息到目录中，比如添加2号员工，步骤同理的如下操作：
添加发送消息的内容：

{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

在这里插入图片描述

接着，让我们增加更多的员工信息到目录中，比如添加2号员工，步骤同理的如下操作：
添加发送消息的内容：

{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

在这里插入图片描述

2.2 目前我们已经在 Elasticsearch 中存储了一些数据，第一个需求是可以检索到单个雇员的数据。

语法：GET /megacorp/employee/1

2.3 还有一个是head请求（检查是否存在的请求风格）
在这里插入图片描述
2.4 DELETE表示删除请求的风格

小结：
将 HTTP 命令由 PUT 改为 GET 可以用来检索文档；
使用 DELETE 命令来删除文档
使用 HEAD 指令来检查文档是否存在
如果想更新已存在的文档，只需再次 PUT

注意：
每次创建PUT请求的风格，它的版本都会增加，并覆盖当前的版本

3、轻量搜索

一个 GET 是相当简单的，可以直接得到指定的文档

语法：

GET /megacorp/employee/_search

轻量搜索显示出来的数据如下：

{
    "took": 599,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 30,
                    "about": "I like to build cabinets",
                    "interests": [
                        "forestry"
                    ]
                }
            }
        ]
    }
}

注意：返回结果不仅告知匹配了哪些文档，还包含了整个文档本身：显示搜索结果给最终用户所需的全部信息。

3.1 接下来，尝试下搜索姓氏为 Smith 的雇员。

为此，我们将使用一个高亮搜索，很容易通过命令行完成。这个方法一般涉及到一个查询字符串（query-string）搜索，因为我们通过一个URL参数来传递查询信

息给搜索接口：
第一种方式：
GET /megacorp/employee/_search?q=last_name:Smith
在这里插入图片描述
第二种方式：
语法：GET /megacorp/employee/_search

{
"query" : {
“match” : {
“last_name” : “Smith”
}
}
}

在这里插入图片描述
轻量搜索显示出来的数据如下：

{
    "took": 124,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0296195,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1.0296195,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1.0296195,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}

3.3 更复杂的搜索

现在尝试下更复杂的搜索。同样搜索姓氏为 Smith 的雇员，但这次我们只需要年龄大于 30 的。查询需要稍作调整，使用过滤器 filter ，它支持高效地执行一个结构化查询。

语法：

GET /megacorp/employee/_search {
“query” : {
“bool”: {
“must”: {
“match” : {
“last_name” : “smith”
}
},
“filter”: {
“range” : {
“age” : { “gt” : 30 }
}
}
}
} }

在这里插入图片描述
更复杂的搜索显示出来的数据如下：

{
    "took": 56,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0296195,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1.0296195,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}

4、全文搜索
截止目前的搜索相对都很简单：单个姓名，通过年龄过滤。现在尝试下稍微高级点儿的全文搜索——一项传统数据库确实很难搞定的任务。

搜索下所有喜欢攀岩（rock climbing）的雇员：

GET /megacorp/employee/_search
{
“query” : {
“match” : {
“about” : “rock climbing”
}
}
}

全文搜索显示出来的数据如下：

{
    "took": 34,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 2.4450345,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 2.4450345,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 0.9795299,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}

5、短语搜索
找出一个属性中的独立单词是没有问题的，但有时候想要精确匹配一系列单词或者短语。比如，我们想执行这样一个查询，仅匹配同时包含 “rock” 和 “climbing” ，并且二者以短语 “rock climbing” 的形式紧挨着的雇员记录。

为此对 match 查询稍作调整，使用一个叫做 match_phrase 的查询：
GET /megacorp/employee/_search

{
“query” : {
“match_phrase” : {
“about” : “rock climbing”
}
}
}

短语搜索显示出来的数据如下：

{
    "took": 48,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 2.4450345,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 2.4450345,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            }
        ]
    }
}

7、高亮搜索

许多应用都倾向于在每个搜索结果中高亮部分文本片段，以便让用户知道为何该文档符合查询条件。在 Elasticsearch
中检索出高亮片段也很容易。

再次执行前面的查询，并增加一个新的 highlight 参数：
GET /megacorp/employee/_search

{
“query” : {
“match_phrase” : {
“about” : “rock climbing”
}
},
“highlight”: {
“fields” : {
“about” : {}
}
} }

高亮搜索显示出来的数据如下：

{
    "took": 146,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 2.4450345,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 2.4450345,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                },
                "highlight": {
                    "about": [
                        "I love to go <em>rock</em> <em>climbing</em>"
                    ]
                }
            }
        ]
    }
}

江湖侠客

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
大数据之检索elasticsearch快速入门

一、检索我们的应用经常需要添加检索功能，开源ElasticSearch是目前全文搜索引擎的首选。他可以快速的存储、搜索和分析海量数据。Spring Boot通过整合Spring Data ElasticSearch为我们提供了非常便捷的检索功能支持；Elasticsearch是一个分布式搜索服务，提供RestfulAPI，底层基于Lucene，采用多shard（分片）的方式保证数据安...
复制链接

扫一扫

专栏目录