Distributed Search Engine ElasticSearch: Advanced Usage (Part 2)

麦神-mirson

Categories: ElasticSearch, Architecture Design

Article link: https://blog.csdn.net/hxx688/article/details/115101271

1. Analyzed (tokenized) query operations

Create the index by indexing a document (the index is created automatically):

PUT /movies/_doc/1
{
  "name":"The film, filmed in 2021 & tells the story of children"
}

Search by a single analyzed term:

GET /movies/_search
{
  "query": {
    "match": {"name": "story"}
  }
}

A single word is enough to match the document. Use the _analyze API to see how the text is tokenized:

GET /movies/_analyze 
{
  "field": "name",
  "text": "The film, filmed in 2021 & tells the story of children"
}

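The analysis pipeline: the text passes through character filters, then a tokenizer, then token filters.

[Figure: analysis pipeline diagram]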
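
To make the tokenizer and token filter stages concrete, here is a rough plain-Java approximation of what the default standard analyzer does to the sample sentence. It is a sketch under simplifying assumptions: the class name `AnalyzerSketch` and the regex split are illustrative and are not Elasticsearch code, and the real analyzer follows Unicode text segmentation rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Elasticsearch: a rough stand-in for the
// standard analyzer (split the text into words, then lowercase each token).
class AnalyzerSketch {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenizer stage: split on any run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                // Token filter stage: lowercase.
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The film, filmed in 2021 & tells the story of children"));
    }
}
```

The resulting token list contains story and tells, but not tell, which is why the match query for story finds the document.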

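Using an analyzer:

Searching for the keyword tell returns no results, because the default analyzer indexed the token tells unchanged. The english analyzer is needed here.

# Recreate the index (delete the old one first, otherwise PUT fails because the index already exists)
DELETE /movies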
PUT /movies
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "name":{"type":"text", "analyzer": "english"}
    }
  }
}

Index the document again and search for tell; this time the document is found:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "The film, filmed in 2021 & tells the story of children"
        }
      }
    ]
  }

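The english analyzer applies stemming. For example, tells is reduced to its stem tell. The query text is analyzed the same way, so both tell and tells match the document.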
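
The effect of stemming can be sketched with a toy suffix stripper in plain Java. This is only an illustration: `StemSketch` is a hypothetical class that handles just the two suffixes seen in the sample sentence, and it is not the stemming algorithm that the english analyzer actually uses.

```java
import java.util.Locale;

// Toy suffix stripper for illustration only. It is NOT the stemmer used by the
// english analyzer; it only covers the "-ed" and "-s" cases from the sample text.
class StemSketch {

    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 4 && t.endsWith("ed")) {
            return t.substring(0, t.length() - 2);   // filmed -> film
        }
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);   // tells -> tell
        }
        return t;
    }

    public static void main(String[] args) {
        // The same function runs at index time and at query time,
        // so the indexed term and the query term end up identical.
        System.out.println(stem("tells").equals(stem("tell")));
    }
}
```

Because both sides are reduced to the same stem, a query for tell can match a document that only contains tells.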

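2. Setting up the TMDB open movie dataset

TMDB is an open movie dataset. It is fairly large and well normalized, which makes it convenient for studying and learning ES.

Download the TMDB data

Download link

Import the project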

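The ESConfig connection configuration class:

// Imports needed by this method (place them at the top of ESConfig):
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.springframework.context.annotation.Bean;

@Bean
public TransportClient getClient() throws UnknownHostException {
    // Default settings; set "cluster.name" here if the cluster is not named "elasticsearch".
    Settings settings = Settings.builder().build();
    TransportClient transportClient = new PreBuiltTransportClient(settings);
    // ES connection info: the default transport port is 9300, not the HTTP port 9200.
    TransportAddress firstAddress = new TransportAddress(InetAddress.getByName("10.10.20.28"), 9300);
    transportClient.addTransportAddress(firstAddress);
    // Let a resolution failure propagate instead of returning a client with no address.
    return transportClient;
}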

ESController exposes the import endpoint:

@RequestMapping("/importdata")
@ResponseBody
public ResponseEntity importdata() throws IOException {
    ...
    // Document layout for the index; the index name must match the index created in ES
    bulkRequest.add(new IndexRequest("movies", "_doc", String.valueOf(lineId-1)).source(XContentType.JSON,
                        "title", records[17],
                        "tagline",records[16],
                        "release_date",date,
                        "popularity",records[8],
                        "cast",cast,
                        "overview",records[7]));
    ...
}

Create the index mapping through Kibana

PUT /movies
{
	"settings": {
		"number_of_shards": 1,
		"number_of_replicas": 0
	},
	"mappings": {
		"properties": {
			"title": {
				"type": "text",
				"analyzer": "english"
			},
			"tagline": {
				"type": "text",
				"analyzer": "english"
			},
			"release_date": {
				"type": "date",
				"format": "8yyyy/MM/dd||yyyy/M/dd||yyyy/MM/d||yyyy/M/d"
			},
			"popularity": {
				"type": "double"
			},
			"cast": {
				"type": "object",
				"properties": {
					"character": {
						"type": "text",
						"analyzer": "standard"
					},
					"name": {
						"type": "text",
						"analyzer": "standard"
					}
				}
			},
			"overview": {
				"type": "text",
				"analyzer": "english"
			}
		}
	}
}
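
The release_date mapping lists four patterns so that months and days written with either one or two digits are accepted. The same tolerance can be sketched in plain Java with java.time, where a single `M` or `d` pattern letter parses either width. This is only an illustration of the idea: `DateSketch` is a hypothetical class, and it is neither the import code of the project nor the way Elasticsearch itself parses the field.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: one java.time pattern that accepts 1- or 2-digit
// months and days, mirroring the intent of the four mapping patterns.
class DateSketch {

    private static final DateTimeFormatter FLEXIBLE = DateTimeFormatter.ofPattern("yyyy/M/d");

    static LocalDate parse(String raw) {
        return LocalDate.parse(raw.trim(), FLEXIBLE);
    }

    public static void main(String[] args) {
        System.out.println(parse("2009/12/10"));
        System.out.println(parse("2009/3/5"));
    }
}
```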

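Import the data

Call the endpoint: http://127.0.0.1:8080/es/importdata

It reads the CSV file and imports the data automatically.

Check the import result

The imported data can be inspected through the Kibana console.

Search queries

Search for the keyword heart in title: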
```
GET /movies/_search
{
    "query":{
        "match":{"title":"heart"}
     }
}
```
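The query text goes through the english analyzer and all matching documents are returned. Each hit carries a _score: the more often the keyword occurs in the field, or the larger its share of the field, the higher the score and the earlier the hit is ranked.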
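
Recent Elasticsearch versions score match queries with BM25 by default. The sketch below is a simplified reading of that formula, intended to show why term frequency and field length move the _score. The class and parameter names are illustrative, k1 = 1.2 and b = 0.75 are the default parameters, and the code is not the Lucene implementation.

```java
// Simplified BM25 for a single query term; an illustration, not Lucene code.
class Bm25Sketch {

    static final double K1 = 1.2;   // term-frequency saturation
    static final double B = 0.75;   // strength of field-length normalization

    // docCount:    number of documents that have the field
    // docFreq:     number of documents whose field contains the term
    // freq:        occurrences of the term in this document's field
    // fieldLen:    token count of this document's field
    // avgFieldLen: average token count of the field across the index
    static double score(long docCount, long docFreq, double freq,
                        double fieldLen, double avgFieldLen) {
        double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        double tf = freq * (K1 + 1) / (freq + K1 * (1 - B + B * fieldLen / avgFieldLen));
        return idf * tf;
    }

    public static void main(String[] args) {
        // One document in the index, containing the term once.
        System.out.println(score(1, 1, 1, 10, 10));
        // Same term, but it occurs three times in a field of the same length.
        System.out.println(score(1, 1, 3, 10, 10));
    }
}
```

For an index holding a single document that contains the term once, the sketch yields roughly 0.2877, which is consistent with the _score of 0.2876821 returned for the single-document example in section 1.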

3. Advanced matching

OR matching
```
GET /movies/_search
{
	"query": {
		"match": {
			"title": "heart from"
		}
	}
}
```
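A match query uses OR semantics by default: the text is split into the terms heart and from, and a document matches if its title contains either of them.

Controlling the minimum number of matching terms under OR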

GET /movies/_search
{
  "query":{
    "match":{
      "title": {
        "query": "good hearts sea",
        "operator": "or",
        "minimum_should_match": 2
      }
      
    }
  }
}

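With minimum_should_match set to 2, a title is returned as long as it contains at least two of the three terms, for example good and hearts, or hearts and sea.

AND matching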

GET /movies/_search
{
  "query":{
    "match":{
      "title": {
        "query": "heart sea",
        "operator": "and"
      }     
    }
  }
}

The operator property selects the boolean operator. With and, every returned title contains both heart and sea.
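
The three match variants above differ only in how many query terms must be present. A minimal plain-Java sketch of that rule follows. `MatchSketch` is a hypothetical class, and the term lists are assumed to be already analyzed (lowercased and stemmed, so hearts appears as heart) with no duplicate query terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of match-query term counting; not Elasticsearch code.
class MatchSketch {

    // Counts how many distinct query terms occur among the document's terms.
    static int matchedTerms(List<String> docTerms, List<String> queryTerms) {
        Set<String> doc = new HashSet<>(docTerms);
        int matched = 0;
        for (String term : new HashSet<>(queryTerms)) {
            if (doc.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // minimumShouldMatch = 1 models operator "or";
    // minimumShouldMatch = number of query terms models operator "and".
    static boolean matches(List<String> docTerms, List<String> queryTerms, int minimumShouldMatch) {
        return matchedTerms(docTerms, queryTerms) >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("good", "heart", "sea");
        List<String> title = Arrays.asList("the", "good", "heart");
        System.out.println(matches(title, query, 1));            // "or"
        System.out.println(matches(title, query, 2));            // minimum_should_match: 2
        System.out.println(matches(title, query, query.size())); // "and"
    }
}
```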

Phrase queries

To search for an exact phrase, for example The Good Heart, use match_phrase:
```
GET /movies/_search
{
  "query":{
    "match_phrase":{"title":"The Good Heart"}
  }
}
```
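The phrase is matched as a whole: the analyzed terms must appear next to each other and in the same order, rather than being matched independently.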
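
The difference from a plain match query can be sketched as a check for a contiguous, ordered run of terms. This is a simplified model with a hypothetical class name: it assumes both lists are already analyzed, and it ignores stop word handling and the slop parameter that the real match_phrase query supports.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of phrase matching; not Elasticsearch code.
class PhraseSketch {

    // True if phraseTerms occur in docTerms as one contiguous run in the same order.
    static boolean containsPhrase(List<String> docTerms, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return false;
        }
        int n = phraseTerms.size();
        for (int start = 0; start + n <= docTerms.size(); start++) {
            if (docTerms.subList(start, start + n).equals(phraseTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("good", "heart");
        System.out.println(containsPhrase(Arrays.asList("the", "good", "heart"), phrase));
        System.out.println(containsPhrase(Arrays.asList("good", "old", "heart"), phrase));
    }
}
```

The second title contains both terms, so a match query would return it, but the terms are not adjacent, so the phrase check rejects it.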

Multi-field queries

To query several fields at once, use multi_match.
```
GET /movies/_search
{
  "query":{
    "multi_match":{
      "query": "good hearts sea",
      "fields": ["title", "overview"]
    }
  }
}
```
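This searches both title and overview for the terms good, hearts and sea. A document matches if either field contains any of the terms, so the query returns more documents than the same query on title alone.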
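
By default multi_match runs as type best_fields: a document's score comes from its best-matching field, plus the scores of the other matching fields scaled by tie_breaker (0.0 unless set). A minimal sketch of that combination follows. The class name is hypothetical and the per-field scores are made-up numbers for illustration.

```java
// Hypothetical model of best_fields score combination; not Elasticsearch code.
class BestFieldsSketch {

    // fieldScores holds one score per field that matched the query.
    static double combine(double[] fieldScores, double tieBreaker) {
        double best = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            best = Math.max(best, s);
            sum += s;
        }
        // Best field counts fully; the remaining fields are scaled by tieBreaker.
        return best + tieBreaker * (sum - best);
    }

    public static void main(String[] args) {
        double[] scores = {1.5, 0.4};   // illustrative scores for title and overview
        System.out.println(combine(scores, 0.0));
        System.out.println(combine(scores, 0.3));
    }
}
```

With tie_breaker left at 0.0 only the best field counts, so a document that matches weakly in both fields does not outrank one that matches strongly in a single field.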

4. Query String queries

query_string offers a more compact syntax: the AND, OR and NOT operators can be written directly in the query text.

GET /movies/_search
{
  "query":{
    "query_string":{
      "fields":["title"],
      "query":"heart AND sea"      
    }
  }
}

Returns documents whose title contains both heart and sea.

GET /movies/_search
{
  "query":{
    "query_string":{
      "fields":["title"],
      "query":"heart OR sea"      
    }
  }
}

Returns documents whose title contains heart or sea.

GET /movies/_search
{
  "query":{
    "query_string":{
      "fields":["title"],
      "query":"heart NOT sea"      
    }
  }
}

Returns documents whose title contains heart but not sea.

This article was written and shared by mirson. For further discussion, join QQ group 19310171 or visit www.softart.cn
