SpringBoot --- Integrating Elasticsearch


This took some effort to put together; please be kind. Thanks.

SpringBoot — Integrating LDAP.
SpringBoot — Integrating Spring Data JPA.
SpringBoot — Integrating Elasticsearch.
SpringBoot — Integrating spring-data-jpa and spring-data-elasticsearch.
SpringBoot — Integrating thymeleaf.
SpringBoot — Injecting third-party jars.
SpringBoot — Integrating Redis.
Springboot — Integrating slf4j logging.
Springboot — Integrating scheduled tasks that run methods automatically.
Springboot — Configuring multiple data sources with JdbcTemplate and NamedParameterJdbcTemplate.
SpringBoot — build and profile in pom.xml explained.
SpringBoot — Monitoring.
SpringBoot — Cache/Redis.
SpringBoot and Zookeeper.
Using Git.

1.elasticsearch install

Elasticsearch vs Solr:

  • Real-time indexing: Elasticsearch does not block on threads and outperforms Solr, which suffers from I/O blocking.
  • Adding data dynamically: no impact on Elasticsearch performance; Solr performance degrades.
  • Distribution: Elasticsearch is distributed out of the box; Solr relies on ZooKeeper for distributed management.
  • Data formats: Elasticsearch supports JSON only; Solr supports XML, JSON, CSV and more.
  • Positioning: Elasticsearch is better suited to new real-time search applications; Solr remains a solid choice for traditional applications.

Download from the official Elasticsearch website. It is best to pick a somewhat older version; newer versions can produce errors when integrating.

1.1 Download and unzip the Windows version


1.2 Start it


1.3 Access it


1.4 Key terms

  • index: the equivalent of a database in MySQL
  • document: the equivalent of a row in MySQL
  • field: the equivalent of a column in MySQL
  • shards: sharded storage of the index data
  • replicas: backup copies of the shards

1. Metadata in a doc

{
	"_id": "1",
	"_index": "lsp",
	"_score": 1,
	"_source": {
		"age": 30,
		"desc": "皮肤黑、武器长、性格直",
		"from": "gu",
		"name": "顾老二",
		"tags": [
			"黑",
			"长",
			"直"
		]
	}
}

What each part means:

  • _index: the name of the index the document belongs to
  • _type: the name of the type the document belongs to
  • _id: the document's unique identifier
  • _source: the document's original JSON data
  • @version: the document's version (can be used to resolve conflicts during concurrent updates)
  • _score: the relevance score (assigned to search results)

2. index

Each index has its own Mapping definition, which contains all of the document field names and field types.
 * Shard is the physical-storage concept
 * the data in an index is spread across its shards
 * Mapping defines the types of the document fields
 * Setting defines how the data is distributed

3.type

  • In 5.x and earlier, an index can have one or more types
  • In 6.x, an index has exactly one type
  • 7.x deprecates type: everything related to it is marked Deprecated, and for compatibility every document written in 7.x has its type set to _doc by default
  • 8.x removes type entirely

2.elasticsearch ui

2.1 elasticsearch-head

Download it

Start it with: npm run start / grunt server
The UI looks rather dated.


2.2 elasticHD

1. Download

Download: https://github.com/360EntSecGroup-Skylar/ElasticHD/releases.


2. Start it
You can simply double-click the executable,
or cd into the install directory and run ElasticHD -p 127.0.0.1:9800


2.3 kibana

Download: https://www.elastic.co/start.
Remember: the version must match the Elasticsearch version above.

Hover over the Windows link and the download URL is shown; just change the version in it:

  • https://artifacts.elastic.co/downloads/kibana/kibana-7.15.2-linux-x86_64.tar.gz
  • https://artifacts.elastic.co/downloads/kibana/kibana-7.16.2-windows-x86_64.zip

1. Double-click Kibana.bat to start it.
By default it connects to elasticsearch:9200.
2. Open http://localhost:5601 and enter the password set earlier.

How to monitor with it: Kibana monitoring of the ES cluster.

3. Spring Boot integration with Elasticsearch

3.1 Official documentation

Link: Spring Data Elasticsearch - Reference Documentation.


3.2 Dependency

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

ElasticsearchRestTemplate wraps a RestHighLevelClient; the source looks like this 👇

public class ElasticsearchRestTemplate extends AbstractElasticsearchTemplate {
    private static final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchRestTemplate.class);
    private final RestHighLevelClient client;
    private final ElasticsearchExceptionTranslator exceptionTranslator;

    public ElasticsearchRestTemplate(RestHighLevelClient client) {
        Assert.notNull(client, "Client must not be null!");
        this.client = client;
        this.exceptionTranslator = new ElasticsearchExceptionTranslator();
        this.initialize(this.createElasticsearchConverter());
    }

    public ElasticsearchRestTemplate(RestHighLevelClient client, ElasticsearchConverter elasticsearchConverter) {
        Assert.notNull(client, "Client must not be null!");
        this.client = client;
        this.exceptionTranslator = new ElasticsearchExceptionTranslator();
        this.initialize(elasticsearchConverter);
    }
}

3.3 properties

spring.data.elasticsearch.client.reactive.endpoints=127.0.0.1:9200
#create the index automatically if it does not exist
spring.data.elasticsearch.repositories.enabled=true

spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300
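
If you prefer Java configuration over properties, the client can also be declared as a bean. A minimal sketch, assuming Spring Data Elasticsearch 4.x and a single node on 127.0.0.1:9200 (the class name is my own):

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

@Configuration
public class ElasticsearchClientConfig extends AbstractElasticsearchConfiguration {

    @Override
    public RestHighLevelClient elasticsearchClient() {
        //talk to the HTTP port (9200), not the old transport port (9300)
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("127.0.0.1:9200")
                .build();
        return RestClients.create(clientConfiguration).rest();
    }
}

Extending AbstractElasticsearchConfiguration also gives you an ElasticsearchRestTemplate built on top of that client.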

3.4 Code

  • @Document marks this class as an Elasticsearch document
  • indexName maps to the Elasticsearch index
  • type maps to the Elasticsearch type
@Document(indexName = "product",type = "article")
@Data
public class Article {
    @Id
    private String id;
    private String title;
    @Field(type = FieldType.Nested, includeInParent = true)
    private List<Author> authors;
}

@Data
@AllArgsConstructor   //needed for new Author("...") in the tests below
@NoArgsConstructor
public class Author {
    private String name;
}
@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article,String> {

    //The two queries below do the same thing: one uses the default derived-query mechanism, the other a custom query
    Page<Article> findByAuthorsName(String name, Pageable pageable);
    
    @Query("{\"bool\": {\"must\": [{\"match\": {\"authors.name\": \"?0\"}}]}}")
    Page<Article> findByAuthorsNameUsingCustomQuery(String name, Pageable pageable);

    //search on the title field
    Page<Article> findByTitleIsContaining(String word,Pageable pageable);
    
    Page<Article> findByTitle(String title,Pageable pageable);
}
    @Autowired
    private ArticleRepository articleRepository;
    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    //check whether the index exists; with spring.data.elasticsearch.repositories.enabled=true it is created automatically
    private boolean checkIndexExists(Class<?> cls){
        boolean isExist = elasticsearchRestTemplate.indexOps(cls).exists();
        //get the index name
        String indexName = cls.getAnnotation(Document.class).indexName();
        System.out.printf("index %s is %s\n", indexName, isExist ? "exist" : "not exist");
        return isExist;
    }
    @Test
    void test() {
        checkIndexExists(Article.class);
    }


    @Test
     void save(){
        Article article = new Article();
        article.setTitle("Spring Data Elasticsearch");
        article.setAuthors(asList(new Author("LaoAlex"),new Author("John")));
        articleRepository.save(article);

        article = new Article();
        article.setTitle("Spring Data Elasticsearch2");
        article.setAuthors(asList(new Author("LaoAlex"),new Author("King")));
        articleRepository.save(article);

        article = new Article();
        article.setTitle("Spring Data Elasticsearch3");
        article.setAuthors(asList(new Author("LaoAlex"),new Author("Bill")));
        articleRepository.save(article);
    }		
   
    @Test
    void queryAuthorName() throws JsonProcessingException {
        Page<Article> articles = articleRepository.findByAuthorsName("LaoAlex", PageRequest.of(0,10));
        //convert the result to a JSON string
        ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
        String json = objectWriter.writeValueAsString(articles);
        System.out.println(json);
    }

    //use the custom @Query
    @Test
    void queryAuthorNameByCustom() throws JsonProcessingException {
        Page<Article> articles = articleRepository.findByAuthorsNameUsingCustomQuery("John",PageRequest.of(0,10));
        //convert the result to a JSON string
        ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
        String json = objectWriter.writeValueAsString(articles);
        System.out.println(json);
    }

    //keyword (regexp) search via the template
    @Test
    void queryTileContainByTemplate() throws JsonProcessingException {
        Query query = new NativeSearchQueryBuilder().withFilter(regexpQuery("title",".*elasticsearch2.*")).build();
        SearchHits<Article> articles = elasticsearchRestTemplate.search(query, Article.class, IndexCoordinates.of("product"));
        //convert the result to a JSON string
        ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
        String json = objectWriter.writeValueAsString(articles);
        System.out.println(json);
    }


    @Test
    void update() throws JsonProcessingException {
        Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
        //convert the result to a JSON string
        ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
        String json = objectWriter.writeValueAsString(articles);
        System.out.println(json);

        Article article = articles.getContent().get(0);
        System.out.println(article);
        article.setAuthors(null);
        articleRepository.save(article);
    }


    @Test
    void delete(){
        Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
        Article article = articles.getContent().get(0);
        articleRepository.delete(article);
    }

3.5 Common errors

1.no id property found for class

//This error means the entity class mapped to ES is wrong; there are two fixes
//1. Rename the primary-key field to id
@Data
@Document(indexName = "spring.student")
public class Student {

    private int id;
    private String stuName;
    private String stuAddress;
    private String gender;
}


//2. If the primary-key field is not named id, annotate it with @Id
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

@Data
@Document(indexName = "spring.student")
public class Student {

    @Id
    private int stuId;
    private String stuName;
    private String stuAddress;
    private String gender;
}

4. Setting up a cluster

4.1 Edit the configuration file

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# use this same name on all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# node name within the cluster (the 3 nodes are node-1, node-2, node-3)
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# change the ports on each node in turn: (9201 9301), (9202 9302), (9203 9303)
http.port: 9201
transport.tcp.port: 9301
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# identical on all three nodes; node-1 is the initial master
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# ---------------------------------- Security ----------------------------------
#
#                                 *** WARNING ***
#
# Elasticsearch security features are not enabled by default.
# These features are free, but require configuration changes to enable them.
# This means that users don’t have to provide credentials and can get full access
# to the cluster. Network connections are also not encrypted.
#
# To protect your data, we strongly encourage you to enable the Elasticsearch security features. 
# Refer to the following documentation for instructions.
#
# https://www.elastic.co/guide/en/elasticsearch/reference/7.16/configuring-stack-security.html

#allow origin
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization

4.2 Optionally change the passwords (this step turned out not to be very useful)

The default credentials are said to be the following, but I could not log in with them:

  • username=elastic
  • password=changeme

cd into the bin directory and find elasticsearch-setup-passwords (or just type cmd in the Explorer address bar)
D:\Tools\es_cluster\elasticsearch-7.16.2\bin>elasticsearch-setup-passwords interactive


Work through the prompts; every password has to be changed

  • elas123


4.3 Start the three nodes and check the status

1. A node starting successfully does not mean the cluster formed successfully.
2. Call the API below to check the cluster status:

  • http://localhost:9203/_cat/health?v

3. Inspect the cluster state:

  • http://ip:port/_cluster/health
  • http://ip:port/_cat/nodes
  • http://ip:port/_cat/shards
GET http://localhost:9203/_cluster/health

{
  "cluster_name": "my-application",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 8,
  "active_shards": 16,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}

What the status colors mean

  • green - all primary and replica shards are allocated
  • yellow - all primaries are allocated, but at least one replica is not
  • red - at least one primary shard is unallocated
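
The same health check can be run from Java with the high-level REST client. A small sketch, assuming one of the nodes above is reachable on localhost:9203 (the class name is made up):

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.cluster.health.ClusterHealthRequest;
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9203, "http")))) {
            //equivalent to GET /_cluster/health
            ClusterHealthResponse health =
                    client.cluster().health(new ClusterHealthRequest(), RequestOptions.DEFAULT);
            //getStatus() is GREEN / YELLOW / RED, matching the colors above
            System.out.println(health.getClusterName() + " -> " + health.getStatus());
        }
    }
}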

4.4 Start Kibana for monitoring

1. Edit the configuration file

# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "127.0.0.1"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# Specifies the public URL at which Kibana is available for end users. If
# `server.basePath` is configured this URL should end with the same basePath.
#server.publicBaseUrl: ""

# The maximum payload size in bytes for incoming server requests.
#server.maxPayload: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
#elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.hosts: ["http://localhost:9201","http://localhost:9202","http://localhost:9203"]

2. Start it


5.elasticsearch api

Seed some test data

PUT users/_doc/1
{
  "name":"张飞",
  "age":30,
  "from": "China",
  "desc": "皮肤黑、武器重、性格直",
  "tags": ["黑", "重", "直"]
}

PUT users/_doc/2
{
  "name":"赵云",
  "age":18,
  "from":"China",
  "desc":"帅气逼人,一身白袍",
  "tags":["帅", "白"]
}

PUT users/_doc/3
{
  "name":"关羽",
  "age":22,
  "from":"England",
  "desc":"大刀重,骑赤兔马,胡子长",
  "tags":["重", "马","长"]
}


PUT users/_doc/4
{
  "name":"刘备",
  "age":29,
  "from":"Child",
  "desc":"大耳贼,持双剑,懂谋略",
  "tags":["剑", "大"]
}

PUT users/_doc/5
{
  "name":"貂蝉",
  "age":25,
  "from":"England",
  "desc":"闭月羞花,沉鱼落雁",
  "tags":["闭月","羞花"]
}

5.1 CRUD: Create

Notice: with a PUT, the document is created if it does not exist and overwritten if it does.
Below are two ways to create a document.

//POST without an explicit id always creates a new document with a generated id
POST users/_doc
{
  "user": "Mike",
  "post_date": "2020-10-24T14:39:30",
  "message": "trying out kibana"
}
The id generated by POST is random; the PUT style below is usually preferable


PUT users/_doc/1
{
  "user": "Jack",
  "post_date": "2020-10-24T14:39:30",
  "message": "trying out Elasticsearch"
}

PUT users/_doc/2
{
  "user": "Ludy",
  "post_date": "2020-10-24T14:39:30",
  "message": "trying out Elasticsearch"
}
Fetch a document by id x:
GET users/_doc/x
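
For reference, the same create/fetch calls can be issued from Java through the RestHighLevelClient. A sketch reusing the users index and id 1 from above (the helper class is hypothetical):

import java.util.Map;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

public class UsersCrudSketch {
    static void createAndFetch(RestHighLevelClient client) throws Exception {
        //PUT users/_doc/1 -- create or overwrite document 1
        IndexRequest put = new IndexRequest("users").id("1")
                .source(Map.of(
                        "user", "Jack",
                        "post_date", "2020-10-24T14:39:30",
                        "message", "trying out Elasticsearch"));
        client.index(put, RequestOptions.DEFAULT);

        //GET users/_doc/1 -- fetch it back
        GetResponse doc = client.get(new GetRequest("users", "1"), RequestOptions.DEFAULT);
        System.out.println(doc.getSourceAsString());
    }
}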

5.2 CRUD: Update

POST users/_doc/3/_update
{
  "doc": {
    "post_date": "2020-10-24T14:39:30",
    "message": "trying out Elasticsearch"
  }
}

5.3 CRUD: Delete

DELETE users/_doc/4

5.4 CRUD: Retrieve

Querying with ElasticHD. Demo:

  • index spring.student
  • index spring.test
  • the type defaults to doc
  • GET /spring.student/_search
    queries everything

  • GET /spring.student/_search?q=id:1
    queries a single index

  • GET /spring.student,spring.test/_search?q=id:1
    queries several indexes at once

5.5 match query

GET users/_doc/_search
{
  "query": {
    "match": {
      "post_date": "2020-10-24T14:39:30"
    }
  }
}

5.6 term query

GET users/_doc/_search
{
  "query": {
    "term": {
      "t1": "Beautiful girl!"
    }
  }
}

5.7 Sorting

Sortable field types:

  • numbers
  • dates
GET users/_doc/_search
{
  "query": {
    "match": {
      "post_date": "2020-10-24T14:39:30"
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}

5.8 Paging

GET users/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 1
}

6. Elasticsearch bool queries

What the keywords mean:

  • must: and
  • should: or
  • must_not: not
  • filter: used together with must
  • range: filters by a range of values
  • gt: greater than, like > in a relational database
  • gte: greater than or equal to, like >=
  • lt: less than, like <
  • lte: less than or equal to, like <=

6.1 must queries

//one condition
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ]
    }
  }
}
//two conditions
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "age": 30
          }
        }
      ]
    }
  }
}

6.2 should queries

GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "闭月"
          }
        }
      ]
    }
  }
}

6.3 must_not queries

//exclude documents that match any of these three conditions
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "可爱"
          }
        },
        {
          "match": {
            "age": 18
          }
        }
      ]
    }
  }
}

6.4 filter queries

//how to find documents where from is gu and age is greater than 25
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gt": 25
          }
        }
      }
    }
  }
}
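
The same bool + filter query can be built in Spring Data Elasticsearch with QueryBuilders. A sketch, assuming the index lqz and an entity of your own (class and method names are mine):

import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.elasticsearch.index.query.QueryBuilders.rangeQuery;

import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public class BoolQuerySketch {
    //must match from=gu, filtered to age > 25 -- the same query as the DSL above
    NativeSearchQuery fromGuOlderThan25() {
        return new NativeSearchQueryBuilder()
                .withQuery(boolQuery()
                        .must(matchQuery("from", "gu"))
                        .filter(rangeQuery("age").gt(25)))
                .build();
    }
    //run it with: elasticsearchRestTemplate.search(query, YourEntity.class, IndexCoordinates.of("lqz"))
}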

7. Elasticsearch result filtering

//of all the fields, only return name and age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "name": "顾老二"
    }
  },
  "_source": ["name", "age"]
}
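
In Spring Data Elasticsearch the _source filtering above maps to a FetchSourceFilter. A sketch of the query only (run it with the template as in the previous sketch):

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import org.springframework.data.elasticsearch.core.query.FetchSourceFilter;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public class SourceFilterSketch {
    NativeSearchQuery onlyNameAndAge() {
        return new NativeSearchQueryBuilder()
                .withQuery(matchQuery("name", "顾老二"))
                //only name and age are returned in _source, nothing is excluded explicitly
                .withSourceFilter(new FetchSourceFilter(new String[]{"name", "age"}, new String[]{}))
                .build();
    }
}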

8. Elasticsearch highlight queries

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "name": "石头"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Custom highlighting with a b tag

GET lqz/chengyuan/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "from": {}
    }
  }
}
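
Highlighting works the same way from Spring Data Elasticsearch through HighlightBuilder. A sketch of the query with the custom tags above (run it with the template; each SearchHit then exposes the fragments via hit.getHighlightField("from")):

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public class HighlightSketch {
    NativeSearchQuery highlightFrom() {
        return new NativeSearchQueryBuilder()
                .withQuery(matchQuery("from", "gu"))
                //wrap the matched terms of the "from" field in the custom <b> tag
                .withHighlightFields(new HighlightBuilder.Field("from")
                        .preTags("<b class='key' style='color:red'>")
                        .postTags("</b>"))
                .build();
    }
}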

9. Elasticsearch aggregation queries

The aggregation functions:

  • avg
  • max
  • min
  • sum

Aggregations are written inside aggs; my_xxx is just an alias for the result, which holds the computed value

  "aggregations" : {
    "my_avg" : {
      "value" : 27.0
    }
  }
  "aggregations" : {
    "my_max" : {
      "value" : 30.0
    }
  }
  .......  

9.1 avg

GET users/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0, 
  "_source": ["name", "age"]
}

9.2 max

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "size": 0
}

9.3 min

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "size": 0
}

9.4 sum

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "size": 0
}

9.5 range bucket queries

GET lqz/_doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      }
    }
  }
}

Two things at once: bucket by age range and compute the average inside each bucket

GET lqz/_doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
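
From Java, the avg aggregation can be issued directly with the high-level REST client. A sketch that mirrors the DSL above (index name lqz, alias my_avg; the client instance is assumed to exist):

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AvgAggregationSketch {
    static double averageAge(RestHighLevelClient client) throws Exception {
        SearchSourceBuilder source = new SearchSourceBuilder()
                .query(QueryBuilders.matchQuery("from", "gu"))
                .aggregation(AggregationBuilders.avg("my_avg").field("age"))
                .size(0); //like "size": 0 -- we only want the aggregation, not the hits
        SearchResponse response = client.search(
                new SearchRequest("lqz").source(source), RequestOptions.DEFAULT);
        Avg avg = response.getAggregations().get("my_avg");
        return avg.getValue();
    }
}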

10. Elasticsearch Mapping & Dynamic Mapping

10.1 mapping

GET index_name/_mapping

//a quick first look
PUT users/_doc/1
{
  "user": "Jack",
  "post_date": "2020-10-24T14:39:30",
  "message": "trying out Elasticsearch"
}


GET users/_mapping
{
  "users" : {
    "mappings" : {
      "properties" : {
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "post_date" : {
          "type" : "date"
        },
        "user" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Every index has a mapping type (strictly speaking, this only holds from Elasticsearch 6.x onward).
A mapping type contains:

  • Meta-fields: customize how a document's metadata is handled, e.g. the document's _index, _type, _id and _source fields.
  • Fields or properties: the list of fields or properties relevant to the documents.

A field's mapping supports many parameters, listed below (an entity sketch showing a few of them follows this list):

  • analyzer: specifies the analyzer; only text fields support it.
  • enabled: if set to false, the data is only stored (kept in _source) and cannot be searched or aggregated.
    Defaults to true.
  • index: whether an inverted index is built for the field.
    If set to false, no inverted index is built (saving space) and the field cannot be searched, but it can still be aggregated and still appears in _source.
    Defaults to true.
  • norms: whether the field participates in scoring.
    If the field is only used for filtering and aggregations and never needs relevance scoring, set it to false to save space.
    Defaults to true.
  • doc_values: if the field will never be sorted or aggregated on and is never accessed from scripts, set this to false to save disk space.
    Defaults to true.
  • fielddata: set to true if you need to sort or aggregate on a text field.
    Defaults to false.
  • store: defaults to false; the data lives in _source.
    By default field values are indexed so they can be searched, but they are not stored separately: the field can be queried, yet the original value cannot be retrieved on its own.
    Storing a field makes sense in some cases, e.g. a document with a title, a date and a very large content field, where you only want to retrieve the title and date without extracting them from a huge _source.
  • boost: boosts the field's score.
  • coerce: whether to convert data types automatically, e.g. strings to numbers.
    Enabled by default.
  • dynamic: controls automatic mapping updates; possible values are true, false and strict.
  • eager_global_ordinals
  • fields: multi-field support.
    Lets one field have several sub-fields, so the same field can be indexed in several different ways.
  • copy_to
  • format
  • ignore_above
  • ignore_malformed
  • index_options
  • index_phrases
  • index_prefixes
  • meta
  • normalizer
  • null_value: defines the value used in place of null.
  • position_increment_gap
  • properties
  • search_analyzer
  • similarity
  • term_vector
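
Several of these parameters can be set straight from a Spring Data Elasticsearch entity via @Field. A hedged sketch with made-up index and field names:

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "blog")   //hypothetical index, for illustration only
public class BlogPost {
    @Id
    private String id;

    //"analyzer": full-text field analyzed with the standard analyzer
    @Field(type = FieldType.Text, analyzer = "standard")
    private String title;

    //"index": false plus "store": true -- kept and retrievable, but not searchable
    @Field(type = FieldType.Keyword, index = false, store = true)
    private String internalCode;
}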

10.2 dynamic mapping

How Dynamic Mapping works

  • You do not have to define mappings by hand; ES infers field types from the documents it receives.
  • The inference is sometimes wrong, e.g. for geo-location data.
  • A wrongly inferred type can break features such as range queries.
  • dynamic mapping
  • explicit mapping
  • strict mapping
1 dynamic mapping
//create the index nolan
PUT nolan
{
  "mappings": {
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
  }
}
//inspect the index nolan
GET nolan
{
  "nolan" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "nolan",
        "creation_date" : "1650261517356",
        "number_of_replicas" : "1",
        "uuid" : "dXUnwua2TDCI2K9hcSL98A",
        "version" : {
          "created" : "7160299"
        }
      }
    }
  }
}
//insert a document
PUT nolan/_doc/1
{
  "name": "小黑",
  "age": 18,
  "sex": "不详"
}
//inspect the index again
{
  "nolan" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        },
        "sex" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "nolan",
        "creation_date" : "1650261517356",
        "number_of_replicas" : "1",
        "uuid" : "dXUnwua2TDCI2K9hcSL98A",
        "version" : {
          "created" : "7160299"
        }
      }
    }
  }
}
From the example above you can see that
elasticsearch dynamically added a mapping for the new sex field:
by default elasticsearch allows new fields to be added, i.e. dynamic:true
//in fact, the index creation above is equivalent to this
PUT nolan
{
  "mappings": {
      "dynamic":true,
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
  }
}
2 explicit mapping

Set dynamic to false

//create the index nolan1
PUT nolan1
{
  "mappings": {
    "dynamic": "false",
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}
//insert documents
PUT nolan1/_doc/1
{
  "name": "小黑",
  "age":18
}
PUT nolan1/_doc/2
{
  "name": "小白",
  "age": 16,
  "sex": "不详"
}
//inspect the mapping
GET nolan1
{
  "nolan1" : {
    "aliases" : { },
    "mappings" : {
      "dynamic" : "false",
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "nolan1",
        "creation_date" : "1650262553126",
        "number_of_replicas" : "1",
        "uuid" : "OuUWSsQ-SUaGr2ged3FRbQ",
        "version" : {
          "created" : "7160299"
        }
      }
    }
  }
}
You can see that elasticsearch did not create a mapping for the new sex field.
When elasticsearch notices a new field and dynamic is false,
it ignores the field for indexing purposes but still stores it.
3 strict mapping

Set dynamic to strict

//create the index nolan2
PUT nolan2
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}
//insert documents; the second request will fail
PUT nolan2/_doc/1
{
  "name": "小黑",
  "age":18
}
PUT nolan2/_doc/2
{
  "name": "小白",
  "age": 16,
  "sex": "不详"
}
When a new field is encountered, an exception is thrown.
4 Summary

  • Dynamic mapping (dynamic: true, or omitted): new fields are added to the mapping automatically
  • Static mapping (dynamic: false): new fields are ignored; no mapping is added for them, but they are still stored and appear in query results
  • Strict mode (dynamic: strict): an exception is thrown whenever a new field appears

Static mapping is used most often in practice. It is like the id or class attributes on an HTML img tag: you add them when you need them.

10.3 Object properties

//nested object properties
PUT nolan2/_doc/1
{
  "name":"tom",
  "age":18,
  "info":{
    "addr":"北京",
    "tel":"10010"
  }
}

PUT nolan2/_doc/21
{
  "name":"jim",
  "age":21,
  "info":{
    "addr":"东莞",
    "tel":"10086"
  }
}
//create the index nolan2
PUT nolan2
{
  "mappings": {
   "dynamic": false,
   "properties": {
     "name": {
       "type": "text"
     },
     "age": {
       "type": "text"
     },
     "info": {
       "properties": {
         "addr": {
           "type": "text"
         },
         "tel": {
           "type" : "text"
         }
       }
     }
   }
  }
}
GET nolan2/_doc/_search
{
  "query": {
    "match": {
      "info.tel": "10086"
    }
  }
}

10.4 Controlling whether a field is indexed

The index keyword

  • the age field will not be indexed
PUT nolan3
{
  "mappings": {
     "dynamic": false,
     "properties": {
       "name": {
         "type": "text",
         "index": true
       },
       "age": {
         "type": "long",
         "index": false
       }
     }
  }
}

10.5 Searching on null values

1. Keyword fields support null_value

PUT users
{
  "mappings" : {
	  "properties" : {
	    "firstName" : {
	     "type" : "text"
	    },
	    "lastName" : {
	     "type" : "text"
	    },
	    "mobile" : {
	     "type" : "keyword",
	     "null_value": "NULL"
	    }
	   }
  }
}

2.ignore_above

//create the index
PUT nolan
{
  "mappings": {
      "properties":{
        "t1":{
          "type":"keyword",
          "ignore_above": 5
        },
        "t2":{
          "type":"keyword",
          "ignore_above": 10 ①
        }
      }
  }
}

//insert a document
PUT nolan/_doc/1
{
  "t1":"elk",         ②
  "t2":"elasticsearch"     ③
}

//query ④
GET nolan/_doc/_search
{
  "query":{
    "term": {
      "t1": "elk"
    }
  }
}

//query ⑤
GET nolan/_doc/_search
{
  "query": {
    "term": {
      "t2": "elasticsearch"
    }
  }
}
  ① this field ignores any string longer than 10 characters
  ② the document was indexed successfully, i.e. it can be queried and results come back
  ③ this value is not indexed, so querying on it returns nothing
  ④ results are returned
  ⑤ no results are returned, because the value of t2 is longer than the ignore_above setting

11. Elasticsearch settings

Configuring primary and replica shards

PUT nolan
{
  "mappings": {
	"properties": {
	   "name": {
	     "type": "text"
	   }
    }
  }, 
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  }
}
  • number_of_shards is the number of primary shards (the default was 5 per index before 7.0; since 7.0 it is 1)
  • number_of_replicas is the number of replica shards; by default each primary shard gets one replica (the same settings can also be declared on the entity, see the sketch below)
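
In Spring Data Elasticsearch 4.x the same two settings can, as far as I know, also be declared on the entity and are applied when the framework creates the index. A hedged sketch (attribute support varies by version):

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

//shards/replicas are applied when Spring Data creates the index for this entity
@Document(indexName = "nolan", shards = 5, replicas = 1)
public class Nolan {
    @Id
    private String id;
    private String name;
}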

12. Elasticsearch field data types

Simple types

  • Numeric
  • Boolean
  • Date
  • Text
  • Keyword
  • Binary
  • and so on

Complex types

  • Object
  • Arrays
  • Nested: an object-array type that keeps each object independently queryable.
  • Join: defines a parent/child relationship between documents in the same index.

Special types

  • Geo-point
  • Geo-shape
  • Percolator

13.cluster node

1. Master-eligible nodes and the master node
Every node is a master-eligible node by default when it starts.
A master-eligible node can take part in the master election and become the master node.
When the first node starts, it elects itself as the master.

Only the master node may modify the cluster state.
The cluster state holds the information the cluster needs:

  • information about every node
  • every index with its Mapping and Setting information
  • the shard routing information

2. Data nodes & coordinating nodes

  • Data node
    A node that can hold data. It stores the shard data and is key to scaling storage.
  • Coordinating node
    Receives client requests, forwards them to the appropriate nodes and merges the results.

3. Shards (primary shard & replica shard)

  • Primary shards
    Solve horizontal scaling: through primary shards, the data is spread across all nodes in the cluster.
    A shard is a running Lucene instance.
    The number of primary shards is fixed when the index is created and cannot be changed later, short of a reindex.
  • Replicas
    Solve high availability: a replica is a copy of a primary shard.
    The number of replicas can be adjusted dynamically.
    Adding replicas can also improve read throughput to some extent.

14. Analyzers and tokenization

After data is sent to elasticsearch, it goes through a series of steps (a Java sketch of the _analyze API follows this list):

  • Character filtering: transform the characters with character filters.
  • Tokenization: split the text into one or more tokens.
  • Token filtering: transform each token with token filters.
  • Token indexing: finally, store the tokens in the Lucene inverted index.
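
The _analyze API used throughout this section can also be called from Java. A sketch with the high-level REST client (7.x), using the standard analyzer:

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.AnalyzeRequest;
import org.elasticsearch.client.indices.AnalyzeResponse;

public class AnalyzeSketch {
    static void printTokens(RestHighLevelClient client) throws Exception {
        //equivalent to: POST _analyze { "analyzer": "standard", "text": "..." }
        AnalyzeRequest request = AnalyzeRequest.withGlobalAnalyzer(
                "standard", "To be or not to be, That is a question");
        AnalyzeResponse response = client.indices().analyze(request, RequestOptions.DEFAULT);
        for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
            System.out.println(token.getTerm() + " @" + token.getStartOffset());
        }
    }
}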

14.1 Analyzers

In elasticsearch, an analyzer can consist of:

  • an optional character filter
  • exactly one tokenizer
  • zero or more token filters
1. The standard analyzer

The standard analyzer is elasticsearch's default analyzer. It combines defaults that are reasonable for most European languages: the standard tokenizer, the standard token filter, the lowercase token filter and the stop-word token filter.

POST _analyze
{
  "analyzer": "standard",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "to",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "that",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "莎",
      "start_offset" : 45,
      "end_offset" : 46,
      "type" : "<IDEOGRAPHIC>",
      "position" : 10
    },
    {
      "token" : "士",
      "start_offset" : 46,
      "end_offset" : 47,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "比",
      "start_offset" : 47,
      "end_offset" : 48,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "亚",
      "start_offset" : 48,
      "end_offset" : 49,
      "type" : "<IDEOGRAPHIC>",
      "position" : 13
    }
  ]
}
2. The simple analyzer

The simple analyzer only applies lowercase tokenization: it splits on non-letter characters and lowercases each token. It works poorly for Asian languages, which are not whitespace-delimited, so it is generally used for European languages.

POST _analyze
{
  "analyzer": "simple",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "to",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "that",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 10
    }
  ]
}
3. The whitespace analyzer

The whitespace analyzer simply splits the text into tokens on whitespace, nothing more.

POST _analyze
{
  "analyzer": "whitespace",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "To",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "be,",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "That",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "————",
      "start_offset" : 40,
      "end_offset" : 44,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 11
    }
  ]
}
4. The stop analyzer

The stop analyzer behaves much like the simple analyzer, except that it also filters stop words out of the token stream.

POST _analyze
{
  "analyzer": "stop",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 10
    }
  ]
}
5. The keyword analyzer

The keyword analyzer treats the whole field as a single token. Unless there is a real need for it, avoid using it in mappings.

POST _analyze
{
  "analyzer": "keyword",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "To be or not to be,  That is a question ———— 莎士比亚",
      "start_offset" : 0,
      "end_offset" : 49,
      "type" : "word",
      "position" : 0
    }
  ]
}
6. The pattern analyzer

The pattern analyzer lets you specify a tokenization pattern. In most cases, though, a custom analyzer that combines the pattern tokenizer with the token filters you need is the better choice.

PUT pattern_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_email_analyzer":{
          "type":"pattern",
          "pattern":"\\W|_",
          "lowercase":true
        }
      }
    }
  }
}
POST pattern_test/_analyze
{
  "analyzer": "my_email_analyzer",
  "text": "John_Smith@foo-bar.com"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "john",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "smith",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "foo",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "bar",
      "start_offset" : 15,
      "end_offset" : 18,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "com",
      "start_offset" : 19,
      "end_offset" : 22,
      "type" : "word",
      "position" : 4
    }
  ]
}
7. Language and multi-language analyzers: chinese

elasticsearch ships with a good set of simple, out-of-the-box language analyzers for many widely spoken languages: Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish and Thai.

POST _analyze
{
  "analyzer": "chinese",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "莎",
      "start_offset" : 45,
      "end_offset" : 46,
      "type" : "<IDEOGRAPHIC>",
      "position" : 10
    },
    {
      "token" : "士",
      "start_offset" : 46,
      "end_offset" : 47,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "比",
      "start_offset" : 47,
      "end_offset" : 48,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "亚",
      "start_offset" : 48,
      "end_offset" : 49,
      "type" : "<IDEOGRAPHIC>",
      "position" : 13
    }
  ]
}

Other languages work too:

POST _analyze
{
  "analyzer": "french",
  "text":"Je suis ton père"
}
POST _analyze
{
  "analyzer": "german",
  "text":"Ich bin dein vater"
}
8. The snowball analyzer

The snowball analyzer uses the standard tokenizer and token filter (like the standard analyzer) plus the lowercase token filter and the stop filter, and in addition stems the text with the Snowball stemmer.

POST _analyze
{
  "analyzer": "snowball",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
// tokenization result
{
  "tokens" : [
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "莎",
      "start_offset" : 45,
      "end_offset" : 46,
      "type" : "<IDEOGRAPHIC>",
      "position" : 10
    },
    {
      "token" : "士",
      "start_offset" : 46,
      "end_offset" : 47,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "比",
      "start_offset" : 47,
      "end_offset" : 48,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "亚",
      "start_offset" : 48,
      "end_offset" : 49,
      "type" : "<IDEOGRAPHIC>",
      "position" : 13
    }
  ]
}

14.2 Character filters

  • HTML Strip Char Filter: strips HTML from the text
  • Mapping Char Filter: replaces characters through a key/value mapping
  • Pattern Replace Char Filter: replaces characters with a regular expression
1. HTML Strip Char Filter

The HTML Strip Char Filter removes HTML elements from the text.

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text":"<p>I&apos;m so <b>happy</b>!</p>"
}
//result
{
  "tokens" : [
    {
      "token" : """

I'm so happy!

""",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "word",
      "position" : 0
    }
  ]
}
2. Mapping Char Filter

The Mapping Char Filter takes a mapping of keys to values: whenever it encounters a string equal to one of the keys, it replaces it with the associated value.

PUT pattern_test4
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
          "tokenizer":"keyword",
          "char_filter":["my_char_filter"]
        }
      },
      "char_filter":{
          "my_char_filter":{
            "type":"mapping",
            "mappings":["刘备 => 666","关羽 => 888"]
          }
        }
    }
  }
}
POST pattern_test4/_analyze
{
  "analyzer": "my_analyzer",
  "text": "刘备爱惜关羽,可是后来关羽大意失荆州"
}
//result
{
  "tokens" : [
    {
      "token" : "666爱惜888,可是后来888大意失荆州",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "word",
      "position" : 0
    }
  ]
}
3. Pattern Replace Char Filter

The Pattern Replace Char Filter uses a regular expression to match and replace characters in the string. Be careful with sloppy regular expressions, they can slow things down considerably.

PUT pattern_test5
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}
POST pattern_test5/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}
//result
{
  "tokens" : [
    {
      "token" : "My",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "credit",
      "start_offset" : 3,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "card",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "is",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "123_456_789",
      "start_offset" : 18,
      "end_offset" : 29,
      "type" : "<NUM>",
      "position" : 4
    }
  ]
}

14.3 Tokenizers

Since elasticsearch ships with analyzers, it ships with tokenizers too. A tokenizer, as the name suggests, breaks a text string into small pieces, and those pieces are called tokens.

1. The standard tokenizer

The standard tokenizer is grammar-based and works well for most European languages. It also handles Unicode text, with a default maximum token length of 255, and removes punctuation such as commas and periods.

POST _analyze
{
  "tokenizer": "standard",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "To",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "That",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "莎",
      "start_offset" : 45,
      "end_offset" : 46,
      "type" : "<IDEOGRAPHIC>",
      "position" : 10
    },
    {
      "token" : "士",
      "start_offset" : 46,
      "end_offset" : 47,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "比",
      "start_offset" : 47,
      "end_offset" : 48,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "亚",
      "start_offset" : 48,
      "end_offset" : 49,
      "type" : "<IDEOGRAPHIC>",
      "position" : 13
    }
  ]
}
2. The keyword tokenizer

The keyword tokenizer is a trivial tokenizer: it emits the entire text as a single token and hands it to the token filters. It is a good choice when you only want token filters applied without any real tokenization.

POST _analyze
{
  "tokenizer": "keyword",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "To be or not to be,  That is a question ———— 莎士比亚",
      "start_offset" : 0,
      "end_offset" : 49,
      "type" : "word",
      "position" : 0
    }
  ]
}
3. The letter tokenizer

The letter tokenizer splits the text into tokens at non-letter characters.

POST _analyze
{
  "tokenizer": "letter",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "To",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "That",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 10
    }
  ]
}
4. The lowercase tokenizer

The lowercase tokenizer combines the behaviour of the letter tokenizer with the lowercase token filter (which, as you would expect, lowercases every token). It exists as a single tokenizer mainly because doing both in one pass is faster.

POST _analyze
{
  "tokenizer": "lowercase",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "to",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "that",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 10
    }
  ]
}
5. The whitespace tokenizer

The whitespace tokenizer separates tokens on whitespace: spaces, tabs, newlines and so on. Note that it does not remove any punctuation.

POST _analyze
{
  "tokenizer": "whitespace",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
{
  "tokens" : [
    {
      "token" : "To",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "be,",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "That",
      "start_offset" : 21,
      "end_offset" : 25,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "is",
      "start_offset" : 26,
      "end_offset" : 28,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "a",
      "start_offset" : 29,
      "end_offset" : 30,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "question",
      "start_offset" : 31,
      "end_offset" : 39,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "————",
      "start_offset" : 40,
      "end_offset" : 44,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "莎士比亚",
      "start_offset" : 45,
      "end_offset" : 49,
      "type" : "word",
      "position" : 11
    }
  ]
}
6. The pattern tokenizer

The pattern tokenizer splits the text into tokens using an arbitrary pattern that you specify.

POST pattern_test2/_analyze
{
  "tokenizer": "my_tokenizer",
  "text":"To be or not to be,  That is a question ———— 莎士比亚"
}
PUT pattern_test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
          "tokenizer":"my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer":{
          "type":"pattern",
          "pattern":","
        }
      }
    }
  }
}
{
  "tokens" : [
    {
      "token" : "To be or not to be",
      "start_offset" : 0,
      "end_offset" : 18,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "  That is a question ———— 莎士比亚",
      "start_offset" : 19,
      "end_offset" : 49,
      "type" : "word",
      "position" : 1
    }
  ]
}
7. The UAX URL email tokenizer: uax_url_email
POST _analyze
{
  "tokenizer": "uax_url_email",
  "text":"作者:张开来源:未知原文:https://www.cnblogs.com/Neeo/articles/10402742.html邮箱:xxxxxxx@xx.com版权声明:本文为博主原创文章,转载请附上博文链接!"
}
{
  "tokens" : [
    {
      "token" : "作",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "者",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "张",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "开",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "来",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "源",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "未",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "知",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "原",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    },
    {
      "token" : "文",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "<IDEOGRAPHIC>",
      "position" : 9
    },
    {
      "token" : "https://www.cnblogs.com/Neeo/articles/10402742.html",
      "start_offset" : 13,
      "end_offset" : 64,
      "type" : "<URL>",
      "position" : 10
    },
    {
      "token" : "邮",
      "start_offset" : 64,
      "end_offset" : 65,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "箱",
      "start_offset" : 65,
      "end_offset" : 66,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "xxxxxxx@xx.com",
      "start_offset" : 67,
      "end_offset" : 81,
      "type" : "<EMAIL>",
      "position" : 13
    },
    {
      "token" : "版",
      "start_offset" : 81,
      "end_offset" : 82,
      "type" : "<IDEOGRAPHIC>",
      "position" : 14
    },
    {
      "token" : "权",
      "start_offset" : 82,
      "end_offset" : 83,
      "type" : "<IDEOGRAPHIC>",
      "position" : 15
    },
    {
      "token" : "声",
      "start_offset" : 83,
      "end_offset" : 84,
      "type" : "<IDEOGRAPHIC>",
      "position" : 16
    },
    {
      "token" : "明",
      "start_offset" : 84,
      "end_offset" : 85,
      "type" : "<IDEOGRAPHIC>",
      "position" : 17
    },
    {
      "token" : "本",
      "start_offset" : 86,
      "end_offset" : 87,
      "type" : "<IDEOGRAPHIC>",
      "position" : 18
    },
    {
      "token" : "文",
      "start_offset" : 87,
      "end_offset" : 88,
      "type" : "<IDEOGRAPHIC>",
      "position" : 19
    },
    {
      "token" : "为",
      "start_offset" : 88,
      "end_offset" : 89,
      "type" : "<IDEOGRAPHIC>",
      "position" : 20
    },
    {
      "token" : "博",
      "start_offset" : 89,
      "end_offset" : 90,
      "type" : "<IDEOGRAPHIC>",
      "position" : 21
    },
    {
      "token" : "主",
      "start_offset" : 90,
      "end_offset" : 91,
      "type" : "<IDEOGRAPHIC>",
      "position" : 22
    },
    {
      "token" : "原",
      "start_offset" : 91,
      "end_offset" : 92,
      "type" : "<IDEOGRAPHIC>",
      "position" : 23
    },
    {
      "token" : "创",
      "start_offset" : 92,
      "end_offset" : 93,
      "type" : "<IDEOGRAPHIC>",
      "position" : 24
    },
    {
      "token" : "文",
      "start_offset" : 93,
      "end_offset" : 94,
      "type" : "<IDEOGRAPHIC>",
      "position" : 25
    },
    {
      "token" : "章",
      "start_offset" : 94,
      "end_offset" : 95,
      "type" : "<IDEOGRAPHIC>",
      "position" : 26
    },
    {
      "token" : "转",
      "start_offset" : 96,
      "end_offset" : 97,
      "type" : "<IDEOGRAPHIC>",
      "position" : 27
    },
    {
      "token" : "载",
      "start_offset" : 97,
      "end_offset" : 98,
      "type" : "<IDEOGRAPHIC>",
      "position" : 28
    },
    {
      "token" : "请",
      "start_offset" : 98,
      "end_offset" : 99,
      "type" : "<IDEOGRAPHIC>",
      "position" : 29
    },
    {
      "token" : "附",
      "start_offset" : 99,
      "end_offset" : 100,
      "type" : "<IDEOGRAPHIC>",
      "position" : 30
    },
    {
      "token" : "上",
      "start_offset" : 100,
      "end_offset" : 101,
      "type" : "<IDEOGRAPHIC>",
      "position" : 31
    },
    {
      "token" : "博",
      "start_offset" : 101,
      "end_offset" : 102,
      "type" : "<IDEOGRAPHIC>",
      "position" : 32
    },
    {
      "token" : "文",
      "start_offset" : 102,
      "end_offset" : 103,
      "type" : "<IDEOGRAPHIC>",
      "position" : 33
    },
    {
      "token" : "链",
      "start_offset" : 103,
      "end_offset" : 104,
      "type" : "<IDEOGRAPHIC>",
      "position" : 34
    },
    {
      "token" : "接",
      "start_offset" : 104,
      "end_offset" : 105,
      "type" : "<IDEOGRAPHIC>",
      "position" : 35
    }
  ]
}
8. Path hierarchy tokenizer (path_hierarchy)

The path hierarchy tokenizer (path_hierarchy) indexes file-system-style paths by emitting every prefix of the path as a token, so that at search time documents sharing the same parent path are matched together.

POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text":"/usr/local/python/python2.7"
}
{
  "tokens" : [
    {
      "token" : "/usr",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/local",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/local/python",
      "start_offset" : 0,
      "end_offset" : 17,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/local/python/python2.7",
      "start_offset" : 0,
      "end_offset" : 27,
      "type" : "word",
      "position" : 0
    }
  ]
}
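The same analysis can be triggered from Spring Boot code through the RestHighLevelClient that backs ElasticsearchRestTemplate. A minimal sketch, assuming a 7.x RestHighLevelClient bean is already configured (class and method names here are illustrative):

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.AnalyzeRequest;
import org.elasticsearch.client.indices.AnalyzeResponse;

public class PathHierarchyAnalyzeDemo {

    // Run the path_hierarchy tokenizer against a sample path and print every emitted prefix token
    public static void printPathTokens(RestHighLevelClient client) throws Exception {
        AnalyzeRequest request = AnalyzeRequest
                .buildCustomAnalyzer("path_hierarchy")   // tokenizer only, no token filters
                .build("/usr/local/python/python2.7");

        AnalyzeResponse response = client.indices().analyze(request, RequestOptions.DEFAULT);
        for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
            // Expected: /usr, /usr/local, /usr/local/python, /usr/local/python/python2.7
            System.out.println(token.getTerm());
        }
    }
}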

13.4 Token filters

1. Custom token filter
PUT pattern_test3
{
  "settings": {
    "analysis": {
      "filter": {
        "my_test_length":{
          "type":"length",
          "max":8,
          "min":2
        }
      }
    }
  }
}
POST pattern_test3/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_test_length"],
  "text":"a Small word and a longerword"
}
//Result:
{
  "tokens" : [
    {
      "token" : "Small",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "word",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "and",
      "start_offset" : 13,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
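The pattern_test3 index with its custom length filter can also be created programmatically. A minimal sketch using the 7.x RestHighLevelClient create-index API (the client bean is assumed to exist; index and filter names follow the example above):

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;

public class LengthFilterIndexDemo {

    // Create pattern_test3 with a "length" token filter that keeps tokens of 2 to 8 characters
    public static void createIndex(RestHighLevelClient client) throws Exception {
        CreateIndexRequest request = new CreateIndexRequest("pattern_test3");
        request.settings(Settings.builder()
                .put("index.analysis.filter.my_test_length.type", "length")
                .put("index.analysis.filter.my_test_length.min", 2)
                .put("index.analysis.filter.my_test_length.max", 8));

        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println("acknowledged = " + response.isAcknowledged());
    }
}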
2. Custom lowercase token filter
PUT lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_lowercase_example": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "greek_lowercase_example": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["greek_lowercase"]
        }
      },
      "filter": {
        "greek_lowercase": {
          "type": "lowercase",
          "language": "greek"
        }
      }
    }
  }
}
POST lowercase_example/_analyze
{
  "tokenizer": "standard",
  "filter": ["greek_lowercase"],
  "text":"Ένα φίλτρο διακριτικού τύπου πεζά s ομαλοποιεί το κείμενο διακριτικού σε χαμηλότερη θήκη"
}
{
  "tokens" : [
    {
      "token" : "ενα",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "φιλτρο",
      "start_offset" : 4,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "διακριτικου",
      "start_offset" : 11,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "τυπου",
      "start_offset" : 23,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "πεζα",
      "start_offset" : 29,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "s",
      "start_offset" : 34,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "ομαλοποιει",
      "start_offset" : 36,
      "end_offset" : 46,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "το",
      "start_offset" : 47,
      "end_offset" : 49,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "κειμενο",
      "start_offset" : 50,
      "end_offset" : 57,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "διακριτικου",
      "start_offset" : 58,
      "end_offset" : 69,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "σε",
      "start_offset" : 70,
      "end_offset" : 72,
      "type" : "<ALPHANUM>",
      "position" : 10
    },
    {
      "token" : "χαμηλοτερη",
      "start_offset" : 73,
      "end_offset" : 83,
      "type" : "<ALPHANUM>",
      "position" : 11
    },
    {
      "token" : "θηκη",
      "start_offset" : 84,
      "end_offset" : 88,
      "type" : "<ALPHANUM>",
      "position" : 12
    }
  ]
}
3. Multiple token filters
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["length","lowercase"],
  "text":"a Small word and a longerword"
}
{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "small",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "word",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "and",
      "start_offset" : 13,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "a",
      "start_offset" : 17,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "longerword",
      "start_offset" : 19,
      "end_offset" : 29,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}

13.5 IK analyzer

1. Download

1. On GitHub, search for elasticsearch-analysis-ik and open the medcl/elasticsearch-analysis-ik repository

https://github.com/medcl/elasticsearch-analysis-ik.

在这里插入图片描述

2. The IK plugin version must match your Elasticsearch version

在这里插入图片描述

3. In the Elasticsearch installation directory, create an ik subdirectory under plugins and unzip the plugin into it

4. Restart Elasticsearch and Kibana

2. Overview

| Name | Function |
| --- | --- |
| IKAnalyzer.cfg.xml | Configures custom dictionaries (see the sketch below) |
| main.dic | IK's built-in Chinese dictionary, roughly 270,000 entries; words listed here are kept together as single tokens |
| surname.dic | Chinese surnames |
| suffix.dic | Special (suffix) nouns, e.g. 乡、江、所、省 |
| preposition.dic | Chinese prepositions/particles, e.g. 不、也、了、仍 |
| stopword.dic | English stop words, e.g. a, an, and, the |
| quantifier.dic | Units and measure words, e.g. 厘米、件、倍、像素 |
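For reference, a minimal sketch of an IKAnalyzer.cfg.xml that registers a custom dictionary and a custom stop-word file; the paths custom/my_ext.dic and custom/my_stopword.dic are placeholders for your own files under the plugin's config directory:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary file(s) -->
    <entry key="ext_dict">custom/my_ext.dic</entry>
    <!-- extension stop-word file(s) -->
    <entry key="ext_stopwords">custom/my_stopword.dic</entry>
</properties>

After editing the file, restart Elasticsearch so the new dictionaries are loaded.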
3. Test

Tokenize

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "上海自来水来自海上"
}
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "今天是个好日子"
}

Query

GET ik1/_search
{
  "query": {
    "match_phrase": {
      "content": "今天"
    }
  }
}
GET ik1/_search
{
  "query": {
    "match_phrase_prefix": {
      "content": {
        "query": "今天好日子",
        "slop": 2
      }
    }
  }
}
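On the Spring Boot side, the IK analyzers can be declared directly in the entity mapping. A minimal sketch with spring-data-elasticsearch annotations; the Article class, index name and field are illustrative:

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Index with ik_max_word (fine-grained tokens) and search with ik_smart (coarse-grained tokens)
@Document(indexName = "article")
public class Article {

    @Id
    private String id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String content;

    // getters and setters omitted
}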

14. Forward index and inverted index

在这里插入图片描述

在这里插入图片描述
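Since the two figures are not reproduced here, a plain-data sketch of the difference (document ids and terms are made up): a forward index maps each document to the terms it contains, while an inverted index maps each term to the documents containing it, which is the structure full-text queries actually traverse.

import java.util.List;
import java.util.Map;

public class IndexShapes {

    // Forward index: document id -> terms contained in that document
    static final Map<String, List<String>> FORWARD_INDEX = Map.of(
            "doc1", List.of("quick", "brown", "fox"),
            "doc2", List.of("quick", "dog"));

    // Inverted index: term -> document ids that contain the term
    static final Map<String, List<String>> INVERTED_INDEX = Map.of(
            "quick", List.of("doc1", "doc2"),
            "brown", List.of("doc1"),
            "fox", List.of("doc1"),
            "dog", List.of("doc2"));
}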

15. Data modeling

16. Securing internal cluster communication

Encrypt data in transit

  • Prevent packet sniffing and leakage of sensitive information
  • Verify node identity and keep impostor nodes out
  • Protect data and cluster-state traffic

Create certificates for the nodes
The TLS protocol requires X.509 certificates issued by a trusted Certificate Authority (CA)

  • Certificate
    A joining node must present a certificate issued by the same CA
  • Full Verification
    A joining node needs a certificate issued by the same CA, and its hostname or IP address is also verified
  • No Verification
    Any node may join; use only in development for diagnostic purposes
#Generate certificates

#Create a certificate authority (CA) for your Elasticsearch cluster, e.g. with the elasticsearch-certutil ca command:
bin/elasticsearch-certutil ca

#Generate a certificate and private key for each node in the cluster, e.g. with the elasticsearch-certutil cert command:
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12

#Copy the certificate into the config/certs directory
elastic-certificates.p12

bin/elasticsearch -E node.name=node0 -E cluster.name=es -E path.data=node0_data -E http.port=9200 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12

bin/elasticsearch -E node.name=node1 -E cluster.name=es -E path.data=node1_data -E http.port=9201 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12

#A node that does not present a certificate cannot join the cluster
bin/elasticsearch -E node.name=node2 -E cluster.name=es -E path.data=node2_data -E http.port=9202 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate

elasticsearch.yml configuration (equivalent to the -E flags above; add these settings to each node's config):

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate

xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
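With security enabled on the cluster, the Spring Boot client also needs credentials. A minimal application.properties sketch for Spring Boot 2.x; the username and password values are placeholders for whatever you configured:

spring.elasticsearch.rest.uris=http://127.0.0.1:9200
spring.elasticsearch.rest.username=elastic
spring.elasticsearch.rest.password=your-password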