Mapping: Introduction and Hands-On Practice

Author: *辰

Category: Search Tools - Elasticsearch. Tags: Elasticsearch, analyzer, mapping, tokenizer

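Introducing the concept of mapping

When no mapping is specified manually, Elasticsearch (ES for short below) automatically creates one for the index. For example, index a document: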

PUT /test_index/test_type/3
{
    "name":"xiao ming",
    "age": 24
}

After running the command above, query the _mapping of test_type with GET /test_index/_mapping/test_type. The result is as follows:

{
  "test_index": {
    "mappings": {
      "test_type": {
        "properties": {
          "age": {
            "type": "long"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

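Conclusion: ES recognized name as a string, which it maps to text with a keyword sub-field, and age as long. ES generated this mapping automatically when the index was created by the first document.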
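The type detection described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how ES implements dynamic mapping: the function names (infer_field_mapping, infer_properties) are hypothetical, and date detection is reduced to the single yyyy-MM-dd format as a simplifying assumption.

```python
from datetime import datetime


def _looks_like_date(text):
    """Assumption: only the plain yyyy-MM-dd format counts as a date here."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_field_mapping(value):
    """Guess a field mapping from a single JSON value (toy dynamic mapping)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if _looks_like_date(value):
            return {"type": "date"}
        # Strings become full-text fields with an exact-value sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_properties(doc):
    """Build the "properties" section for one flat document."""
    return {name: infer_field_mapping(value) for name, value in doc.items()}


print(infer_properties({"name": "xiao ming", "age": 24}))
```

For the document indexed above, the sketch yields long for age and text with a keyword sub-field for name, matching the mapping ES returned.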

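What is a mapping?

Taken literally, a mapping is a correspondence. From the example above we can see that a mapping maps index fields to their types, but it does more than that: it also defines how a field is indexed and whether it can be searched at all.

An ES mapping defines the field names of an index and their data types. It can also attach special attributes to a field: whether the field is analyzed, whether it is stored, which analyzer it uses, and so on.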

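A field has a few basic mapping options: its type (type) and how it is indexed (index). Taking the string type as an example, index accepts three values in ES 2.x and earlier:

analyzed: the default. The string is analyzed and indexed as standard full text.
not_analyzed: exact indexing. The string is not analyzed; the exact value of the field is indexed as-is.
no: the field is not indexed.

From ES 5.0 on, the string type is split in two: text corresponds to analyzed, keyword corresponds to not_analyzed, and index takes true or false, with false playing the role of no.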

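Dissecting a mapping

A mapping refers to one or more analyzers, and an analyzer is in turn assembled from a tokenizer and one or more filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which runs it through its filters in order.
A filter is easy to understand: it is a function that transforms data, taking a string in and returning another string. A function that lowercases a string is a good example of a filter.
In short, a mapping drives a series of instructions that turn input data into searchable index terms.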
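To make the analyzer chain concrete, here is a minimal Python sketch under stated assumptions. It borrows the three stages ES documents for an analyzer (character filters, one tokenizer, token filters), but every function name, the regular expressions, and the tiny stop-word list are made up for illustration and do not reproduce any built-in ES analyzer.

```python
import re

STOP_WORDS = {"the", "a", "is", "in"}  # hypothetical, deliberately tiny


def html_strip(text):
    """Character filter: works on the raw string before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: splits the string into individual tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: the lowercase example mentioned in the text."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Token filter: drops very common words."""
    return [token for token in tokens if token not in STOP_WORDS]


def make_analyzer(char_filters, tokenizer, token_filters):
    """Chain the stages in order and return a single analyze() function."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer([html_strip], word_tokenizer, [lowercase_filter, stop_filter])
print(analyzer("<b>This</b> is my First article in Baidu"))
# ['this', 'my', 'first', 'article', 'baidu']
```

The order matters: the stop filter runs after lowercasing, so "The" and "the" are treated the same way.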

Mapping query exercises

Step 1: create three documents

PUT /baidu/article/1
{
	"post_date":"2018-01-01",
	"title":"the first article",
	"content":"this is my first article in baidu",
	"author_id":11401
}
PUT /baidu/article/2
{
	"post_date":"2018-01-02",
	"title":"the second article",
	"content":"this is my second article in baidu",
	"author_id":11402
}
PUT /baidu/article/3
{
	"post_date":"2018-01-03",
	"title":"the third article",
	"content":"this is my third article in baidu",
	"author_id":11403
}

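Running GET /baidu/article/_search?q=2018 returns three results. Why?

Because q=2018, with no field name, searches the _all field, and _all is searched as full text. The values of all fields of a document are concatenated into _all and analyzed, so every post_date contributes the term 2018 and all three documents match.

Running GET /baidu/article/_search?q=2018-01-01 also returns three results. Why?

Again the query goes to the _all field, which is full text, so the query string is itself tokenized into 2018, 01 and 01. All three documents contain 2018, so all three are returned.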
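The behaviour of the two _all queries can be reproduced with a small Python model. This is a sketch that assumes _all is simply every field value joined into one string and split on non-word characters; the helper names (tokenize, build_all_index, search_all) are hypothetical, and scoring is ignored entirely.

```python
import re

DOCS = {
    1: {"post_date": "2018-01-01", "title": "the first article",
        "content": "this is my first article in baidu", "author_id": 11401},
    2: {"post_date": "2018-01-02", "title": "the second article",
        "content": "this is my second article in baidu", "author_id": 11402},
    3: {"post_date": "2018-01-03", "title": "the third article",
        "content": "this is my third article in baidu", "author_id": 11403},
}


def tokenize(text):
    """Lowercase and split on anything that is not a word character."""
    return re.findall(r"\w+", text.lower())


def build_all_index(docs):
    """Inverted index over the concatenation of all field values (a toy _all)."""
    index = {}
    for doc_id, doc in docs.items():
        all_text = " ".join(str(value) for value in doc.values())
        for term in tokenize(all_text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all(index, query):
    """A document matches if it contains any term of the query (OR semantics)."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)


all_index = build_all_index(DOCS)
print(tokenize("2018-01-01"))                # ['2018', '01', '01']
print(search_all(all_index, "2018"))         # [1, 2, 3]
print(search_all(all_index, "2018-01-01"))   # [1, 2, 3]
```

Both queries return every document because the term 2018 appears in each document's combined text.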

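Running GET /baidu/article/_search?q=post_date:2018-01-01 returns one result. Why?

Several important points are involved here:

When a document is inserted, ES automatically creates the index and its mapping, and the mapping defines the type of every field in the document.
Different data types are indexed differently: some as exact values, others as full text. A date field is indexed as an exact value, that is, matched exactly.
For an exact value, the whole value goes into the inverted index as a single term. Full text goes through analysis first, including tokenization and normalization (explained below), and only then enters the inverted index.
When a search request arrives, exact-value and full-text fields are searched differently; the search behaviour mirrors the way the inverted index was built.

To sum up: q=post_date:2018-01-01 is matched against the whole value of post_date, so only one document is found.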
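The exact-value case can be sketched the same way. This is a simplified model following the article's explanation, where the whole field value is a single term; ES actually stores dates in a numeric form internally, and the helper names below are hypothetical.

```python
POST_DATES = {
    1: {"post_date": "2018-01-01"},
    2: {"post_date": "2018-01-02"},
    3: {"post_date": "2018-01-03"},
}


def build_exact_index(docs, field):
    """Index the untouched field value as one term: no splitting, no lowercasing."""
    index = {}
    for doc_id, doc in docs.items():
        index.setdefault(str(doc[field]), set()).add(doc_id)
    return index


def search_exact(index, value):
    """The query value is not analyzed either; it must equal the indexed term."""
    return sorted(index.get(value, set()))


date_index = build_exact_index(POST_DATES, "post_date")
print(sorted(date_index))                      # ['2018-01-01', '2018-01-02', '2018-01-03']
print(search_exact(date_index, "2018-01-01"))  # [1]
```

Because each date is kept whole on both the index side and the query side, the query for 2018-01-01 can only match the first document.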

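Creating a mapping manually

A mapping can only be defined manually when the index is created, or extended later by adding a mapping for a new field. The mapping of an existing field cannot be updated.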

PUT /baidu
{
  "mappings": {
    "article": {
      "properties": {
        "author_id":{
          "type": "long"
        },
        "title":{
          "type": "text"
        },
        "content":{
          "type": "text"
        },
        "post_date":{
          "type": "date"
        }
      }
    }
  }
}

======================= The syntax above creates the mapping =======================

PUT /baidu/_mapping/article
{
  "properties": {
    "author_id": {
      "type": "text"
    }
  }
}

========== The request above tries to update an existing field mapping (long to text); it fails with an error. Try it yourself ==========

Adding a new field, new_field, does not raise an error
PUT /baidu/_mapping/article
{
  "properties" : {
    "new_field" : {
      "type" :    "keyword"
    }
  }
}

=============== The syntax above adds a new field mapping; it does not raise an error ===============
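The rule that new fields may be added while existing fields may not change type can be summed up in a short Python sketch. It is a toy model, not the ES merge logic: the names put_mapping and MappingConflict are hypothetical, and only the type of a field is compared, whereas ES checks other parameters too.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change the type of an existing field."""


def put_mapping(existing, update):
    """Return a merged copy of the mapping, or raise on a type change."""
    merged = dict(existing)
    for field, definition in update.items():
        if field in existing and existing[field]["type"] != definition["type"]:
            raise MappingConflict(
                f"field [{field}] cannot change from "
                f"[{existing[field]['type']}] to [{definition['type']}]"
            )
        merged[field] = definition
    return merged


mapping = {
    "author_id": {"type": "long"},
    "title": {"type": "text"},
    "content": {"type": "text"},
    "post_date": {"type": "date"},
}

# Adding a field that does not exist yet is accepted.
mapping = put_mapping(mapping, {"new_field": {"type": "keyword"}})
print(sorted(mapping))

# Changing the type of an existing field is rejected.
try:
    put_mapping(mapping, {"author_id": {"type": "text"}})
except MappingConflict as err:
    print("rejected:", err)
```

The reason behind the rule is that documents already indexed under the old type would no longer agree with the new one; changing a type in practice means creating a new index and reindexing.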

-- test exact-value
GET /baidu/article/_search
{
  "query": {
    "match": {
      "author_id": "11402"
    }
  }
}

======================= The request above tests exact matching =======================

-- test full-text
GET /baidu/article/_search
{
  "query": {
    "match": {
      "title": "my third article"
    }
  }
}

======================= The request above tests full-text search =======================

================ If you run into problems, feel free to leave a comment ================

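Walking through the whole full-text process

1. Before the index is built, the value of each field in the doc is split into individual words (terms or tokens);

2. A sorted list of unique words is then built. This process is called tokenization;

3. The data in the inverted index also has to be normalized; only then can its similarity to the search string be evaluated. Normalization includes converting uppercase characters to lowercase, reducing plurals to the singular, and indexing synonyms under a single term;

4. Before the query runs, the search string goes through the same normalization.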
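The four steps above can be sketched as a tiny inverted index in Python. It is a minimal illustration under stated assumptions: the SINGULAR lookup table stands in for real stemming, synonyms are left out, and all names are hypothetical rather than anything ES exposes.

```python
import re

SINGULAR = {"articles": "article"}  # hypothetical stand-in for plural handling


def normalize(token):
    """Step 3: lowercase, then map plurals to their singular form."""
    token = token.lower()
    return SINGULAR.get(token, token)


def analyze(text):
    """Step 1 plus step 3: split into tokens, then normalize each one."""
    return [normalize(token) for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Step 2: a sorted, duplicate-free term list pointing at document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return dict(sorted(index.items()))


def search(index, query):
    """Step 4: the query string goes through the same analyze() function."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


TITLES = {1: "the first article", 2: "the second article", 3: "the third article"}
title_index = build_inverted_index(TITLES)
print(list(title_index))                         # ['article', 'first', 'second', 'the', 'third']
print(search(title_index, "My Third ARTICLES"))  # [1, 2, 3]
```

The query matches all three titles because ARTICLES is normalized to article, a term every title contains; without the shared normalization step it would match nothing.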