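The Complete Guide to Elasticsearch Field Types and Mapping Parameters: Choosing Precisely from Basic Types to Complex Scenarios (Bookmark Edition)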

Author: Solomon_肖哥弹架构

Published 2025-10-26 09:00:00

License: CC 4.0 BY-SA

Original link: https://blog.csdn.net/alises1314/article/details/153895603

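This guide systematically covers the main Elasticsearch field types, from basic text and numeric types to complex geo and nested object types. It provides complete mapping configurations for practical cases such as e-commerce, logging, and user profiling, and covers performance parameter tuning and custom analyzers. A field-selection decision tree, quick-reference tables, and an FAQ help developers design high-performance indexes and choose the right data type for each business field.

This complete guide to Elasticsearch field types is an essential reference for designing high-performance indexes!

🔥 What you get:

🎯 Systematic classification: core, complex, and special types, with 20+ field types explained in detail
📊 Practical cases: complete mapping configurations for three typical scenarios (e-commerce search, log analysis, user profiling)
⚡ Performance optimization: field parameter tuning, index settings, and custom analyzers
🛠️ Decision tools: a field-selection decision tree, quick-reference tables, and an FAQ for resolving design questions quickly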

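1. Core Data Types

1.1 String Types

text - full-text type

Purpose: fields used for full-text search; values are processed by an analyzer

Features:

Supports full-text search
Values are tokenized by the analyzer
No aggregations or sorting by default (enabling fielddata allows them but uses a lot of heap memory)
Supports relevance scoring

Example: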

{
  "product_name": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_smart"
  }
}

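Typical use cases:

Product names and descriptions
Article content
User reviews
News articles

keyword - keyword type

Purpose: stores exact values without tokenization

Features:

Not tokenized; the whole value is indexed as a single term
Supports exact matching
Supports aggregations and sorting
Suited to structured data

Example: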

{
  "user_id": {
    "type": "keyword",
    "ignore_above": 256
  }
}

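Typical use cases:

IDs and serial numbers
Status codes and tags
Email addresses and phone numbers
Categories and brands

Key parameter:

ignore_above: strings longer than this many characters are not indexed (an explicitly mapped keyword field has no limit by default; the .keyword sub-fields created by dynamic mapping use 256)

1.2 Numeric Types

Integer types

Type	Range	Storage	Notes
byte	-128 ~ 127	1 byte	Very small integers
short	-32,768 ~ 32,767	2 bytes	Short integers
integer	-2³¹ ~ 2³¹-1	4 bytes	Standard integers (most common)
long	-2⁶³ ~ 2⁶³-1	8 bytes	Long integers

Example: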

{
  "age": {
    "type": "byte"
  },
  "stock": {
    "type": "integer"
  },
  "view_count": {
    "type": "long"
  }
}

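Floating-point types

Type	Precision	Storage	Notes
half_float	16-bit	2 bytes	Half precision; for values that do not need exact precision
float	32-bit	4 bytes	Single precision
double	64-bit	8 bytes	Double precision
scaled_float	Set by scaling_factor	8 bytes (stored as a long)	Stored as a long after applying a scaling factor

Example: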

{
  "rating": {
    "type": "half_float",
    "doc_values": true
  },
  "price": {
    "type": "scaled_float",
    "scaling_factor": 100
  }
}

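scaled_float details:

scaling_factor: the scaling factor
Storage: the value is multiplied by scaling_factor and rounded to the nearest long
Advantage: saves space; suited to prices, amounts, and other values with a fixed number of decimal places
Example: price = 19.99 with scaling_factor = 100 is stored as 1999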
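To make the encoding concrete, here is a minimal Python sketch of the rule above: multiply by scaling_factor, round to the nearest long, and divide when reading. The function names are hypothetical, and rounding exact halves upward is an assumption of the sketch rather than documented behavior.

```python
import math


def encode_scaled_float(value: float, scaling_factor: float) -> int:
    """Index-time encoding: multiply by the factor, then round to the nearest long.

    Exact halves are rounded up here; that tie-breaking rule is an assumption.
    """
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled_float(stored: int, scaling_factor: float) -> float:
    """The value that range queries, sorting, and aggregations work with."""
    return stored / scaling_factor


print(encode_scaled_float(19.99, 100))    # 1999
print(encode_scaled_float(19.999, 100))   # 2000: the third decimal place is lost
print(decode_scaled_float(2000, 100))     # 20.0
```

The original 19.999 is still returned from _source; only the indexed value and doc values carry the rounded 2000, so choose scaling_factor to match the precision you actually need.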

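1.3 Date Type

date - date/time type

Accepted input depends on the format parameter (default: strict_date_optional_time||epoch_millis). Common forms:

Formatted date strings: "2025-10-18", "2025-10-18 10:30:00"
Timestamps in milliseconds: 1729238400000
Timestamps in seconds: 1729238400 (only when the format includes epoch_second; otherwise a number is read as milliseconds)

Example: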

{
  "created_at": {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
  }
}

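format details:

yyyy-MM-dd HH:mm:ss: standard date-time format
yyyy-MM-dd: date only
epoch_millis: timestamp in milliseconds
epoch_second: timestamp in seconds (not part of the mapping above)
||: separates alternative formats; a value is accepted if it matches any of them, tried from left to right

Strings without a time zone are interpreted as UTC, and every date is stored internally as epoch milliseconds in UTC.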
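The following Python sketch models how a `||`-separated format list is tried from left to right and converted to UTC epoch milliseconds. It is a simplified stand-in, not Elasticsearch's parser: the hypothetical `parse_es_date` only understands the patterns used in this section, which are translated by hand to `strptime` codes.

```python
from datetime import datetime, timezone

# Hand-translated Java patterns; only the ones used in this section are known.
PATTERNS = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
}


def parse_es_date(value, fmt="yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"):
    """Return UTC epoch milliseconds, trying each '||' alternative in order."""
    for name in fmt.split("||"):
        if name in ("epoch_millis", "epoch_second"):
            try:
                number = int(value)
            except (TypeError, ValueError):
                continue
            return number if name == "epoch_millis" else number * 1000
        pattern = PATTERNS.get(name)
        if pattern is None:
            continue  # a pattern this sketch does not model
        try:
            parsed = datetime.strptime(str(value), pattern)
        except ValueError:
            continue
        # No time zone in the string: treat it as UTC.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"{value!r} matches none of {fmt!r}")


print(parse_es_date("2024-10-18 08:00:00"))         # 1729238400000
print(parse_es_date("1729238400"))                  # 1729238400
print(parse_es_date("1729238400", "epoch_second"))  # 1729238400000
```

The second call shows why a seconds timestamp needs epoch_second in the format: without it, 1729238400 is read as milliseconds and lands in January 1970.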

Typical usage:

// Write examples
{
  "created_at": "2025-10-18 10:30:00"
}
{
  "created_at": 1729238400000
}

// Query example
{
  "range": {
    "created_at": {
      "gte": "2025-01-01",
      "lte": "2025-12-31"
    }
  }
}

1.4 Boolean Type

boolean - boolean type

Accepted values:

true: true, "true"
false: false, "false", "" (empty string)

Example:

{
  "is_active": {
    "type": "boolean"
  },
  "is_vip": {
    "type": "boolean"
  }
}

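1.5 Binary Type

binary - binary type

Purpose: stores binary data as a Base64-encoded string

Features:

Not searchable
Not indexed
Used only for storage: the value is returned from _source (doc_values and store are off by default)

Example: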

{
  "file_content": {
    "type": "binary"
  }
}

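1.6 Range Types

Range data types

Type	Description
integer_range	Range of integers
long_range	Range of long integers
float_range	Range of single-precision floats
double_range	Range of double-precision floats
date_range	Range of dates
ip_range	Range of IP addresses

Example: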

{
  "price_range": {
    "type": "integer_range"
  },
  "event_time_range": {
    "type": "date_range",
    "format": "yyyy-MM-dd HH:mm:ss"
  }
}

Usage example:

// Write
{
  "price_range": {
    "gte": 100,
    "lte": 500
  }
}

// Query
{
  "range": {
    "price_range": {
      "gte": 200,
      "lte": 300,
      "relation": "within"
    }
  }
}

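relation parameter:

within: the document's range lies entirely within the query range
contains: the document's range fully contains the query range
intersects: the two ranges overlap (default)

Note: for the document above ([100, 500]) and the query [200, 300], within does not match; contains and intersects do.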
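Expressed as code, the three relations are simple interval comparisons. This Python sketch uses a hypothetical function name and treats both ranges as closed intervals, matching the gte/lte bounds used above.

```python
def range_matches(doc_range, query_range, relation="intersects"):
    """Compare two closed intervals the way the relation parameter describes."""
    d_lo, d_hi = doc_range
    q_lo, q_hi = query_range
    if relation == "intersects":  # the ranges share at least one value
        return d_lo <= q_hi and q_lo <= d_hi
    if relation == "contains":    # the document range covers the whole query range
        return d_lo <= q_lo and q_hi <= d_hi
    if relation == "within":      # the document range sits inside the query range
        return q_lo <= d_lo and d_hi <= q_hi
    raise ValueError(f"unknown relation: {relation}")


doc = (100, 500)    # the price_range written above
query = (200, 300)  # the range in the query above
for rel in ("intersects", "contains", "within"):
    print(rel, range_matches(doc, query, rel))
```

A quick check like this before writing a query makes it easy to see which relation expresses the business rule, for example "the product's price band covers the user's budget" (contains).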

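2. Complex Data Types

2.1 object - object type

Purpose: stores JSON objects

Features:

The default type for JSON objects (no need to declare "type": "object" explicitly)
Inner fields are flattened into dotted field paths
In an array of objects, the boundaries between individual objects are lost

Example: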

{
  "user": {
    "type": "object",
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}

How it is indexed internally (flattened):

// Original document
{
  "user": {
    "name": "张三",
    "age": 25
  }
}

// Indexed internally as
{
  "user.name": "张三",
  "user.age": 25
}

2.2 nested - nested type

Purpose: stores arrays of objects while keeping each object independent

Why nested is needed:

// The problem with object
{
  "comments": [
    {"author": "张三", "rating": 5},
    {"author": "李四", "rating": 2}
  ]
}

// After flattening
{
  "comments.author": ["张三", "李四"],
  "comments.rating": [5, 2]
}

// Problem: author and rating are no longer linked, so a query for author 李四 AND rating 5 wrongly matches this document
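A small Python model of the difference: with object, the array is indexed as independent value lists per sub-field, while nested keeps each comment as its own unit. The functions are hypothetical illustrations of the matching semantics, not Elasticsearch internals.

```python
def flatten_object_array(doc, path):
    """How an object-typed array is indexed: one value list per sub-field."""
    flat = {}
    for obj in doc[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def object_match(doc, author, rating):
    # Each condition is checked against its own value list independently.
    flat = flatten_object_array(doc, "comments")
    return author in flat["comments.author"] and rating in flat["comments.rating"]


def nested_match(doc, author, rating):
    # nested: both conditions must hold inside the same comment.
    return any(c["author"] == author and c["rating"] == rating for c in doc["comments"])


doc = {"comments": [{"author": "张三", "rating": 5}, {"author": "李四", "rating": 2}]}
print(object_match(doc, "李四", 5))  # True  (false positive)
print(nested_match(doc, "李四", 5))  # False
```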

The nested solution:

{
  "comments": {
    "type": "nested",
    "properties": {
      "author": {
        "type": "keyword"
      },
      "rating": {
        "type": "integer"
      },
      "content": {
        "type": "text"
      }
    }
  }
}

nested query example:

{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            {"term": {"comments.author": "张三"}},
            {"term": {"comments.rating": 5}}
          ]
        }
      }
    }
  }
}

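Performance notes:

Each nested object is indexed as a separate hidden document, which costs more resources
Keep the number of nested objects per document small; the author suggests no more than 50
index.mapping.nested_objects.limit caps the number of nested objects in a single document (default 10000)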

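2.3 flattened - flattened object type

Purpose: indexes an entire JSON object as a single field

Features:

Suited to objects whose keys are not known in advance
All leaf values are indexed as keywords
Sub-fields cannot have their own types (for example, no text or numeric sub-fields)

Example: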

{
  "metadata": {
    "type": "flattened"
  }
}

Usage:

// Write
{
  "metadata": {
    "color": "red",
    "size": "large",
    "custom_field": "value"
  }
}

// Query
{
  "term": {
    "metadata.color": "red"
  }
}
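The following Python sketch assumes each leaf of a flattened object becomes a keyword string addressable by its dotted path. It ignores arrays and Elasticsearch's actual internal encoding. The practical consequence it illustrates is that numbers inside a flattened field are compared and sorted as strings.

```python
def flattened_leaves(obj, prefix=""):
    """Collect (dotted_key, keyword_value) pairs; every leaf becomes a string."""
    leaves = []
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            leaves.extend(flattened_leaves(value, path + "."))
        else:
            leaves.append((path, str(value)))
    return leaves


metadata = {"color": "red", "size": {"width": 9, "height": 10}}
print(flattened_leaves(metadata))
# [('color', 'red'), ('size.width', '9'), ('size.height', '10')]

# Keyword values sort as strings, so numbers stored here do not sort numerically:
print(sorted(["9", "10", "100"]))  # ['10', '100', '9']
```

If you need numeric range queries or sorting on a key, map that key explicitly with a numeric type instead of leaving it inside the flattened field.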

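object vs nested vs flattened:

Feature	object	nested	flattened
Preserves per-object association	❌	✅	❌
Dynamic keys	✅	✅	✅
Complex queries	✅	✅	⚠️ Limited
Performance overhead	Low	High	Low
Sub-field types	All	All	keyword only
Typical use	Simple objects	Arrays of objects	Dynamic metadata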

2.4 join - parent/child relationship type

Purpose: creates parent/child relationships between documents in the same index

Example:

{
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": {
          "product": "review"
        }
      }
    }
  }
}

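Indexing a parent document (give it the _id P001, because a child's parent value refers to the parent's _id, not to the product_id field):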

{
  "product_id": "P001",
  "product_name": "iPhone 15",
  "relation": {
    "name": "product"
  }
}

Indexing a child document (index it with routing=P001 so it is stored on the parent's shard):

{
  "review_id": "R001",
  "content": "很好用",
  "rating": 5,
  "relation": {
    "name": "review",
    "parent": "P001"
  }
}

Query examples:

// has_child: returns products that have at least one review with rating >= 4
{
  "has_child": {
    "type": "review",
    "query": {
      "range": {
        "rating": {"gte": 4}
      }
    }
  }
}

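// has_parent: returns reviews whose parent product matches (match rather than term, because product_name is analyzed text)
{
  "has_parent": {
    "parent_type": "product",
    "query": {
      "match": {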
        "product_name": "iPhone"
      }
    }
  }
}

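Limitations:

Parent and child documents must be on the same shard
Child documents must be indexed with routing set to the parent's _id
join queries are expensive; prefer denormalizing (duplicating) data where possible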

3. Special Data Types

3.1 ip - IP address type

Supported formats:

IPv4: 192.168.1.1
IPv6: 2001:db8::1

Example:

{
  "client_ip": {
    "type": "ip"
  }
}

Query examples:

{
  "term": {
    "client_ip": "192.168.1.1"
  }
}

// CIDR query: matches any address in 192.168.0.0/16
{
  "term": {
    "client_ip": "192.168.0.0/16"
  }
}
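The CIDR behavior can be reproduced with Python's standard ipaddress module. This is a sketch of the matching semantics only, and `cidr_term_matches` is a hypothetical helper.

```python
import ipaddress


def cidr_term_matches(field_value: str, term: str) -> bool:
    """A term on an ip field matches an exact address or any address in a CIDR block."""
    address = ipaddress.ip_address(field_value)
    if "/" in term:
        # Membership is False (not an error) when IPv4 and IPv6 are mixed.
        return address in ipaddress.ip_network(term, strict=False)
    return address == ipaddress.ip_address(term)


print(cidr_term_matches("192.168.1.1", "192.168.0.0/16"))  # True
print(cidr_term_matches("10.0.0.1", "192.168.0.0/16"))     # False
print(cidr_term_matches("2001:db8::1", "2001:db8::/32"))   # True
```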

3.2 Geo Types

geo_point - geographic point

Purpose: stores latitude/longitude coordinates

Supported formats:

// Format 1: object
{"lat": 40.12, "lon": -71.34}

// Format 2: string "lat,lon"
"40.12,-71.34"

// Format 3: geohash
"drm3btev3e86"

// Format 4: array [lon, lat] (longitude first, as in GeoJSON)
[-71.34, 40.12]

Example:

{
  "location": {
    "type": "geo_point"
  }
}

Query examples:

// Distance query
{
  "geo_distance": {
    "distance": "10km",
    "location": {
      "lat": 40.12,
      "lon": -71.34
    }
  }
}

// Bounding-box query
{
  "geo_bounding_box": {
    "location": {
      "top_left": {
        "lat": 40.73,
        "lon": -74.1
      },
      "bottom_right": {
        "lat": 40.01,
        "lon": -71.12
      }
    }
  }
}
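Here is a Python sketch of the two geo filters. It assumes great-circle (haversine) distance and a mean Earth radius of 6371.0088 km; Elasticsearch's own implementation and constants may differ slightly. It also shows the [lon, lat] order of the array format. All helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption for this sketch


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def in_bounding_box(lat, lon, top_left, bottom_right):
    """Box test that ignores boxes crossing the 180th meridian."""
    return (bottom_right["lat"] <= lat <= top_left["lat"]
            and top_left["lon"] <= lon <= bottom_right["lon"])


def from_array(point):
    """The geo_point array format is [lon, lat]; return (lat, lon)."""
    lon, lat = point
    return lat, lon


lat, lon = from_array([-71.34, 40.12])
print(arc_distance_km(lat, lon, 40.20, -71.34) <= 10)  # True
print(in_bounding_box(lat, lon, {"lat": 40.73, "lon": -74.1}, {"lat": 40.01, "lon": -71.12}))  # True
```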

geo_shape - geographic shape

Purpose: stores complex shapes such as polygons and lines

Supported types:

Point
LineString
Polygon
MultiPoint
MultiLineString
MultiPolygon
GeometryCollection

Example:

{
  "area": {
    "type": "geo_shape"
  }
}

Write example (GeoJSON; coordinates are [lon, lat], and a polygon ring must end at its starting point):

{
  "area": {
    "type": "polygon",
    "coordinates": [
      [
        [100.0, 0.0],
        [101.0, 0.0],
        [101.0, 1.0],
        [100.0, 1.0],
        [100.0, 0.0]
      ]
    ]
  }
}
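It helps to validate rings on the client before indexing shapes. This Python sketch uses hypothetical helpers to check that a ring is closed and to run a planar ray-casting point-in-polygon test. It ignores holes, polygon orientation, and the curvature of the Earth.

```python
def ring_is_closed(ring):
    """A GeoJSON linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def point_in_ring(lon, lat, ring):
    """Planar ray casting on a closed [lon, lat] ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # this edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


ring = [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]
print(ring_is_closed(ring))              # True
print(point_in_ring(100.5, 0.5, ring))   # True
print(point_in_ring(102.0, 0.5, ring))   # False
```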

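3.3 completion - autocomplete type

Purpose: powers search suggestions and autocomplete

Features:

Optimized for fast prefix lookups (backed by an in-memory data structure)
Supports weights for ranking suggestions
Supports context-aware suggestions

Example: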

{
  "suggest": {
    "type": "completion",
    "analyzer": "simple",
    "search_analyzer": "simple",
    "max_input_length": 50
  }
}

Write example:

{
  "suggest": {
    "input": ["iPhone 15", "iPhone 15 Pro", "苹果手机"],
    "weight": 10
  }
}

Query example:

{
  "suggest": {
    "product-suggest": {
      "prefix": "iph",
      "completion": {
        "field": "suggest",
        "size": 5,
        "skip_duplicates": true
      }
    }
  }
}
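The key property of completion is that a prefix must match the start of an input, not a word in the middle. This Python sketch models that behavior, along with weight ranking and skip_duplicates. It only lower-cases text, whereas the real field runs the configured analyzer (the simple analyzer, for example, also drops digits). The function name is hypothetical.

```python
def complete(docs, prefix, size=5, skip_duplicates=True):
    """Simplified completion lookup: case-insensitive match on the start of each input,
    highest weight first, optionally dropping repeated suggestion texts."""
    candidates = []
    for doc in docs:
        for text in doc["input"]:
            if text.lower().startswith(prefix.lower()):
                candidates.append((doc["weight"], text))
    # Python's sort is stable (also with reverse=True): equal weights keep input order.
    candidates.sort(key=lambda item: item[0], reverse=True)
    results, seen = [], set()
    for _weight, text in candidates:
        if skip_duplicates and text in seen:
            continue
        seen.add(text)
        results.append(text)
        if len(results) == size:
            break
    return results


docs = [
    {"input": ["iPhone 15", "iPhone 15 Pro", "苹果手机"], "weight": 10},
    {"input": ["iPhone 15"], "weight": 3},
]
print(complete(docs, "iph"))  # ['iPhone 15', 'iPhone 15 Pro']
print(complete(docs, "pro"))  # []
```

To let users find "iPhone 15 Pro" by typing "pro", add "Pro" or similar variants as extra entries in the input array.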

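3.4 search_as_you_type - search-as-you-type type

Purpose: optimized for "search as you type", where results update with each keystroke

Features:

Automatically creates several sub-fields
Supports partial matching of terms and word sequences

Example: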

{
  "title": {
    "type": "search_as_you_type",
    "max_shingle_size": 3
  }
}

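Automatically generated sub-fields:

title: analyzed with the field's analyzer
title._2gram: shingles (word sequences) of 2 words
title._3gram: shingles of 3 words
title._index_prefix: edge n-grams of the largest shingles, for prefix matching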
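The sub-fields are built from word shingles and edge n-grams. This Python sketch shows what those two building blocks produce for a short title. It models the idea only and does not reproduce Elasticsearch's exact token stream (positions, filler tokens, analyzers). The helper names are hypothetical.

```python
def shingles(tokens, size):
    """Word n-grams of exactly `size` tokens, as in the _2gram/_3gram sub-fields."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def edge_prefixes(term, min_len=1):
    """Leading substrings of a term: the idea behind the _index_prefix sub-field."""
    return [term[:n] for n in range(min_len, len(term) + 1)]


tokens = "quick brown fox".split()
print(shingles(tokens, 2))     # ['quick brown', 'brown fox']
print(shingles(tokens, 3))     # ['quick brown fox']
print(edge_prefixes("brown"))  # ['b', 'br', 'bro', 'brow', 'brown']
```

A multi_match query of type bool_prefix across title, title._2gram, and title._3gram is the usual way to query these sub-fields together.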

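3.5 token_count - token count type

Purpose: stores the number of tokens a string produces after analysis

Example (token_count is normally used as a multi-field, so the count is computed from the parent field's value):

{
  "content": {
    "type": "text",
    "fields": {
      "word_count": {
        "type": "token_count",
        "analyzer": "standard"
      }
    }
  }
}

Use case: filter or sort articles by length in tokens (query content.word_count)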

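3.6 percolator - reverse-query type

Purpose: stores queries; an incoming document is then matched against the stored queries

Use cases:

Price monitoring (notify users when a product's price drops)
Content push (deliver new content that matches a user's interests)

Example (fields referenced by the stored queries must also be mapped in the same index):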

{
  "query": {
    "type": "percolator"
  }
}

4. Common Field Parameters

4.1 type - field type

Description: specifies the field's data type

Example:

{
  "age": {
    "type": "integer"
  }
}

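4.2 index - whether the field is indexed

Values:

true: the field is indexed (default)
false: the field is not indexed and generally cannot be searched

Notes:

A field with index: false is still kept in _source and returned in results (and, if doc_values are enabled, can still be aggregated and sorted)
Suitable for fields that never need to be searched (such as image URLs)

Example: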

{
  "image_url": {
    "type": "keyword",
    "index": false
  }
}

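Performance impact: skipping indexing saves disk space and indexing time

4.3 store - whether the value is stored separately

Values:

true: store the field value separately
false: do not store it separately (default)

Notes:

By default, field values are kept in _source
Stored fields can be retrieved without loading _source
Increases disk usage

Example: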

{
  "title": {
    "type": "text",
    "store": true
  }
}

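Use case: _source is very large and only a few fields are needed

4.4 doc_values - columnar storage

Values:

true: columnar storage enabled (default for every type that supports it; text does not)
false: columnar storage disabled

Notes:

Used for aggregations, sorting, and script access
Stored on disk and read through the operating system's file-system cache, outside the JVM heap
Not supported by text fields

Example: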

{
  "price": {
    "type": "long",
    "doc_values": true
  }
}

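When to disable: fields that never need aggregations or sorting, to save disk space

4.5 enabled - whether the field is parsed

Values:

true: parse and index the field (default)
false: skip parsing; the value is kept only in _source (applies to the top-level mapping and object fields only)

Example: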

{
  "raw_data": {
    "type": "object",
    "enabled": false
  }
}

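Use cases:

Complex objects that must be stored but never searched
Raw data with an unpredictable structure

4.6 null_value - null replacement

Description: when a field's value is an explicit null, the specified value is indexed instead

Example: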

{
  "status": {
    "type": "keyword",
    "null_value": "unknown"
  }
}

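Notes:

Affects only the index; _source still contains null
The replacement must match the field type; text fields do not support null_value, and an empty array is not replaced

4.7 copy_to - copying field values

Description: copies the field's values into a target field at index time

Example: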

{
  "first_name": {
    "type": "text",
    "copy_to": "full_name"
  },
  "last_name": {
    "type": "text",
    "copy_to": "full_name"
  },
  "full_name": {
    "type": "text"
  }
}

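Use case and behavior:

Searching several fields as if they were one combined field
The copied values are not added to _source, so the target field does not appear in returned documents

4.8 ignore_above - string length limit

Description: keyword values longer than the specified number of characters are not indexed (they remain in _source)

Default: unlimited (2147483647); dynamic mapping sets 256 on the .keyword sub-fields it creates

Example: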

{
  "tag": {
    "type": "keyword",
    "ignore_above": 128
  }
}

Use: keeps abnormally long strings out of the index (for arrays, the limit is applied to each element separately)
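Here is a Python sketch of the ignore_above rule. It uses a hypothetical helper and lets Python's character count stand in for Elasticsearch's. The default mirrors the unlimited 2147483647.

```python
def keyword_terms(value, ignore_above=2147483647):
    """Values a keyword field indexes; over-long ones are skipped but stay in _source."""
    values = value if isinstance(value, list) else [value]
    # The limit is checked per element, so one long tag does not drop the others.
    return [v for v in values if len(v) <= ignore_above]


tags = ["sale", "x" * 300]
print(keyword_terms(tags, ignore_above=128))  # ['sale']
```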

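4.9 ignore_malformed - ignore malformed values

Description: values that do not match the field type are skipped instead of causing the whole document to be rejected

Example: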

{
  "age": {
    "type": "integer",
    "ignore_malformed": true
  }
}

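Behavior:

The malformed value is not indexed
The rest of the document is still indexed
The original value remains in _source

4.10 coerce - type coercion

Description: tries to convert values into the field's type

Default: true

Example: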

{
  "price": {
    "type": "float",
    "coerce": true
  }
}

Conversion examples:

"10" → 10 (string to number)
10.5 → 10 (float to integer; applies to integer-type fields, where the fraction is truncated)
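Here is a Python sketch of coercion for an integer field; the helper is hypothetical. It assumes strings are parsed as numbers and fractions are truncated toward zero, as described above. In this sketch, disabling coerce rejects anything that is not already an integer.

```python
def coerce_integer(value, coerce=True):
    """Model of an integer field: clean ints pass; other values need coerce=True."""
    if isinstance(value, int):
        return value
    if not coerce:
        raise ValueError(f"cannot index {value!r} without coerce")
    # Parse strings as numbers, then drop the fraction (int() truncates toward zero).
    return int(float(value))


print(coerce_integer("10"))   # 10
print(coerce_integer(10.5))   # 10
print(coerce_integer(-10.5))  # -10
```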

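4.11 dynamic - dynamic mapping

Values:

true: new fields are added to the mapping automatically (default)
false: new fields are not indexed or added to the mapping, but remain in _source
strict: documents containing unknown fields are rejected with an exception
runtime: new fields are added as runtime fields, which are not indexed

Example: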

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": {"type": "text"}
    }
  }
}

4.12 fields - multi-fields

Description: indexes the same value in several different ways

Example:

{
  "product_name": {
    "type": "text",
    "analyzer": "ik_max_word",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      },
      "pinyin": {
        "type": "text",
        "analyzer": "pinyin"
      }
    }
  }
}

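How to use them:

product_name: full-text search
product_name.keyword: exact matching, aggregations, sorting
product_name.pinyin: pinyin search (requires a pinyin analysis plugin)

5. Parameters Specific to text

5.1 analyzer - index-time analyzer

Description: the analyzer applied to the field value at index time

Example: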

{
  "content": {
    "type": "text",
    "analyzer": "ik_max_word"
  }
}

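Common analyzers:

standard: the default; works well for English and splits Chinese text into single characters
simple: splits on any non-letter character and lowercases
whitespace: splits on whitespace only
ik_max_word: IK plugin, finest-grained Chinese segmentation
ik_smart: IK plugin, coarse-grained ("smart") Chinese segmentation

5.2 search_analyzer - search-time analyzer

Description: the analyzer applied to the query text at search time (defaults to the value of analyzer)

Example: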

{
  "content": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_smart"
  }
}

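Use case:

Index with fine-grained segmentation to improve recall
Search with coarse-grained segmentation to improve precision

5.3 norms - scoring normalization factors

Description: whether to store the field-length normalization factors used for relevance scoring

Default: true

Example: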

{
  "description": {
    "type": "text",
    "norms": false
  }
}

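When to disable:

Fields that do not need relevance scoring
To save disk space (about 1 byte per document per field)

5.4 term_vector - term vectors

Values:

no: not stored (default)
yes: terms only
with_positions: terms and positions
with_offsets: terms and character offsets
with_positions_offsets: terms, positions, and offsets

Example: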

{
  "content": {
    "type": "text",
    "term_vector": "with_positions_offsets"
  }
}

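Use cases:

Highlighting (the fast vector highlighter requires with_positions_offsets)
The term vectors API
More Like This (MLT) queries

5.5 position_increment_gap - position gap

Description: the number of empty positions inserted between the values of a multi-valued text field

Default: 100

Example: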

{
  "tags": {
    "type": "text",
    "position_increment_gap": 100
  }
}

Purpose: prevents phrase queries from matching across two separate values
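The effect can be reproduced with a toy position model in Python, using the two-value example from the Elasticsearch documentation. The helpers are hypothetical and support exact phrases (slop 0) only.

```python
def token_positions(values, gap):
    """Assign positions to the tokens of a multi-valued text field."""
    positions, pos = {}, 0
    for value in values:
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap  # jump between consecutive values
    return positions


def phrase_matches(positions, phrase):
    """Exact phrase (slop 0): each next token sits exactly one position later."""
    tokens = phrase.lower().split()
    for start in positions.get(tokens[0], []):
        if all(start + i in positions.get(t, []) for i, t in enumerate(tokens)):
            return True
    return False


names = ["John Abraham", "Lincoln Smith"]
print(phrase_matches(token_positions(names, 0), "Abraham Lincoln"))    # True
print(phrase_matches(token_positions(names, 100), "Abraham Lincoln"))  # False
```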

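5.6 index_options - index options

Values:

docs: document numbers only
freqs: document numbers and term frequencies
positions: document numbers, term frequencies, and positions (default for text)
offsets: all of the above plus character offsets

Example: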

{
  "content": {
    "type": "text",
    "index_options": "offsets"
  }
}

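Recommendations:

Filtering only: docs
Scoring: freqs
Phrase queries: positions
Highlighting: offsets (lets the unified highlighter work from the postings)

5.7 index_prefixes - prefix indexing

Description: indexes term prefixes into a separate hidden field to speed up prefix queries

Example: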

{
  "title": {
    "type": "text",
    "index_prefixes": {
      "min_chars": 2,
      "max_chars": 5
    }
  }
}

Parameters:

min_chars: minimum prefix length (default 2)
max_chars: maximum prefix length (default 5)
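This Python sketch shows which prefixes end up in the hidden prefix field for one term. The helper is hypothetical, and its defaults are taken from the parameters above. Prefix queries shorter than min_chars or longer than max_chars cannot use this sub-field.

```python
def indexed_prefixes(term, min_chars=2, max_chars=5):
    """Prefixes an index_prefixes sub-field would hold for one term."""
    upper = min(max_chars, len(term))
    return [term[:n] for n in range(min_chars, upper + 1)]


print(indexed_prefixes("elasticsearch"))  # ['el', 'ela', 'elas', 'elast']
print(indexed_prefixes("es"))             # ['es']
```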

5.8 index_phrases - phrase indexing

Description: indexes two-word shingles into a separate field to speed up phrase queries

Example:

{
  "content": {
    "type": "text",
    "index_phrases": true
  }
}

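Purpose: speeds up match_phrase queries, at the cost of a larger index

6. Index Settings

6.1 number_of_shards - number of primary shards

Description: the number of primary shards in the index

Default: 1

Example: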

{
  "settings": {
    "number_of_shards": 5
  }
}

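Important:

Cannot be changed on an existing index; changing it requires the split or shrink API or a reindex
Plan it according to the expected data volume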
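The shard count is fixed because each document goes to shard hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. This Python sketch uses CRC32 as a stand-in for Elasticsearch's murmur3 hash and a plain modulo, so the exact shard numbers differ from a real cluster. The point is how many documents would move if the count changed.

```python
import zlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Simplified routing: CRC32 stands in for murmur3; real routing has more steps."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


ids = [f"doc-{i}" for i in range(1000)]
moved = sum(shard_for(i, 5) != shard_for(i, 6) for i in ids) / len(ids)
print(f"{moved:.0%} of documents would map to a different shard")
```

With a mostly new placement for existing documents, lookups by _id would go to the wrong shard. That is why changing the count goes through split, shrink, or reindex rather than an in-place setting change.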

6.2 number_of_replicas - number of replicas

Description: the number of replicas for each primary shard

Default: 1

Example:

{
  "settings": {
    "number_of_replicas": 2
  }
}

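Important:

Can be changed at any time
Affects availability and read throughput

6.3 refresh_interval - refresh interval

Description: how often a refresh runs; a refresh turns buffered documents into a new searchable segment (it does not fsync data to disk; durability comes from the translog)

Default: 1s (if refresh_interval is not set explicitly, an index that has received no searches for 30 seconds pauses scheduled refreshes until it is searched again)

Example: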

{
  "settings": {
    "refresh_interval": "30s"
  }
}

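Recommendations:

Near-real-time search: 1s
Bulk imports: -1 (disabled)
Scenarios that tolerate some delay: 30s

6.4 max_result_window - maximum result window

Description: the maximum allowed value of from + size

Default: 10000

Example: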

{
  "settings": {
    "max_result_window": 100000
  }
}

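Warning: raising it makes deep pagination expensive in memory; use search_after instead

6.5 translog - transaction log settings

Description: controls how the transaction log is persisted

Example: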

{
  "settings": {
    "translog": {
      "durability": "async",
      "sync_interval": "30s",
      "flush_threshold_size": "1gb"
    }
  }
}

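Parameters:

durability:
- request: fsync after every request (default, safe)
- async: fsync in the background every sync_interval (faster, but writes since the last fsync can be lost if a node crashes)
sync_interval: fsync interval for async durability (default 5s)
flush_threshold_size: translog size that triggers a flush (512mb in many versions; check the documentation for your version)

6.6 max_ngram_diff / max_shingle_diff - n-gram and shingle size limits

Description: the maximum allowed difference between max_gram and min_gram (ngram tokenizer and filter), and between max_shingle_size and min_shingle_size (shingle filter)

Defaults: max_ngram_diff is 1, max_shingle_diff is 3

Example: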

{
  "settings": {
    "max_ngram_diff": 5,
    "max_shingle_diff": 3
  }
}

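6.7 index.mapping.nested_objects.limit - nested object limit

Description: the maximum number of nested objects a single document may contain, across all nested fields

Default: 10000

Example: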

{
  "settings": {
    "index.mapping.nested_objects.limit": 50000
  }
}

6.8 index.mapping.depth.limit - field depth limit

Description: the maximum depth of object nesting for a field

Default: 20

Example:

{
  "settings": {
    "index.mapping.depth.limit": 30
  }
}

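6.9 index.mapping.total_fields.limit - field count limit

Description: the maximum number of fields in the index (object mappings and field aliases also count)

Default: 1000

Example: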

{
  "settings": {
    "index.mapping.total_fields.limit": 2000
  }
}

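6.10 routing.allocation - shard allocation filtering

Description: controls which nodes the index's shards can be allocated to, based on custom node attributes (such as node.attr.box_type: hot); it does not decide which shard a document goes to

Example: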

{
  "settings": {
    "routing": {
      "allocation": {
        "require": {
          "box_type": "hot"
        }
      }
    }
  }
}

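allocation rules:

require: the node must have all of the specified attribute values
include: the node must have at least one of the specified values
exclude: the node must have none of the specified values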
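This Python sketch shows how the three rule kinds combine when several are set. The helper is hypothetical; node attributes are plain dictionaries, and each node attribute has a single value.

```python
def node_allowed(node, require=None, include=None, exclude=None):
    """node: attribute -> value; require: attribute -> value;
    include/exclude: attribute -> list of values (the comma-separated list)."""
    # require: every listed attribute must have exactly the given value
    if require and any(node.get(attr) != value for attr, value in require.items()):
        return False
    # include: at least one include condition must be satisfied
    if include and not any(node.get(attr) in values for attr, values in include.items()):
        return False
    # exclude: no exclude condition may be satisfied
    if exclude and any(node.get(attr) in values for attr, values in exclude.items()):
        return False
    return True


hot, warm = {"box_type": "hot"}, {"box_type": "warm"}
print(node_allowed(hot, require={"box_type": "hot"}))   # True
print(node_allowed(warm, require={"box_type": "hot"}))  # False
```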

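7. Analyzer Configuration

7.1 Analyzer components

An analyzer has three parts, applied in this order:

Character filters (zero or more): preprocess the raw text
Tokenizer (exactly one): splits the text into tokens
Token filters (zero or more): post-process the tokens

7.2 Custom analyzer example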

{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and",
            "| => or"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": ["的", "了", "是"]
        },
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "手机, 移动电话",
            "电脑, 计算机"
          ]
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["my_char_filter"],
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase", "my_stop_filter", "my_synonym_filter"]
        }
      }
    }
  }
}
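To see what the custom analyzer above does to a string, here is a Python sketch that applies the same steps in order: mapping char filter, comma-based pattern tokenizer, lowercase, stop words, and synonyms. It is a simplified model, and the function name is hypothetical. Real synonym expansion places synonyms at the same token position, which a flat list cannot show.

```python
CHAR_MAPPINGS = {"&": "and", "|": "or"}                       # my_char_filter
STOPWORDS = {"的", "了", "是"}                                  # my_stop_filter
SYNONYM_GROUPS = [{"手机", "移动电话"}, {"电脑", "计算机"}]          # my_synonym_filter


def my_custom_analyzer(text):
    # 1. char_filter: plain string replacement before tokenizing
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    # 2. tokenizer: the pattern tokenizer splits on "," (spaces around commas are kept)
    tokens = [t for t in text.split(",") if t]
    # 3. token filters, in the listed order: lowercase -> stop -> synonym
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOPWORDS]
    out = []
    for token in tokens:
        out.append(token)
        for group in SYNONYM_GROUPS:
            if token in group:
                out.extend(sorted(group - {token}))
    return out


print(my_custom_analyzer("iPhone&Case,的,电脑"))  # ['iphoneandcase', '电脑', '计算机']
```

The output also shows a design consequence of the example: because the tokenizer only splits on commas, "iPhone&Case" stays one token. In practice you would choose a tokenizer that matches how the text is actually structured.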

7.3 Common token filters

Filter	Description	Example configuration
lowercase	Lowercase	`{"type": "lowercase"}`
uppercase	Uppercase	`{"type": "uppercase"}`
stop	Stop words	`{"type": "stop", "stopwords": "_english_"}`
synonym	Synonyms	`{"type": "synonym", "synonyms_path": "synonyms.txt"}`
stemmer	Stemming	`{"type": "stemmer", "language": "english"}`
edge_ngram	Edge n-grams	`{"type": "edge_ngram", "min_gram": 2, "max_gram": 10}`
ngram	N-grams	`{"type": "ngram", "min_gram": 2, "max_gram": 3}`
length	Length filter	`{"type": "length", "min": 2, "max": 20}`
unique	Removes duplicate tokens	`{"type": "unique"}`

7.4 Chinese analyzer configuration (IK; requires the IK and pinyin analysis plugins)

{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["lowercase", "my_pinyin"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "my_pinyin"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}

8. Practical Case Studies

8.1 E-commerce search

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "product_synonym"]
        }
      },
      "filter": {
        "product_synonym": {
          "type": "synonym",
          "synonyms": [
            "手机,mobile,phone",
            "电脑,computer,pc"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword",
        "doc_values": true
      },
      "product_name": {
        "type": "text",
        "analyzer": "product_analyzer",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple"
          }
        },
        "copy_to": "all_text"
      },
      "category": {
        "type": "keyword",
        "doc_values": true
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100,
        "doc_values": true
      },
      "stock": {
        "type": "integer",
        "doc_values": true
      },
      "sales_count": {
        "type": "long",
        "doc_values": true
      },
      "rating": {
        "type": "half_float",
        "doc_values": true
      },
      "brand": {
        "type": "keyword",
        "doc_values": true
      },
      "tags": {
        "type": "keyword",
        "doc_values": true
      },
      "description": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all_text"
      },
      "all_text": {
        "type": "text",
        "analyzer": "product_analyzer"
      },
      "attributes": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "value": {
            "type": "keyword"
          }
        }
      },
      "images": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      },
      "status": {
        "type": "keyword",
        "doc_values": true
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      },
      "updated_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

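Field design notes:

Field	Type	Description	Design rationale
product_id	keyword	Product ID	Exact matching; supports aggregations
product_name	text + fields	Product name	Multi-fields: full-text search, exact matching, autocomplete
category	keyword	Category	Aggregations and filtering
price	scaled_float	Price	Scaling factor 100; saves space
rating	half_float	Rating	Half precision is enough; half the size of float
attributes	nested	Product attributes	Keeps array elements independent
images	keyword, index: false	Image URLs	Not searched; stored and returned only
all_text	text	Combined field	Cross-field search via copy_to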

8.2 Log analysis

{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "index.lifecycle.name": "logs-policy",
    "index.lifecycle.rollover_alias": "logs-prod"
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      },
      "log_level": {
        "type": "keyword"
      },
      "service": {
        "type": "keyword"
      },
      "host": {
        "type": "keyword"
      },
      "ip": {
        "type": "ip"
      },
      "message": {
        "type": "text",
        "analyzer": "standard",
        "norms": false
      },
      "stack_trace": {
        "type": "text",
        "index": false
      },
      "request": {
        "properties": {
          "method": {
            "type": "keyword"
          },
          "url": {
            "type": "keyword",
            "ignore_above": 512
          },
          "status_code": {
            "type": "short"
          },
          "response_time": {
            "type": "integer"
          }
        }
      },
      "user": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "name": {
            "type": "keyword"
          }
        }
      },
      "geo": {
        "properties": {
          "country": {
            "type": "keyword"
          },
          "city": {
            "type": "keyword"
          },
          "location": {
            "type": "geo_point"
          }
        }
      },
      "extra": {
        "type": "flattened"
      }
    }
  }
}

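Field design notes:

Field	Type	Description	Design rationale
@timestamp	date	Timestamp	Uses the standard ECS field name
log_level	keyword	Log level	Filtering and aggregation
message	text, norms: false	Log message	No relevance scoring needed; saves space
stack_trace	text, index: false	Stack trace	Stored only, not searched
request.url	keyword, ignore_above: 512	Request URL	Length cap guards against abnormal values
extra	flattened	Extra information	Dynamic keys, so the flattened type is used

8.3 User profiling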

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "profile": {
        "properties": {
          "name": {
            "type": "keyword"
          },
          "age": {
            "type": "byte"
          },
          "gender": {
            "type": "keyword"
          },
          "phone": {
            "type": "keyword",
            "index": false
          },
          "email": {
            "type": "keyword",
            "index": false
          }
        }
      },
      "preferences": {
        "properties": {
          "categories": {
            "type": "keyword"
          },
          "brands": {
            "type": "keyword"
          },
          "price_range": {
            "type": "integer_range"
          }
        }
      },
      "behavior": {
        "properties": {
          "total_orders": {
            "type": "integer"
          },
          "total_amount": {
            "type": "scaled_float",
            "scaling_factor": 100
          },
          "avg_order_value": {
            "type": "scaled_float",
            "scaling_factor": 100
          },
          "last_order_date": {
            "type": "date"
          },
          "favorite_products": {
            "type": "keyword"
          }
        }
      },
      "tags": {
        "type": "keyword"
      },
      "segments": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      },
      "created_at": {
        "type": "date"
      },
      "updated_at": {
        "type": "date"
      }
    }
  }
}

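9. Field Type Decision Tree

Need to search the field?
├─ Yes
│  ├─ Full-text search?
│  │  ├─ Yes → text
│  │  └─ No
│  │     ├─ Exact match → keyword
│  │     ├─ Numeric range → numeric types
│  │     ├─ Date range → date
│  │     └─ Geo location → geo_point/geo_shape
│  └─ Autocomplete → completion
└─ No
   ├─ Need aggregations/sorting?
   │  ├─ Yes → keyword or numeric (index: false, doc_values: true)
   │  └─ No → keyword (index: false, doc_values: false)
   └─ Store only → object with enabled: false

10. Performance Optimization Checklist

10.1 Field level

Set index: false on fields that are never searched
Set doc_values: false on fields that are never aggregated or sorted
Set norms: false on text fields that do not need scoring
Use ignore_above to cap keyword length
Choose the smallest numeric type that fits the data
Use scaled_float rather than double for prices
Use half_float rather than float for ratings
Use nested for arrays of objects only when their sub-fields must be matched together; object is cheaper otherwise
Consider flattened for objects with dynamic keys

10.2 Index level

Size shards sensibly (20-50GB per shard)
Disable refresh during bulk loads (set it to -1)
Set replicas to 0 during bulk loads, then restore them
Set refresh_interval sensibly (30s is common)
Enable ILM (index lifecycle management)
Manage settings and mappings centrally with index templates
Force-merge indices that no longer receive writes to reduce segment count
Monitor shard sizes and counts

10.3 Query level

Prefer filter context for conditions that do not affect scoring
Avoid deep pagination; use search_after
Return only the fields you need (_source filtering)
Use multi_match instead of several separate match queries
Avoid wildcard queries with a leading wildcard (such as *phone)
Use the query caches wisely
Avoid deeply nested aggregations

11. Frequently Asked Questions (FAQ)

Q1: How do I choose between text and keyword?

text: full-text search with tokenization (product names, article content)
keyword: exact matching, aggregations, sorting (IDs, statuses, tags)
Recommendation: use multi-fields to support both

Q2: When should I use nested?

When you have an array of objects and the sub-fields of each element must be matched together
For example: product attributes, order line items, review lists
Note: nested has a performance cost; the author suggests no more than 50 nested objects per document

Q3: What are doc_values?

A columnar structure used for aggregations, sorting, and script access
Stored on disk and served through the operating system's file-system cache
Not supported by text fields
Can be disabled to save space when aggregations and sorting are not needed

Q4: How should I plan the number of shards?

Aim for 20-50GB per shard
Number of shards = expected data volume / target shard size
Not too many (more coordination overhead)
Not too few (cluster resources go unused)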
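Here is the sizing formula from Q4 as a Python sketch with hypothetical helper names. The 30GB target is an arbitrary assumption inside the 20-50GB range. expected_data_gb should include the growth you expect over the index's lifetime.

```python
import math


def plan_primary_shards(expected_data_gb, target_shard_gb=30):
    """shards = expected data / target shard size, rounded up, at least 1."""
    return max(1, math.ceil(expected_data_gb / target_shard_gb))


def within_guideline(expected_data_gb, shards, low=20, high=50):
    """Check whether the resulting average shard size stays in the 20-50GB band."""
    return low <= expected_data_gb / shards <= high


shards = plan_primary_shards(300)
print(shards, within_guideline(300, shards))  # 10 True
```

Very small indices end up below the band no matter what (one shard of a few GB is fine); the guideline matters mainly once data volumes reach hundreds of gigabytes.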

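Q5: How do I improve write performance?

Increase refresh_interval (e.g. 30s)
Set replicas to 0 during bulk writes
Use the bulk API with batches of roughly 5-15MB
Switch the translog to async durability (accepting that recent writes can be lost on a crash)
Disable indexing on fields that do not need it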
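For the bulk batch-size advice, this Python sketch builds newline-delimited (NDJSON) bodies for the _bulk API. It starts a new body before one would exceed a byte limit. The index name products, the helper name, and the 10MB default are assumptions. Sending the bodies is left to whichever HTTP client you use.

```python
import json


def bulk_bodies(index, docs, max_bytes=10 * 1024 * 1024):
    """Split (doc_id, source) pairs into _bulk NDJSON bodies of at most max_bytes.

    Each document becomes an action line plus a source line, and every body ends
    with a newline. A single document larger than max_bytes gets a body of its own.
    """
    bodies, lines, size = [], [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}}, ensure_ascii=False)
        pair = action + "\n" + json.dumps(source, ensure_ascii=False) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            bodies.append("".join(lines))
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        bodies.append("".join(lines))
    return bodies


docs = [(str(i), {"product_name": "iPhone 15", "price": 5999.0}) for i in range(3)]
print(len(bulk_bodies("products", docs)))  # 1
```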

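Q6: What is the difference between ignore_above and ignore_malformed?

ignore_above: keyword fields; strings over the length limit are not indexed but remain in _source
ignore_malformed: numeric, date, ip, geo, and some other types (not text or keyword); malformed values are not indexed but remain in _source

Q7: When do I need store: true?

Rarely
Only when _source is large and you need to return just a few fields
It increases disk usage

Q8: How do I implement Chinese pinyin search?

Add a pinyin sub-field. pinyin_analyzer below is a custom analyzer that you must define under settings.analysis yourself (for example, like ik_smart_pinyin in section 7.4):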

{
  "product_name": {
    "type": "text",
    "analyzer": "ik_smart",
    "fields": {
      "pinyin": {
        "type": "text",
        "analyzer": "pinyin_analyzer"
      }
    }
  }
}

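12. Quick Reference Tables

12.1 Field type quick reference

Data characteristic	Recommended type	Example
Unique identifier	keyword	user_id, order_no
Status enum	keyword	status, type
Tag array	keyword	tags[], categories[]
Product name	text + keyword	product_name
Description text	text	description
Integer ID	long	id
Quantity	integer	quantity
Price	scaled_float	price
Rating	half_float	rating
Date/time	date	created_at
On/off flag	boolean	is_active
IP address	ip	client_ip
Latitude/longitude	geo_point	location
Object	object	user{}
Array of objects queried per element	nested	items[]
Dynamic keys	flattened	metadata
Autocomplete	completion	suggest

12.2 Common parameters quick reference

Parameter	Applies to	Default	Description
type	All	-	Field type
index	Most types	true	Whether the field is indexed
store	Most types	false	Whether the value is stored separately
doc_values	All except text	true	Columnar storage
enabled	object and the top-level mapping	true	Whether the field is parsed
analyzer	text	standard	Index-time analyzer
search_analyzer	text	Value of analyzer	Search-time analyzer
norms	text	true	Scoring normalization factors
fields	text/keyword and most other types	-	Multi-fields
ignore_above	keyword	2147483647 (256 on dynamic .keyword sub-fields)	Length limit
ignore_malformed	numeric/date/ip/geo	false	Ignore malformed values
coerce	numeric	true	Type coercion
null_value	Most types (not text)	null	Null replacement
copy_to	Most types	-	Copy values to a target field
format	date	strict_date_optional_time||epoch_millis	Date format
scaling_factor	scaled_float	- (required)	Scaling factor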