Elasticsearch Configuration

IT小白龙

Last modified 2022-09-09 17:13:39

Category: Elasticsearch. Tags: elasticsearch, big data, search engine, full-text search

First published 2022-09-09 17:06:15

Article link: https://blog.csdn.net/jy615183000/article/details/126783968

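The project has a number of query and report pages. When the data volume is large, hitting the database on every request puts pressure on the server and adds unnecessary overhead. Data that is used often and does not need to be real-time can instead be stored in and queried from Elasticsearch, which makes these pages faster.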

The configuration below was worked out from a job detail page.

1. Basic settings:

'indexsetting'=>[
    "settings"=> [
        "index"=> [
            "number_of_shards"=> 3, // primary shards
            "number_of_replicas"=> 2, // replica copies of each primary shard
            "max_result_window"=> 100000, // upper bound on from + size (default 10000)
        ],
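
As a rough illustration of what these three values imply, the sketch below does the arithmetic in Python. The helper names (`total_shard_copies`, `page_allowed`) are hypothetical and not part of any Elasticsearch client; the only rule assumed is that a from/size search is rejected once `from + size` exceeds `max_result_window`.

```python
# Values copied from the settings above.
NUMBER_OF_SHARDS = 3
NUMBER_OF_REPLICAS = 2
MAX_RESULT_WINDOW = 100000


def total_shard_copies(primaries, replicas):
    """Each primary has `replicas` copies, so the index holds primaries * (1 + replicas) shards."""
    return primaries * (1 + replicas)


def page_allowed(offset, size, window=MAX_RESULT_WINDOW):
    """A from/size search is accepted only while from + size <= index.max_result_window."""
    return offset + size <= window


print(total_shard_copies(NUMBER_OF_SHARDS, NUMBER_OF_REPLICAS))  # 9
print(page_allowed(99980, 20))  # True
print(page_allowed(99990, 20))  # False
```

Because copies of the same shard are never placed on the same node, two replicas need at least three data nodes before every copy can be assigned.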

2. Indexes:

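'index'=>[ // per-index configuration
     'job'=>[ // alias
         'dir'=>'/job', // name of the index's subdirectory under the ES data directory
         'optimizehi'=>'0230', // time of day at which ES runs the optimize operation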

3. Mapping:

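dynamic is set to strict mode: a document that contains an unknown field is rejected with an exception.
With _routing.required set to true, index, get, update, and delete requests are checked for a routing parameter and fail if it is missing. Passing the same routing value on searches reduces the number of shards scanned and so speeds up queries, but unevenly distributed routing values easily lead to unbalanced shard load.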
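
The trade-off described above can be sketched in Python. The sketch assumes the simplified rule `shard = hash(routing) % number_of_shards`; Elasticsearch itself hashes the routing value with Murmur3, so `zlib.crc32` is only a stand-in and real shard numbers will differ. The routing values and document counts are made up.

```python
import zlib
from collections import Counter

NUMBER_OF_SHARDS = 3  # matches number_of_shards in the basic settings


def shard_for(routing, shards=NUMBER_OF_SHARDS):
    """Pick a shard from the routing value (crc32 stands in for Elasticsearch's Murmur3 hash)."""
    return zlib.crc32(routing.encode("utf-8")) % shards


# Hypothetical workload: one routing value owns most of the documents.
docs = ["company-1"] * 900 + ["company-2"] * 60 + ["company-3"] * 40
load = Counter(shard_for(r) for r in docs)

# Every document with the same routing value lands on the same shard,
# so a search that passes routing="company-1" only has to visit that shard.
print(shard_for("company-1"))
# The shard holding "company-1" carries at least 900 of the 1000 documents.
print(max(load.values()))
```

The mapping fragment that enables both behaviours follows.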

"mappings"=> [
	   "dynamic"=> "strict",
	   "_routing"=> [
	       "required"=> true

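4. Field types:

Job ID: integer, a numeric type with range -2^31 to 2^31-1.
Job title: text, for full-text search with a custom analyzer. Fielddata is disabled by default on text fields because it uses a lot of heap memory. To aggregate on a text field, set fielddata=true on that field; Elasticsearch then builds fielddata in memory by uninverting the inverted index.
Work area: keyword, for exact-value search.
Database shard number: short, a numeric type with range -32768 to 32767.
Most other fields use the types above.

"properties"=> [ // equivalent to a JSON object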
     "jobid"=>[
         "type"=> "integer"
     ],
     "jobname"=> [
         "type"=> "text",
         "analyzer"=> "text_psplit",
         "fielddata"=>true
     ],
     "cjobname"=> [
         "type"=> "text",
         "analyzer"=> "text_psplit",
         "copy_to"=> "jobname" // copy the value into jobname
     ],
     "ejobname"=> [
         "type"=> "text",
         "analyzer"=> "text_psplit",
         "copy_to"=> "jobname" // copy the value into jobname
     ],
     "jobarea"=>[
         "type"=> "text",
         "analyzer"=> "text_wsplit",
         "fielddata"=>true
     ],
     "workarea"=>[
         "type"=> "keyword",
         "norms"=> false
     ],
     "operator"=>[
         "type"=> "keyword",
         "normalizer"=> "my_lowercase",
         "norms"=> false
     ],
     "dbid"=>[
         "type"=> "short"
     ],
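
The integer and short ranges listed above can be checked before a document is sent for indexing. The following is a hypothetical client-side guard in Python, not something Elasticsearch provides; `RANGES`, `FIELD_TYPES`, and `fits` are names invented here, and only `jobid` and `dbid` from the mapping are covered.

```python
# Value ranges of the numeric types used in the mapping.
RANGES = {
    "integer": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
}

# Field -> type, taken from the properties block above.
FIELD_TYPES = {"jobid": "integer", "dbid": "short"}


def fits(field, value):
    """Return True when value is inside the range of the field's mapped type."""
    low, high = RANGES[FIELD_TYPES[field]]
    return low <= value <= high


print(fits("jobid", 2_147_483_647))  # True
print(fits("dbid", 32_768))          # False: one past the short maximum
```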

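5. Analyzers:

text_split: uses the built-in thai tokenizer, followed by the custom filter text_split_filter and the lowercase filter.
text_wsplit: uses the built-in whitespace tokenizer, which splits text on whitespace characters; typically used for English text, tags, and similar values.
text_psplit: uses the custom tokenizer my_tokenizer.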

"analysis"=> [
	  "analyzer"=> [
	       "text_split"=> [
	           "tokenizer"=> "thai",
	           "filter"=> [
	               "text_split_filter",
	               "lowercase"
	           ]
	       ],
	       "text_wsplit"=> [
	           "tokenizer"=> "whitespace",
	           "filter"=> [
	               "lowercase"
	           ]
	       ],
	       "text_psplit"=> [
	           "tokenizer"=> "my_tokenizer",
	           "filter"=> [
	               "lowercase"
	           ]
	       ],
	   ],
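
Of the three analyzers, text_wsplit is the easiest to reason about. A rough Python equivalent, assuming plain whitespace-separated input, might look like this (the function simply borrows the analyzer's name for illustration, and the sample string is made up):

```python
def text_wsplit(text):
    """Approximate the text_wsplit analyzer: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


print(text_wsplit("Shanghai  Pudong-New-Area BEIJING"))
# ['shanghai', 'pudong-new-area', 'beijing']
```

Punctuation inside a token is kept, which is why this analyzer suits tag-like values such as the jobarea field.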

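6. Custom tokenizer:

"tokenizer"=> [
    "my_tokenizer"=> [
        "type"=> "pattern", // tokenize by regular-expression matching
        "pattern"=>"([a-zA-Z]{1,1024}|\\d{1,1024}|[\\u4e00-\\u9fa5])",
        "group"=> 1 // emit capture group 1 as the token, which drops punctuation and symbols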
    ]
],
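
To get a feel for what text_psplit produces, the same regular expression can be run in Python. This is only an approximation of the analyzer (pattern tokenizer plus lowercase filter) using Python's `re` module rather than the Java regex engine Elasticsearch uses; the function name and sample inputs are made up.

```python
import re

# Same pattern as my_tokenizer; re.ASCII limits \d to 0-9, as in Java's default regex mode.
PATTERN = re.compile(r"([a-zA-Z]{1,1024}|\d{1,1024}|[\u4e00-\u9fa5])", re.ASCII)


def text_psplit(text):
    """Emit capture group 1 of every match, then lowercase (the analyzer's only filter)."""
    return [m.group(1).lower() for m in PATTERN.finditer(text)]


print(text_psplit("Java开发工程师 (5年)"))
# ['java', '开', '发', '工', '程', '师', '5', '年']
print(text_psplit("C++/PHP7"))
# ['c', 'php', '7']
```

Runs of letters and runs of digits stay whole, each Chinese character in the range U+4E00 to U+9FA5 becomes its own token, and everything else is discarded.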

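7. Token filter:

type: word_delimiter_graph splits on non-alphanumeric characters, removes leading and trailing delimiters, and splits at case changes and at letter-digit transitions.
generate_word_parts: whether to emit the alphabetic parts as tokens
generate_number_parts: whether to emit the numeric parts as tokens
catenate_numbers: whether to join adjacent numeric parts into one token
catenate_words: whether to join adjacent alphabetic parts into one token
split_on_case_change: whether to split at lowercase-to-uppercase transitions
catenate_all: whether to join all adjacent parts into one token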

"filter"=> [
    "text_split_filter"=> [
        "type"=> "word_delimiter_graph",
        "generate_word_parts"=> true,
        "generate_number_parts"=> true,
        "catenate_numbers"=> true,
        "catenate_words"=> true,
        "split_on_case_change"=> false,
        "catenate_all"=> false
    ]
],
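
The combined effect of these flags is easier to see on a few inputs. The Python function below is a deliberately simplified model of text_split_filter for ASCII input, not a reimplementation of word_delimiter_graph: it ignores token positions, possessive stripping, and non-ASCII letters, and its name is made up.

```python
import re
from itertools import groupby


def word_delimiter_sketch(token, catenate_words=True, catenate_numbers=True):
    """Simplified model of text_split_filter for ASCII input.

    Parts are runs of letters or runs of digits; everything else is a delimiter.
    Case changes do not split, because split_on_case_change is false.
    """
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    # Group adjacent parts of the same kind (all words or all numbers).
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        catenate = catenate_numbers if is_number else catenate_words
        if catenate and len(run) > 1:
            out.append("".join(run))  # catenated form, e.g. "wi" + "fi" -> "wifi"
        out.extend(run)               # generate_word_parts / generate_number_parts
    return out


print(word_delimiter_sketch("wi-fi"))         # ['wifi', 'wi', 'fi']
print(word_delimiter_sketch("01-02-03"))      # ['010203', '01', '02', '03']
print(word_delimiter_sketch("PowerShot500"))  # ['PowerShot', '500']
```

In the text_split analyzer the lowercase filter runs afterwards, so the final tokens would be lowercased.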

8. Special case:

Case-insensitive queries on keyword fields:

"normalizer"=> [
    "my_lowercase"=> [
        "type"=> "custom", // custom normalizer
        "filter"=> ["lowercase"]
    ]
],
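
A normalizer is applied to the whole keyword value at index time and again to the term at query time, which is what makes the match case-insensitive. A minimal Python sketch of that idea, with made-up values for the operator field and hypothetical helper names:

```python
def my_lowercase(value):
    """Mimic the my_lowercase normalizer: the whole keyword value is lowercased, never split."""
    return value.lower()


# Hypothetical stored values for the operator field, normalized at index time.
indexed = {my_lowercase(v) for v in ["Admin", "ZhangSan"]}


def term_matches(query):
    """A term query on a normalized keyword field is normalized the same way before comparison."""
    return my_lowercase(query) in indexed


print(term_matches("ADMIN"))     # True
print(term_matches("zhangsan"))  # True
print(term_matches("zhang"))     # False: keyword values match only as a whole
```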
