Dynamic mapping template issues when Logstash outputs data to Elasticsearch

When you use the logstash-input-jdbc plugin to sync MySQL data into Elasticsearch, Logstash installs a default dynamic mapping template named logstash. During Logstash startup you will see messages like the following:

Using mapping template from {:path=>nil}

Attempting to install template{:manage_template=>{"template"=>"logstash-*","version"=>50001,"settings"=>{"index.refresh_interval"=>"5s"},"mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true,"norms"=>false},"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message","match_mapping_type"=>"string", "mapping"=>{"type"=>"text","norms"=>false}}},{"string_fields"=>{"match"=>"*","match_mapping_type"=>"string","mapping"=>{"type"=>"text","norms"=>false,"fields"=>{"keyword"=>{"type"=>"keyword"}}}}}],"properties"=>{"@timestamp"=>{"type"=>"date","include_in_all"=>false},"@version"=>{"type"=>"keyword","include_in_all"=>false},"geoip"=>{"dynamic"=>true,"properties"=>{"ip"=>{"type"=>"ip"},"location"=>{"type"=>"geo_point"},"latitude"=>{"type"=>"half_float"},"longitude"=>{"type"=>"half_float"}}}}}}}}

Installing elasticsearch template to _template/logstash

In the first line, path=>nil means that no custom template was found, so the default one is used; it is then installed on Elasticsearch under the _template endpoint with the name logstash. Pretty-printed, the template content is:


{
    "template": "logstash-*",
    "version": 50001,
    "settings": {
        "index.refresh_interval": "5s"
    },
    "mappings": {
        "_default_": {
            "_all": {
                "enabled": true,
                "norms": false
            },
            "dynamic_templates": [
                {
                    "message_field": {
                        "path_match": "message",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "norms": false
                        }
                    }
                },
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "norms": false,
                            "fields": {
                                "keyword": {
                                    "type": "keyword"
                                }
                            }
                        }
                    }
                }
            ],
            "properties": {
                "@timestamp": {
                    "type": "date",
                    "include_in_all": false
                },
                "@version": {
                    "type": "keyword",
                    "include_in_all": false
                },
                "geoip": {
                    "dynamic": true,
                    "properties": {
                        "ip": {
                            "type": "ip"
                        },
                        "location": {
                            "type": "geo_point"
                        },
                        "latitude": {
                            "type": "half_float"
                        },
                        "longitude": {
                            "type": "half_float"
                        }
                    }
                }
            }
        }
    }
}
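If you want to double-check what Logstash actually installed, you can fetch the template back from the cluster (the host, port and elastic user here are taken from the examples further down; adjust them to your environment):

curl -XGET -u elastic 'http://192.168.110.31:8011/_template/logstash?pretty'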

This template maps the synced fields for us automatically, but the downside is that most string fields end up as analyzed text, whereas my requirement is mostly for unanalyzed fields, so a custom mapping is needed. At first I did not appreciate how template precedence works: I left the template settings at their defaults and, before starting Logstash, simply created an unanalyzed mapping on the ES cluster with curl -XPUT. After the data was synced I found that my mapping had not taken effect, and only then did I realize that the template managed by Logstash's elasticsearch output plugin takes precedence over the mapping I had created on the cluster myself. So the next step is to modify the template and overwrite the default one.
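For context, the unanalyzed mapping I had pre-created with curl -XPUT was, in spirit, something like the sketch below; the index, type and field names are placeholders rather than the real ones from my project, and keyword is the 5.x unanalyzed string type:

curl -XPUT -u elastic 'http://192.168.110.31:8011/my_index' -d '
{
    "mappings": {
        "my_type": {
            "properties": {
                "serial_no": { "type": "keyword" }
            }
        }
    }
}'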

First, delete the default template:

curl -XDELETE -u elastic '192.168.110.31:8011/_template/logstash'
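You can list the templates on the cluster to confirm that it is gone:

curl -XGET -u elastic 'http://192.168.110.31:8011/_template?pretty'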

Then create a file called es-template.json (the name is arbitrary), take the default template content, modify it, and paste it in. Here I make every string field unanalyzed; the template content is as follows:

{
    "template": "my_index",
    "settings": {
        "index.refresh_interval": "5s"
    },
    "mappings": {
        "_default_": {
            "_all": { "enabled": false, "omit_norms": true },
            "dynamic_templates": [
                {
                    "message_field": {
                        "match": "message",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string", "index": "not_analyzed", "omit_norms": true,
                            "fielddata": { "format": "disabled" }
                        }
                    }
                },
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string", "index": "not_analyzed", "omit_norms": true,
                            "fielddata": { "format": "disabled" },
                            "fields": {
                                "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
                            }
                        }
                    }
                }
            ]
        }
    }
}
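A side note: this template keeps the pre-5.0 string / not_analyzed / omit_norms syntax. As far as I can tell, Elasticsearch 5.x still accepts it and upgrades such fields automatically (with deprecation warnings), turning a not_analyzed string into keyword. If you would rather write the 5.x types directly, the string_fields rule would look roughly like this:

{
    "string_fields": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}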


Next, the output section in Logstash's startup config file jdbc.conf:

if [type] == "my_type" {
    elasticsearch {
        hosts => ["192.168.110.31:8011","192.168.110.31:8012","192.168.110.31:8013"]
        user => "elastic"
        password => "abc123qwer"
        index => "my_index"
        document_id => "%{id}"
        # manage_template defaults to true; if it were set to false, the
        # template/template_name/template_overwrite options below would be ignored
        #manage_template => "false"
        template => "/home/lvyuan/elasticsearch/logstash-5.5.3/template/es-template.json"
        template_name => "my_index"
        template_overwrite => "true"
    }
}
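If you prefer to manage the template yourself (manage_template => false), you can install it manually with something like the following (same file path and credentials as above):

curl -XPUT -u elastic 'http://192.168.110.31:8011/_template/my_index' -d @/home/lvyuan/elasticsearch/logstash-5.5.3/template/es-template.json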


Before starting, delete the previously created index and template. If you find after startup that the new mapping has not taken effect, be sure to delete both the index and the template (the template stored under _template, not the physical template file), then modify and run again; an index template is only applied when an index is created, so an existing index keeps its old mapping. The delete commands, and a quick way to confirm the result, are below:

curl -XDELETE -u elastic 'http://192.168.110.31:8011/_template/my_index'

curl -XDELETE -u elastic 'http://192.168.110.31:8011/my_index'
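After the next run, you can confirm that the custom template is installed and that the recreated index actually picked up the unanalyzed mapping:

curl -XGET -u elastic 'http://192.168.110.31:8011/_template/my_index?pretty'

curl -XGET -u elastic 'http://192.168.110.31:8011/my_index/_mapping?pretty'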


My original goal was to use Elasticsearch as a replacement for MySQL SQL queries, not to do full-text search, so analysis can actually break my features. For example, one field stores a sequence of mixed upper- and lower-case letters in MySQL (e.g. HTZG5jjhffdwe). With the default mapping template (analyzed), the standard analyzer lowercases the whole sequence before storing it as a token, so a termQuery (no analysis, exact match) will never find it. Some will say you can use matchPhraseQuery, and that does indeed work; but if I want prefix matching with prefixQuery (also not analyzed), a prefix starting with the lowercase "htzg5" matches while the uppercase version does not, because the stored tokens are all lowercase. So it depends on the use case; not every field should be analyzed. You can try it yourself:

http://localhost:8011/_analyze?pretty&analyzer=standard&text=HTZG5jjhffdEX7w52r37880

{"tokens":[{"token":"htzg5jjhffdex7w52r37880","start_offset":0,"end_offset":22,"type":"<ALPHANUM>","position":0}]}

The whole sequence is converted to lowercase before it is stored as a token.

     


