Elasticsearch Usage Problems Roundup (Part 1)

尘埃的故事

Category: elasticsearch. Tags: elasticsearch, logstash

Article link: https://blog.csdn.net/u011144425/article/details/78442129

Content

This post collects problems encountered with ES and Logstash during project development, together with their solutions.

1. Installing and using logstash-input-jdbc

See the previous post.

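2. Incremental sync (as opposed to full sync) from MySQL with logstash-input-jdbc

If the Logstash configuration file sets only the most basic parameters, every run pulls the full data set by default, which is unnecessary most of the time.
To pull only new or recently changed data on each run, a few more parameters have to be configured: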

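 lowercase_column_names => "false"
 # Avoid garbled Chinese characters
 codec => plain { charset => "UTF-8"}

 # Track the value of a column instead of a time; when false (the default), the timestamp of the last run is tracked
 use_column_value => true
 # The column to track
 tracking_column => "id"
 # Whether to record the last run. If true, the last tracking_column value reached is saved to the file given by last_run_metadata_path
 record_last_run => true
 # Path of the file that stores the previous sql_last_value; an initial value for the field can be seeded in this file
 last_run_metadata_path => "E:\ES\logstash-5.6.3\bin\config-mysql\station-user.txt"
 # Whether to discard the record in last_run_metadata_path. If true, every run queries all database records from scratch
 clean_run => false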

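The SQL statement also needs a condition of the form: where tracked_column > :sql_last_value (the recorded result of the previous run; its value can be seen in the file at the path configured above).

e.g. select userID id,username,nickname from user where userID > :sql_last_value
After running the configuration again, Logstash syncs only the records whose id satisfies the condition. To sync modified records as well, it is better to track a time column, which is simple and works.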
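
The tracking behaviour described above can be sketched outside Logstash. The following is a minimal Python simulation under stated assumptions, not how the plugin is implemented: an in-memory SQLite table stands in for the MySQL user table, the hypothetical helper sync_once plays the role of one scheduled run, and a dict stands in for the file at last_run_metadata_path.

```python
import sqlite3

def sync_once(conn, state):
    """Simulate one run: fetch rows whose id exceeds the stored sql_last_value."""
    rows = conn.execute(
        "SELECT userID AS id, username, nickname FROM user "
        "WHERE userID > :sql_last_value ORDER BY userID",
        {"sql_last_value": state["sql_last_value"]},
    ).fetchall()
    if rows:
        # Remember the highest tracked value, as record_last_run => true would.
        state["sql_last_value"] = rows[-1][0]
    return rows

# Hypothetical sample data standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (userID INTEGER PRIMARY KEY, username TEXT, nickname TEXT)"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "alice", "A"), (2, "bob", "B")],
)

state = {"sql_last_value": 0}    # stands in for the last_run_metadata_path file
first = sync_once(conn, state)   # first run pulls every existing row
conn.execute("INSERT INTO user VALUES (3, 'carol', 'C')")
second = sync_once(conn, state)  # second run pulls only the newly inserted row
third = sync_once(conn, state)   # nothing new, so nothing is pulled
```

Because the condition compares ids only, an update to a row that was already synced is never picked up in this sketch, which is why tracking a time column is the better choice for modified records.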

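3. Time zone problems with Logstash and MySQL (unresolved)

1) MySQL "The server time zone value" exception:

Fix: jdbc_connection_string => "jdbc:mysql://localhost:3306/demo?serverTimezone=UTC"
Appending ?serverTimezone=UTC to the database URL removes the exception, but it sets the time zone to UTC (offset zero).
Switching to UTC+8 does not help: Logstash converts every time it reads from MySQL to UTC, so the times shown in Logstash
look 8 hours earlier than the ones in the database. For now the data is simply synced to Logstash in UTC.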

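2) Logstash UTC time problem:

Logstash stores times in UTC. After trying several approaches I still could not convert them to Beijing time effectively, so this is shelved for now.
Logstash timestamps look like 2017-11-03T12:12:12.000Z. According to ISO 8601, this format combines a date and a time:
the T in the middle separates the date from the time, and the trailing Z is the zone designator meaning UTC (offset zero).
A time in another zone is written by appending the offset from UTC as +/-hh:mm. For example, the same instant in UTC+8 is 2017-11-03T20:12:12+08:00.
+ means the local time is ahead of UTC and - means it is behind. In the standard example, the following all denote the same time: "18:30Z", "22:30+04", "1130−0700", and "15:00−03:30"
e.g. 2017-11-03T17:30:08+08:00 or 20171103T173008+08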
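
To make the offset notation concrete, here is a small Python sketch that converts a Logstash-style UTC timestamp into the same instant expressed at UTC+8. It only illustrates the ISO 8601 arithmetic on the consuming side and does not change what Logstash stores; the helper name utc_to_offset is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone

def utc_to_offset(ts, hours=8):
    """Return the same instant as `ts` (a ...Z timestamp) at a fixed UTC offset."""
    # Parse the Logstash format, e.g. 2017-11-03T12:12:12.000Z, and mark it as UTC.
    utc = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Shift the wall-clock representation to the target offset; the instant is unchanged.
    return utc.astimezone(timezone(timedelta(hours=hours))).isoformat()

beijing = utc_to_offset("2017-11-03T12:12:12.000Z")
print(beijing)  # 2017-11-03T20:12:12+08:00
```

An application that reads documents from Elasticsearch could apply a conversion like this at display time while the stored value stays in UTC.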

4. Elasticsearch mapping

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Mapping:

A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch. A mapping is also used to configure metadata associated with the type.

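That is, a mapping defines the data types of the fields in a type and how Elasticsearch handles those fields, for example whether a field is searchable and how it is tokenized.

If an index is thought of as a database instance and a type as a table, a mapping is the table's schema plus its related settings.
To map a single table, i.e. one type, add a mapping template to the Logstash output as shown below.
To map several types, add each of them under mappings.
The default mapping is usually sufficient and no explicit mapping is needed; a mapping definition is required only when the defaults must be overridden.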

output {
    elasticsearch {
        hosts => "127.0.0.1:9200"
        index => "index"
        template => "config-mysql/yc-template-notice.json"
        template_name => "index"
        template_overwrite => true
        document_id => "%{id}"
    }

    stdout {
        codec => json_lines
    }
}

The template JSON is an ordinary mapping definition:

{
  "template" : "index",
  "settings" : {
    "index.number_of_shards" : 5,
    "number_of_replicas" : 1,
    "index.refresh_interval" : "60s"
  },
  "mappings" : {
    "notice" : {
       "_all" : {"enabled" : true},       
       "properties" : {
        "@version": { "type": "string", "index": "not_analyzed" },
        "iflook"  : {"type" : "byte"},
        "usr_id"  : {"type" : "integer"},
        "id"  : {"type" : "integer"},
        "create_date"  : {
                    "type" : "date",  
                    "index" : "not_analyzed",  
                    "doc_values" : true },
        "notice_id"  : {"type" : "integer"}
       }
    }
  }
}

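Changing a mapping:

Note that the mapping of a field that already exists in an index cannot be changed. For an existing index, Elasticsearch only handles fields automatically when new ones appear.
If a mapping really has to be changed, use reindex, i.e. import the data again (the Logstash approach),
or rebuild the index as follows.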

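1. To discard the existing mapping, create a new index and define the mapping again.
2. Then import the data from the old index into the new one.
------- Detailed steps -------
1. Define an alias for the existing index so that the alias points to it, by running step 2.
2. Run: PUT /existing_index/_alias/alias_a
3. Create a new index with the latest mapping, and import the data from the existing index into it.
4. Point the alias at the new index and remove it from the old index, by running step 5.
5. Run: POST /_aliases
        {
            "actions":[
                {"remove"    :    {    "index":    "existing_index",    "alias":"alias_a"    }},
                {"add"        :    {    "index":    "new_index",    "alias":"alias_a"    }}
            ]
        }
Note: these steps switch over to the new index smoothly and with zero downtime, provided the application reaches the index only through the alias.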
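
As a sketch of the alias switch, the Python snippet below builds the request body for POST /_aliases so the remove and add actions travel in a single request, which Elasticsearch applies atomically. The function name alias_swap_body and the index and alias names notice_v1, notice_v2, and notice are hypothetical placeholders, and sending the request is left out on purpose.

```python
import json

def alias_swap_body(old_index, new_index, alias):
    """Build the POST /_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Detach the alias from the old index ...
            {"remove": {"index": old_index, "alias": alias}},
            # ... and attach it to the new one in the same request.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names, for illustration only.
body = alias_swap_body("notice_v1", "notice_v2", "notice")
payload = json.dumps(body, indent=2)
print(payload)
```

Generating the body from a function avoids hand-typed punctuation mistakes in the JSON, and the printed payload can be pasted into any HTTP client.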