Syncing MySQL 5.7.1 data to Elasticsearch 7.0.1 with Logstash 7.0.1

Software versions used:
es: 7.0.1
logstash: 7.0.1
mysql: 5.7.1
JDBC driver: mysql-connector-java-5.1.18.jar

The sync is driven by a timestamp column in the database.
I ran into plenty of pitfalls along the way, so here are the config files for the record.

input {
  jdbc {
    jdbc_driver_library => "/root/logstash7.0.1/mysql-connector-java-5.1.18.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://host-or-ip:port/database"
    jdbc_user => "username"
    jdbc_password => "password"
    tracking_column => "publish_time"
    tracking_column_type => "timestamp"
    use_column_value => true
    schedule => "*/5 * * * * *"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "500"
    statement => "SELECT *, UNIX_TIMESTAMP(publish_time) AS unix_ts_in_secs FROM article_lib WHERE (UNIX_TIMESTAMP(publish_time) > :sql_last_value AND publish_time < NOW()) ORDER BY publish_time ASC"
  }
}

output {
  # stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => "127.0.0.1:9201"
    index => "idx_article"
    document_id => "%{article_id}"
  }
}

When starting Logstash I hit the following error, complaining that an Integer cannot be converted to a String:

[ERROR][logstash.javapipeline    ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<TypeError: no implicit conversion of Integer into String>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/date/format.rb:335:in `_parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/date.rb:734:in `parse'", "/nas/logstash-7.3.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/plugin_mixins/jdbc/value_tracking.rb:87:in `set_value'", "/nas/logstash-7.3.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/plugin_mixins/jdbc/value_tracking.rb:36:in `initialize'", "/nas/logstash-7.3.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/plugin_mixins/jdbc/value_tracking.rb:29:in `build_last_value_tracker'", "/nas/logstash-7.3.0/vendor/bundle/jruby/2.5.0/gems/logstash-input-jdbc-4.3.13/lib/logstash/inputs/jdbc.rb:216:in `register'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:192:in `block in register_plugins'", "org/jruby/RubyArray.java:1792:in `each'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:191:in `register_plugins'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:292:in `start_inputs'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:248:in `start_workers'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:146:in `run'", "/nas/logstash-7.3.0/logstash-core/lib/logstash/java_pipeline.rb:105:in `block in start'"], :thread=>"#<Thread:0x673f5fa3 run>"}
[2020-09-10T23:48:51,350][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

After a full day of digging, the likely causes turned out to be:
1. tracking_column may previously have run as a numeric type and was later changed to timestamp. Check last_run_metadata_path (default: $HOME/.logstash_jdbc_last_run). If that is the cause, delete the file and let Logstash recreate it, or edit its contents by hand.
Run the following command to edit it:

vi $HOME/.logstash_jdbc_last_run

When tracking_column previously ran as a numeric type, the file contains:

--- 0

Change it to the form below (the actual time is up to you):

--- 2010-08-17 06:34:00.784150000 Z

If editing feels like too much trouble, you can simply delete the file and restart Logstash; it will be recreated automatically.

2. The config file has bad parameters or formatting.
3. A required config field is missing.
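The tracker-file fix from point 1 can also be scripted; a minimal sketch, assuming the default tracker path:

```shell
# Reset the JDBC tracker so the next run starts from a known point.
TRACKER="$HOME/.logstash_jdbc_last_run"

# Option 1: delete it; Logstash recreates it on the next run.
rm -f "$TRACKER"

# Option 2: seed it with an explicit timestamp to resync from that point.
echo '--- 2010-08-17 06:34:00.784150000 Z' > "$TRACKER"
```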

One more thing to watch: always wrap the username and password in double quotes. If either contains an @ and is left unquoted, Logstash treats the @ as a separator when parsing the value, and the connection breaks.
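For example (hypothetical credentials), quoting keeps the @ intact:

```
jdbc_user => "root"
jdbc_password => "p@ssw0rd"   # the @ is safe inside double quotes
```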

Another note: sometimes the line just above this log output carries the real, specific error, such as tracking_column not being set.
Mine was special: nothing at all, just the error below. After digging from late night into the early morning, the cause turned out to be the wrong MySQL driver version. I had been using mysql-connector-java-5.1.18.jar; switching to mysql-connector-java-8.0.16.jar fixed it. But why had 5.1.18 worked before? It's still a mystery to me...

2012 rufus-scheduler intercepted an error:
  2012   job:
  2012     Rufus::Scheduler::CronJob "*/5 * * * * *" {}
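A related detail worth knowing: in Connector/J 8.x the driver class was renamed. The legacy name com.mysql.jdbc.Driver still loads (with a deprecation warning), but the 8.x form is:

```
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
```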

The final config file:

input {
  jdbc {
    jdbc_driver_library => "/root/logstash-7.0.1/config/mysql-connector-java-8.0.16.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://192.168.120.11:53306/cmsdb?useUnicode=true&characterEncoding=utf-8&useSSL=false"
    jdbc_user => "root"
    jdbc_password => "Root@20201028"
    jdbc_default_timezone =>"Asia/Shanghai"
    tracking_column => "publish_time"
    tracking_column_type => "timestamp"
    use_column_value => true
    schedule => "*/5 * * * * *"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "500"
    statement => "SELECT *, UNIX_TIMESTAMP(publish_time) AS unix_ts_in_secs FROM article_lib WHERE (UNIX_TIMESTAMP(publish_time) > :sql_last_value AND publish_time < NOW()) ORDER BY publish_time ASC"
  }
}

output {
  # stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => "192.168.120.11:9201"
    index => "idx_article"
    document_id => "%{article_id}"
  }
}

The content of the $HOME/.logstash_jdbc_last_run file is:

vi $HOME/.logstash_jdbc_last_run
--- 2010-08-17 06:34:00.784150000 Z

The final mysql.conf. It ran fine in the test environment, but the customer's environment kept throwing up one problem after another.

input {
  jdbc {
    jdbc_driver_library => "/root/logstash-7.0.1/config/mysql-connector-java-8.0.16.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://111.168.120.11:53306/cmsdb"
    jdbc_user => "username"
    jdbc_password => "password"
    tracking_column => "publish_time"
    tracking_column_type => "timestamp"
    use_column_value => true
    # Must be set for incremental sync to work
    clean_run => false
    jdbc_paging_enabled => "true"
    jdbc_page_size => "2500"
    # Some servers pick up this file; others only read the default
    # $HOME/.logstash_jdbc_last_run
    last_run_metadata_path => "/root/logstash7.0.1/config/last_time.txt"
    schedule => "*/5 * * * * *"
    statement => "SELECT *, UNIX_TIMESTAMP(publish_time) AS unix_ts_in_secs FROM article_lib WHERE (UNIX_TIMESTAMP(publish_time) > :sql_last_value AND publish_time < NOW()) ORDER BY publish_time ASC"
  }
}

output {
  # stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => "111.168.120.11:9201"
    index => "idx_article"
    document_id => "%{article_id}"
  }
}

Starting Logstash also ran into a driver-not-found problem.
There are generally two causes:
1. The MySQL driver jar version doesn't match.
2. The config file format is incorrect.
I got lucky in my own special way: mid-run I noticed the driver jar was still there, but its size had become 0 bytes... Heaven clearly favors me. Re-uploading the jar fixed it.
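A truncated jar like that can be caught with a quick sanity check (the path below is just an example):

```shell
# Fail fast if the driver jar is missing or zero bytes (path is an example).
JAR=/root/logstash-7.0.1/config/mysql-connector-java-8.0.16.jar
if [ -s "$JAR" ]; then
    echo "driver jar OK"
else
    echo "driver jar missing or empty -- re-upload it" >&2
fi
```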

  2016 rufus-scheduler intercepted an error:
  2016   job:
  2016     Rufus::Scheduler::CronJob "* * * * *" {}
  2016   error:
  2016     LogStash::ConfigurationError
  2016     Java::oracle.jdbc.driver.OracleDriver not loaded. Are you sure you've included the correct jdbc driver in :jdbc_driver_library?

With all of the above in place, Logstash sometimes queries by the timestamp as expected; at other times, even though the printed SQL clearly filters by time, it still scans from the first row of the table. The fix was to adjust the timestamp and add the time-record file, after which incremental, time-based querying and indexing into ES worked.

Timestamp record file:

--- 2010-08-17 06:34:00.784150000 Z
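One possible explanation for the intermittent full scans (an assumption on my part, not verified against the original setup): with tracking_column_type => "timestamp", :sql_last_value is bound as a timestamp, while the WHERE clause compares it to the numeric UNIX_TIMESTAMP(publish_time), leaving the result to MySQL's implicit type coercion. A type-consistent variant of the statement sidesteps the coercion entirely:

```sql
SELECT *, UNIX_TIMESTAMP(publish_time) AS unix_ts_in_secs
FROM article_lib
WHERE publish_time > :sql_last_value AND publish_time < NOW()
ORDER BY publish_time ASC
```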

Software downloads

logstash 7.0.1
Link: https://pan.baidu.com/s/1vmzLx0nsLiNlGcDeXpeaqg
Extraction code: uhu7

mysql 5.7.1

elasticsearch 7.0.1
Link: https://pan.baidu.com/s/11n90KE724xzSDLiZi7EaEg
Extraction code: 364y

IK analyzer 7.0.0
Link: https://pan.baidu.com/s/1nWycNaEni9GcxhHDghg-lw
Extraction code: pc2p

MySQL driver: mysql-connector-java-5.1.18.jar
Link: https://pan.baidu.com/s/1N0Eo0cgNvrxWcYDgPfvh0w
Extraction code: sy6d

MySQL driver: mysql-connector-java-8.0.16
Link: https://pan.baidu.com/s/1WO6HZbNwNxCyMdpQ-RrP1Q
Extraction code: 7zed
