Copy logstash-sample.conf in the config directory and create your own conf file.
The configuration of mytask.conf is as follows:
# Logstash configuration for syncing MySQL data
# to Elasticsearch via the JDBC input plugin
input {
  jdbc {
    jdbc_driver_library => "D:\ElasticSearch\logstash-7.17.11\config\mysql-connector-java-8.0.29.jar"
    # Connector/J 8.x uses com.mysql.cj.jdbc.Driver; the old com.mysql.jdbc.Driver name is deprecated
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ysou"
    jdbc_user => "root"
    jdbc_password => "root"
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "updatetime"
    schedule => "*/5 * * * * *"
    # sort ascending so the last row carries the largest updateTime (see the note below)
    statement => "SELECT * from post where updateTime > :sql_last_value and updateTime < now() order by updateTime asc"
    jdbc_default_timezone => "Asia/Shanghai"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "post_v1"
    document_id => "%{id}"
  }
}
With use_column_value set to true, :sql_last_value in the query is populated from a tracked column instead of the last run time. tracking_column names the column whose value from the last row returned is saved as the new :sql_last_value, and tracking_column_type declares that column's type.
Note: the query must be sorted by the tracking_column field so that the last row holds the largest value.
Note: the recorded value is whatever logstash actually read. The tracking_column value is stored in the logstash_jdbc_last_run file under D:\ElasticSearch\logstash-7.17.11\data\plugins\inputs\jdbc. To force a full re-sync, just delete that logstash_jdbc_last_run file (it stores the point up to which data was last synced).
Pitfall: logstash lowercases the column names it reads, so in the configuration tracking_column must be written as updatetime (not updateTime) and tracking_column_type as timestamp (not Timestamp); otherwise the latest updatetime value it reads can never be written into :sql_last_value.
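For reference, logstash_jdbc_last_run is a one-line YAML file; with a timestamp tracking column its contents look roughly like this (the value shown here is hypothetical):

--- 2023-07-01 11:59:55.000000000 +08:00

Deleting this file resets :sql_last_value, so the next scheduled run selects every row again.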
Type cmd in the address bar of logstash's bin directory to open a command window, then run:
logstash.bat -f ..\config\mytask.conf
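Because of the stdout { codec => rubydebug } output, each synced row is also printed to the console as an event, in a shape roughly like the following (field names and values here are illustrative, not actual output):

{
        "@version" => "1",
      "@timestamp" => 2023-07-01T04:00:00.123Z,
              "id" => 1,
           "title" => "hello",
      "updatetime" => 2023-07-01T03:59:55.000Z
}

This console echo is a convenient way to check what logstash read from MySQL before it reaches elasticsearch.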
At this point, logstash will sync the database rows into elasticsearch, but two problems appear at the same time:
1. All field names have been lowercased.
2. Some data we do not want to sync comes along.
This is where logstash's data-processing (filter) capabilities come in; see the official documentation.
So the configuration file should be changed to:
# Logstash configuration for syncing MySQL data
# to Elasticsearch via the JDBC input plugin
input {
  jdbc {
    jdbc_driver_library => "D:\ElasticSearch\logstash-7.17.11\config\mysql-connector-java-8.0.29.jar"
    # Connector/J 8.x uses com.mysql.cj.jdbc.Driver; the old com.mysql.jdbc.Driver name is deprecated
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ysou"
    jdbc_user => "root"
    jdbc_password => "root"
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "updatetime"
    schedule => "*/5 * * * * *"
    # sort ascending so the last row carries the largest updateTime
    statement => "SELECT * from post where updateTime > :sql_last_value and updateTime < now() order by updateTime asc"
    jdbc_default_timezone => "Asia/Shanghai"
  }
}
filter {
  mutate {
    # restore the camelCase field names that logstash lowercased
    rename => {
      "updatetime" => "updateTime"
      "userid" => "userId"
      "createtime" => "createTime"
      "isdelete" => "isDelete"
    }
    # drop the fields we do not want in the index
    remove_field => ["thumbnum", "favournum"]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "post_v1"
    document_id => "%{id}"
  }
}
Then start logstash again; you can see that the data has been written into elasticsearch and is presented with exactly the fields we want.
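To verify the result, you can query the index in Kibana Dev Tools (or with curl against http://127.0.0.1:9200); a simple sketch, assuming updateTime is mapped as a date so it can be sorted on:

GET post_v1/_search
{
  "query": { "match_all": {} },
  "sort": [{ "updateTime": "desc" }],
  "size": 5
}

The returned hits should show the renamed camelCase fields, with thumbnum and favournum absent.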