1.简介
本文说明使用logstash来实现全量和增量将mysql数据导入到es中。每一分钟执行一次。有时效性高的要求可以提高频率。
2.logstash同步mysql数据到elasticsearch
logstash-plugin install logstash-output-elasticsearch logstash-plugin install logstash-input-jdbc
将mysql-connector-java-8.0.11.jar copy到logstash/bin/mysql目录下
3. 时间维度
1.以一定的时间间隔定时更新mysql数据到es。
2.配置如下
全量构建索引
(1)jdbc.conf
input { jdbc { jdbc_default_timezone => "Asia/Shanghai" # mysql 数据库链接,school-edu为数据库名 jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/school-edu?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&serverTimezone=GMT%2B8&zeroDateTimeBehavior=convertToNull" # 用户名和密码 jdbc_user => "xxxxx" jdbc_password => "xxxxx" # 驱动 jdbc_driver_library => "E:\linux\ELK\6.8.2\logstash\bin\mysql\mysql-connector-java-8.0.18.jar" # 驱动类名 jdbc_driver_class => "com.mysql.jdbc.Driver" jdbc_paging_enabled => "true" jdbc_page_size => "50000" # 执行的sql 文件路径+名称 statement_filepath => "E:\linux\ELK\6.8.2\logstash\config\mysql\jdbc.sql" # 设置监听间隔 各字段含义(由左至右)分、时、天、月、年,全部为*默认含义为每分钟都更新 schedule => "* * * * *" } } output { elasticsearch { # ES的IP地址及端口 hosts => ["localhost:9200"] # 索引名称 index => "teacher" user => "elastic" password => "changeme" document_type => "_doc" # 自增ID 需要关联的数据库中有有一个id字段,对应索引的id号 document_id => "%{id}" } stdout { # JSON格式输出 codec => json_lines } } filter{ grok { # 移除不需要的字段 remove_field => ["@timestamp","@version"] } mutate { # 增加一个request_time_format用于标识时间 add_field => { "request_time_format" => "" } } json { source => "message" } # 根据标识字段判断使用不同的过滤规则 if[fields][log_source] == "ip" { date { # 这里是格式化时间2019-09-26 19:17:56 match => ["get_time" ,"yyyy-MM-dd HH:mm:ss"] target => "request_time_format" locale => "cn" } ruby { # 真实的时间是比中国时间慢八个小时,这里把时间加上去 code => "event.set('request_time_format', event.get('request_time_format').time.localtime + 8*60*60)" } } if[fields][log_source] == "ips" { date { # 格式化微秒时间UNIX_MS,具体请参看文档 match => [ "request_time" ,"yyyy-MM-dd HH:mm:ss,SSS" , "UNIX_MS" ] # 过滤的时间格式赋值到request_time_format上 target => "request_time_format" locale => "cn" } ruby { code => "event.set('request_time_format', event.get('request_time_format').time.localtime + 8*60*60)" } } }
(2)jdbc.sql
select id, `name`, intro, career, avatar, sort, upd_time from edu_teacher
增量构建索引
(1)jdbc-update.conf
input { jdbc { jdbc_default_timezone => "Asia/Shanghai" # mysql 数据库链接,school-edu为数据库名 jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/school-edu?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&serverTimezone=GMT%2B8&zeroDateTimeBehavior=convertToNull" # 用户名和密码 jdbc_user => "xxxxxx" jdbc_password => "xxxxx" # 驱动 jdbc_driver_library => "E:\linux\ELK\6.8.2\logstash\bin\mysql\mysql-connector-java-8.0.18.jar" # 驱动类名 jdbc_driver_class => "com.mysql.jdbc.Driver" jdbc_paging_enabled => "true" jdbc_page_size => "50000" last_run_metadata_path => "E:\linux\ELK\6.8.2\logstash\config\mysql\last_value_meta.txt" # 执行的sql 文件路径+名称 statement_filepath => "E:\linux\ELK\6.8.2\logstash\config\mysql\jdbc-update.sql" # 设置监听间隔 各字段含义(由左至右)分、时、天、月、年,全部为*默认含义为每分钟都更新 schedule => "* * * * *" } } output { elasticsearch { # ES的IP地址及端口 hosts => ["localhost:9200"] # 索引名称 index => "teacher" user => "elastic" password => "changeme" document_type => "_doc" # 自增ID 需要关联的数据库中有有一个id字段,对应索引的id号 document_id => "%{id}" } stdout { # JSON格式输出 codec => json_lines } }
(2)jdbc-update.sql
select id, `name`, intro, career, avatar, sort, upd_time from edu_teacher where upd_time >= :sql_last_value
(3)last_value_meta.txt
2020-01-29 00:00:00
全量构建
logstash -f E:\linux\ELK\6.8.2\logstash\config\mysql\jdbc.conf
增量构建
logstash -f E:\linux\ELK\6.8.2\logstash\config\mysql\jdbc-update.conf
检查logstash的配置文件是否配置正确(logstash根目录下运行)
./bin/logstash -f ./config/logstash-test.conf --config.test_and_exit
运行logstash根目录下运行)
./bin/logstash -f config/logstash-test.conf
#logstash中grok插件的使用
#grok中match插件的使用
grok {
#提取字段
match => {
"source" => "(\w+/){2}(?<project>.*?)/.*"
}
}
mutate {
#重写字段
rename => {
"project" => "proj"
}
}
mutate {
#去掉没用字段
remove_field => ["input_type","count","tags","@version","fields","offset","txt","level_name"]
}
filter{
date{
match => ["fieldName", "yyyyMMdd","yyyy-MM-dd"]
target => "fieldName1"
timezone => "Asia/Shanghai"
}
}
版权声明:本文为博主原创文章,遵循
CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。