Environment
- CentOS 6.5
- JDK 8
- Logstash 6.8
- CDH 5.15
To install Logstash, see Logstash RPM Installation.
Installing mysql-connector
$ cd /var/tmp
$ wget https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.48.zip
$ unzip mysql-connector-java-5.1.48.zip
$ cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48-bin.jar /usr/share/java/
Configuring Logstash
The temp_host_nonhealth table contains:
day_key | hour_key | cache_name | host | host_nonhealth_num
---|---|---|---|---
2018-10-13 | 18 | yd_hz_hn_pic_zu1 | 192.168.1.210 | 10
$ vi logstash-jdbc.conf
input {
  jdbc {
    # must match the jar installed above (5.1.48)
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-5.1.48-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://vps234:63751/test?useSSL=false"
    jdbc_user => "readonly"
    jdbc_password => "password"
    statement => "SELECT * from temp_host_nonhealth limit 10"
    clean_run => true
  }
}
output {
  webhdfs {
    host => "vps125"
    single_file_per_thread => true
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{[@metadata][thread_id]}.log"
    user => "hdfs"
    codec => line {
      format => "%{day_key}|%{hour_key}|%{cache_name}|%{host}|%{host_nonhealth_num}"
    }
  }
}
On the host running Logstash, the hostnames of the CDH cluster nodes must be added to /etc/hosts:
when a file is created or appended on HDFS, webhdfs replies with a 307 TEMPORARY_REDIRECT that points at a node's HOSTNAME.
The webhdfs output also writes with multiple threads, and those writes land on different nodes of the CDH cluster, which triggers the error
AlreadyBeingCreatedException....Failed to create file
Therefore, enable one file per thread (single_file_per_thread) and include the thread ID in the file path.
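For example, /etc/hosts on the Logstash host would map every CDH node's hostname; the addresses and extra hostnames below are placeholders for your own cluster:

```
# /etc/hosts on the Logstash host -- placeholder addresses
192.168.1.125  vps125   # node serving webhdfs
192.168.1.126  vps126   # other CDH nodes the redirect may target
192.168.1.127  vps127
```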
Running
$ /usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash" -f logstash-jdbc.conf
Checking the result
$ sudo -u hdfs hadoop fs -cat /user/logstash/dt=2019-10-22/logstash-10.log
2018-10-13|18|chche12|192.168.1.28|10
2018-10-13|18|chche1u2|192.168.1.27|1
2018-10-13|18|chche1zu1|192.168.1.210|10
2018-10-13|18|chche1u1|192.168.1.252|1
2018-10-13|18|chche1_p1|192.168.1.14|10
2018-10-13|18|chche15|192.168.1.158|10
2018-10-13|18|chche1ic2|192.168.1.23|2
2018-10-13|18|chchezu1|192.168.1242|10
2018-10-13|18|chche1_h1|192.168.1250|3
2018-10-13|18|chche1h31|192.168.117|9
Handling the MySQL DATE type
If a MySQL column is of type DATE, Logstash automatically converts it to an ISO8601 timestamp when reading the row into an event, so a filter is needed to recover the plain date.
MySQL table contents:
day_key | hour_key | cache_name | host | host_nonhealth_num
---|---|---|---|---
2018-10-13 | 18 | yd_hz_hn_pic_zu1 | 192.168.1.210 | 10
When day_key is of type DATE, the event contains
2018-10-13T16:00:00.000Z
This ISO8601 value cannot be processed further as-is; it must first be converted to a string:
filter {
  mutate {
    convert => {
      "day_key" => "string"
    }
  }
  # copy day_key into @timestamp
  date {
    match => [ "day_key", "ISO8601" ]
  }
  mutate {
    split => { "day_key" => "T" }
    add_field => { "day_key_string" => "%{[day_key][0]}" }
  }
}
Output
{
    ...
    "@timestamp" => 2018-10-13T16:00:00.000Z,
    "day_key" => [
        [0] "2018-10-13",
        [1] "16:00:00.000Z"
    ],
    "day_key_string" => "2018-10-13"
}
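The effect of the split step can be sketched outside Logstash; this is a minimal Python illustration of splitting the ISO8601 string on "T", not part of the pipeline itself:

```python
# Sketch of what the mutate/split filter does to the ISO8601 day_key.
day_key = "2018-10-13T16:00:00.000Z"  # value Logstash emits for a MySQL DATE
parts = day_key.split("T")            # ["2018-10-13", "16:00:00.000Z"]
day_key_string = parts[0]
print(day_key_string)                 # 2018-10-13
```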
Correspondingly, use day_key_string instead of day_key in the output.
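With that substitution, the webhdfs output from earlier keeps the same settings and only the codec's format string changes:

```
output {
  webhdfs {
    host => "vps125"
    single_file_per_thread => true
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{[@metadata][thread_id]}.log"
    user => "hdfs"
    codec => line {
      format => "%{day_key_string}|%{hour_key}|%{cache_name}|%{host}|%{host_nonhealth_num}"
    }
  }
}
```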