logstash 读取mysql,写入hdfs

环境
  • centos 6.5
  • JDK 8
  • Logstash 6.8
  • CDH 5.15

logstash安装,参见Logstash RPM安装

mysql-connector安装
$ cd /var/tmp
$ wget https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.48.zip
$ unzip mysql-connector-java-5.1.48.zip
$ cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48-bin.jar  /usr/share/java/
配置logstash

temp_host_nonhealth表内容如下:

day_keyhour_keycache_namehosthost_nonhealth_num
2018-10-1318yd_hz_hn_pic_zu1192.168.1.21010
$ vi logstash-jdbc.conf
input {
    jdbc {
        jdbc_driver_library => "/usr/share/java/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://vps234:63751/test?useSSL=false"
        jdbc_user => "readonly"
        jdbc_password => "password"
        statement => "SELECT * from temp_host_nonhealth limit 10"
        clean_run => true
    }
}

output {
  webhdfs {
    host => "vps125"
    single_file_per_thread => true
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{[@metadata][thread_id]}.log" 
    user => "hdfs"
    codec => line {
        format => "%{day_key}|%{hour_key}|%{cache_name}|%{host}|%{host_nonhealth_num}"
    }
  }
}

运行logstash的主机上要将CDH集群的hostname写入/etc/hosts;
当在hdfs创建或附加文件时,webhdfs会向HOSTNAME发送307 TEMPORARY_REDIRECT;

webhdfs会用多线程写入,而且是在CDH集群的各个节点写入,就会造成错误

AlreadyBeingCreatedException....Failed to create file 

所以开启每个线程写一个文件,文件路径要带上线程ID

运行

$ /usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash" -f logstash-jdbc.conf

查看结果

$ sudo -u hdfs hadoop fs -cat /user/logstash/dt=2019-10-22/logstash-10.log

2018-10-13|18|chche12|192.168.1.28|10
2018-10-13|18|chche1u2|192.168.1.27|1
2018-10-13|18|chche1zu1|192.168.1.210|10
2018-10-13|18|chche1u1|192.168.1.252|1
2018-10-13|18|chche1_p1|192.168.1.14|10
2018-10-13|18|chche15|192.168.1.158|10
2018-10-13|18|chche1ic2|192.168.1.23|2
2018-10-13|18|chchezu1|192.168.1242|10
2018-10-13|18|chche1_h1|192.168.1250|3
2018-10-13|18|chche1h31|192.168.117|9
mysql Date类型处理

如果mysql的日期是date类型,logstash读取为事件时,会自动转为ISO8601日期格式,需要用过滤器转获得日期

mysql表内容

day_keyhour_keycache_namehosthost_nonhealth_num
2018-10-1318yd_hz_hn_pic_zu1192.168.1.21010

day_key类型为date时,输入

2018-10-13T16:00:00.000Z

ISO8601日期格式无法进一步处理,需要先转为字符串

filter {
    mutate {
        convert => {
            "day_key"=>"string"
        }
    }
    # 传递给@timestamp
    date {
        match => [ "day_key", "ISO8601" ] 
    }
    mutate {
        split => { "day_key" => "T" }
        add_field => { "day_key_string" => "%{day_key[0]}"}
    }
}

输出

{
			...
            "@timestamp" => 2018-10-13T16:00:00.000Z,
               "day_key" => [
        [0] "2018-10-13",
        [1] "16:00:00.000Z"
    ],
        "day_key_string" => "2018-10-13"
}

相应的,输出时,使用day_key_string代替day_key

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值