Extracting and syncing MySQL data into Elasticsearch

I've looked into a few approaches:

    1. Use MySQL's binlog; Alibaba's canal can read it and sync the changes into ES.

    2. Use logstash-input-jdbc, the approach recommended by Elastic. It is a Logstash plugin; source code: logstash-input-jdbc.

First, install Logstash (not covered here). Because logstash-input-jdbc is developed in Ruby, you also need to install Ruby: download and install it, then open CMD and run ruby -v to check that the installation succeeded.
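For reference, a successful check prints the installed version, something along these lines (the exact string depends on the Ruby build you chose):

ruby -v
ruby 2.4.4p296 (2018-03-28 revision 63013) [x64-mingw32]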


Then change the gem source. Use gem sources -l to list the current sources.

Remove the original source:

gem sources --remove https://rubygems.org/

Add the new source:

gem sources -a http://gems.ruby-china.org/
gem sources -l
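If the switch worked, gem sources -l should now list only the mirror:

*** CURRENT SOURCES ***

http://gems.ruby-china.org/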

After the source change succeeds, you also need to change the source address the Gemfile uses; install Bundler and point its mirror at the new source:

gem install bundler
bundle config mirror.https://rubygems.org https://gems.ruby-china.org
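To confirm the mirror took effect, run bundle config with no arguments; among the listed settings there should be an entry along the lines of mirror.https://rubygems.org pointing at https://gems.ruby-china.org:

bundle config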

Next, install logstash-input-jdbc. In Logstash's bin directory, run the following command:


.\logstash-plugin.bat install logstash-input-jdbc

After a short wait it prints Installation successful. With the plugin installed, the next step is using it.
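To double-check, logstash-plugin can list installed plugins; filtering its output (findstr on Windows) should show the plugin, possibly alongside other jdbc-related plugins:

.\logstash-plugin.bat list | findstr jdbc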

Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

Create a new folder under the bin directory, for example mysql.


Create a jdbc.conf file in it, configured as follows:

input {
    stdin {
    }
    jdbc {
      # MySQL connection settings
      jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
      jdbc_user => "root"
      jdbc_password => "123456"
      # path to the MySQL JDBC driver jar and its driver class
      jdbc_driver_library => "F:\Program Files\logstash-6.2.4\bin\mysql\mysql-connector-java-5.1.30.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      # fetch results in pages to keep memory use bounded
      jdbc_paging_enabled => "true"
      jdbc_page_size => "300000"
      # track the id column so each run only pulls rows newer than the last one seen
      use_column_value => "true"
      tracking_column => "id"
      # file holding the SQL statement to run
      statement_filepath => "F:\Program Files\logstash-6.2.4\bin\mysql\jdbc.sql"
      # cron-style schedule: run once every minute
      schedule => "* * * * *"
      type => "jdbc"
      jdbc_default_timezone => "Asia/Shanghai"
    }
}


filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}


output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test_out"
        template => "F:\Program Files\logstash-6.2.4\bin\mysql\es-template.json"
        template_name => "t-statistic-out-logstash"
        template_overwrite => true
        document_type => "out"
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}
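One detail worth knowing from the plugin documentation: the last seen value of the tracking column is persisted across runs and restarts in a metadata file, $HOME/.logstash_jdbc_last_run by default, which is what makes :sql_last_value survive a Logstash restart. If you want it next to the rest of the files, you can point it elsewhere (the path below is just an illustration):

jdbc {
    ...
    last_run_metadata_path => "F:\Program Files\logstash-6.2.4\bin\mysql\last_run"
}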

Then put the database's JDBC driver jar into this folder, and create a jdbc.sql script file holding the SQL to execute, e.g. select * from test where id >= :sql_last_value. On each scheduled run Logstash substitutes :sql_last_value with the last recorded value of the tracking column (id here), so only rows added since the previous run are pulled and written to ES, which gives incremental synchronization.
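So jdbc.sql is a plain text file containing just the query, nothing else; a trailing semicolon is best left off, since with paging enabled the statement gets wrapped in an outer query and the semicolon can break it:

select * from test where id >= :sql_last_value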

Then start Logstash by running:

.\logstash.bat -f  .\mysql\jdbc.conf

If it fails to start, check the encoding of jdbc.conf and jdbc.sql; saving them as UTF-8 without BOM resolves it.
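Before blaming the encoding, it can also help to let Logstash validate the config without starting the pipeline, using its --config.test_and_exit flag (-t for short):

.\logstash.bat -f .\mysql\jdbc.conf -t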

Startup output like the following means it is running; it will execute the SQL in jdbc.sql automatically and sync the query results into ES:

[2018-07-04T15:59:46,690][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-07-04T15:59:46,869][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-07-04T15:59:46,915][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-07-04T15:59:46,921][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-07-04T15:59:46,936][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>"F:\\Program Files\\logstash-6.2.4\\bin\\mysql\\es-template.json"}
[2018-07-04T15:59:46,951][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"t-statistis-out-template", "order"=>1, "settings"=>{"index"=>{"refresh_interval"=>"5s"}}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>false}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"not_analyzed"}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"not_analyzed"}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}, "acc_id"=>{"type"=>"keyword"}, "acc_name"=>{"type"=>"keyword"}, "acc_pp"=>{"type"=>"keyword"}, "account_price_type"=>{"type"=>"keyword"}, "cyacc_no"=>{"type"=>"keyword"}, "order_id"=>{"type"=>"keyword"}, "voucher_id"=>{"type"=>"keyword"}}}}, "aliases"=>{}}}
[2018-07-04T15:59:46,986][INFO ][logstash.outputs.elasticsearch] Installing elasticsearch template to _template/t-statistic-out-logstash
[2018-07-04T15:59:47,097][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-07-04T15:59:47,499][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1b1682f0 run>"}
The stdin plugin is now waiting for input:
[2018-07-04T15:59:47,613][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-07-04T16:00:01,955][INFO ][logstash.inputs.jdbc     ] (0.009963s) SELECT version()
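Once a scheduled run has completed, a quick query against ES (index name as configured above) confirms that documents actually landed in the index:

curl "http://localhost:9200/test_out/_search?size=1&pretty"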

If the fields synced into ES should not be analyzed (analysis at query time would give incorrect results, and fuzzy matching would not behave like like in a relational database), you need to set template, template_name, and template_overwrite in the output section of jdbc.conf, as shown below:

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "t_statistic_out"
        template => "F:\Program Files\logstash-6.2.4\bin\mysql\es-template.json"
        template_name => "t-statistic-out-logstash"
        template_overwrite => true
        document_type => "out"
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}

The es-template.json file is shown below; fields that should not be analyzed get type keyword:

{
    "template": "t-statistis-out-template",
    "order": 1,
    "settings": {
        "index": {
            "refresh_interval": "5s"
        }
    },
    "mappings": {
        "_default_": {
            "_all": { "enabled": false },
            "dynamic_templates": [
                {
                    "message_field": {
                        "match": "message",
                        "match_mapping_type": "string",
                        "mapping": { "type": "string", "index": "not_analyzed" }
                    }
                },
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": { "type": "string", "index": "not_analyzed" }
                    }
                }
            ],
            "properties": {
                "@timestamp": { "type": "date" },
                "@version": { "type": "keyword" },
                "geoip": {
                    "dynamic": true,
                    "properties": {
                        "ip": { "type": "ip" },
                        "location": { "type": "geo_point" },
                        "latitude": { "type": "half_float" },
                        "longitude": { "type": "half_float" }
                    }
                },
                "acc_id": { "type": "keyword" },
                "acc_name": { "type": "keyword" },
                "acc_pp": { "type": "keyword" },
                "account_price_type": { "type": "keyword" },
                "cyacc_no": { "type": "keyword" },
                "order_id": { "type": "keyword" },
                "voucher_id": { "type": "keyword" }
            }
        }
    },
    "aliases": {}
}
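Two things to watch here. First, for ES to apply a template to newly created indices, the template pattern inside the file must match the index name used in the output (here t_statistic_out); a pattern such as "t_statistic_out*" would match, while the value above will not on its own, which is one reason a template can appear not to take effect. Second, you can inspect what Logstash actually installed:

curl "http://localhost:9200/_template/t-statistic-out-logstash?pretty"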

If you find the template isn't taking effect, set order to a value greater than 0. Logstash also installs its own default template (named logstash) at startup, and that default needs to be deleted.


Then delete the corresponding index and restart, and the template will take effect.
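Those two cleanup calls, assuming the names used throughout this post (the default template installed by Logstash is named logstash):

curl -XDELETE "http://localhost:9200/_template/logstash"
curl -XDELETE "http://localhost:9200/t_statistic_out"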

