I. Elasticsearch Configuration
1. Download
Download Elasticsearch from https://www.elastic.co/cn/downloads/past-releases#elasticsearch
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.3-linux-x86_64.tar.gz
Head plugin: https://github.com/mobz/elasticsearch-head/archive/v5.0.0.tar.gz
2. Installation and Configuration
Edit elasticsearch.yml and set:
node.name: node-1
network.host: 0.0.0.0
cluster.initial_master_nodes: ["node-1"]
Edit jvm.options:
-Xms1g
-Xmx1g
3. Startup
bin/elasticsearch &
II. Logstash Configuration
1. Download
Download Logstash from https://www.elastic.co/cn/downloads/past-releases#logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.9.3.tar.gz
2. Installation and Configuration
Business requirement: sync the data in the user and user_detail tables from the database into Elasticsearch.
Create a pipeline config that reads from MySQL and writes to Elasticsearch:
cp config/logstash-sample.conf config/logstash-mysql2es.conf
Edit logstash-mysql2es.conf:
input {
  stdin {
  }
  jdbc {
    tags => ["user_tag"]
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3301/member?useUnicode=true&characterEncoding=utf-8"
    jdbc_user => "root"
    jdbc_password => "123456"
    jdbc_driver_library => "/home/elk/logstash-7.9.3/mysql-transfer/mysql-connector-java-6.0.5.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "select id,user_name, code, status, create_time,update_time from t_user where update_time >= :sql_last_value order by update_time,id asc"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    record_last_run => true
    last_run_metadata_path => "./logstash_last_id"
    clean_run => false
    schedule => "* * * * *"
  }
  jdbc {
    tags => ["user_detail_tag"]
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3301/member?useUnicode=true&characterEncoding=utf-8"
    jdbc_user => "root"
    jdbc_password => "123456"
    jdbc_driver_library => "/home/elk/logstash-7.9.3/mysql-transfer/mysql-connector-java-6.0.5.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "select id,user_id, type,amount, status, remark,create_time,update_time from t_user_detail where update_time >= :sql_last_value order by update_time,id asc"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    record_last_run => true
    last_run_metadata_path => "./logstash_last_detail_id"
    clean_run => false
    schedule => "* * * * *"
  }
}
output {
  if "user_tag" in [tags] {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "user_index"
      document_type => "_doc"
      document_id => "%{id}"
    }
  }
  if "user_detail_tag" in [tags] {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "user_detail_index"
      document_type => "_doc"
      document_id => "%{id}"
    }
  }
  stdout {
    codec => json_lines
  }
}
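The incremental logic above rests on two pieces: each jdbc input only fetches rows whose update_time is at or after the persisted :sql_last_value, and the output uses document_id => "%{id}" so re-fetched rows overwrite their previous documents instead of creating duplicates. A minimal Python sketch of that behavior (the rows and the index dict are hypothetical stand-ins, not Logstash internals):

```python
# Simulate Logstash's tracking_column + document_id upsert behavior.
rows = [
    {"id": 1, "user_name": "a", "update_time": 10},
    {"id": 2, "user_name": "b", "update_time": 20},
]
index = {}          # stands in for the ES index, keyed by document_id
sql_last_value = 0  # stands in for the persisted :sql_last_value

def run_pipeline():
    """One scheduled run: fetch changed rows, upsert by id, advance the cursor."""
    global sql_last_value
    fetched = sorted(
        (r for r in rows if r["update_time"] >= sql_last_value),
        key=lambda r: (r["update_time"], r["id"]),  # order by update_time, id asc
    )
    for r in fetched:
        index[r["id"]] = r            # document_id => "%{id}": overwrite, no duplicate
        sql_last_value = r["update_time"]
    return fetched

run_pipeline()                        # first run syncs both rows
rows[1] = {"id": 2, "user_name": "b2", "update_time": 30}   # row 2 is updated
run_pipeline()                        # second run re-syncs only the changed row
print(len(index), index[2]["user_name"])  # prints: 2 b2
```

Note that the `>=` comparison re-fetches the boundary row on every run, which is exactly why the id-keyed overwrite matters.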
Start it:
bin/logstash -f config/logstash-mysql2es.conf
Note: if the database connection goes through sharding-proxy, you must remove the following settings (sharding-proxy does not support this paging):
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
Do not forget this!
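The reason is that jdbc_paging_enabled makes the plugin fetch results page by page by wrapping the statement in a derived-table query with LIMIT/OFFSET, which sharding-proxy cannot execute. Roughly like this (an illustrative sketch; the exact SQL Logstash generates may differ):

```python
# Sketch of how jdbc paging rewrites the configured statement (illustrative only).
statement = "select id, user_name from t_user where update_time >= :sql_last_value"

def paged(statement, page, page_size):
    """Wrap the original statement so only one page of rows is fetched."""
    offset = page * page_size
    return f"SELECT * FROM ({statement}) AS t1 LIMIT {page_size} OFFSET {offset}"

q = paged(statement, 1, 50000)
print(q)
```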
III. Canal Configuration
1. Download
Download canal from https://github.com/alibaba/canal/releases (you need deployer, adapter, and admin):
https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.adapter-1.1.4.tar.gz
https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.admin-1.1.4.tar.gz
- canal-deployer (canal-server): disguises itself as a MySQL replica and listens to the MySQL binlog directly; it only receives data and does no processing.
- canal-adapter (canal-client): acts as the canal client; it pulls data from canal-server and syncs it to targets such as MySQL, Elasticsearch, and HBase.
- canal-admin: provides operations-oriented features for canal, such as centralized configuration management and node maintenance, through a relatively friendly web UI, making canal easier and safer to operate.
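To make the division of labor concrete, here is a toy model of the pipeline: the "server" side only hands over decoded row-change events, and the "adapter" side turns them into writes against the target store. All names here are hypothetical illustrations, not canal's API:

```python
# Toy model of the canal pipeline: canal-server emits row-change events
# decoded from the binlog; canal-adapter applies them to a target store
# (a dict standing in for an ES index, keyed by _id).
events = [
    {"type": "INSERT", "table": "t_user", "row": {"id": 1, "user_name": "a"}},
    {"type": "UPDATE", "table": "t_user", "row": {"id": 1, "user_name": "a2"}},
    {"type": "DELETE", "table": "t_user", "row": {"id": 1}},
    {"type": "INSERT", "table": "t_user", "row": {"id": 2, "user_name": "b"}},
]

es = {}  # target store

def adapter_sync(event):
    """What the adapter role boils down to: map a row event to an index/delete call."""
    doc_id = event["row"]["id"]
    if event["type"] == "DELETE":
        es.pop(doc_id, None)
    else:
        # INSERT and UPDATE both become upserts when _id is the primary key
        es[doc_id] = event["row"]

for e in events:
    adapter_sync(e)
print(es)  # only id 2 survives the insert/update/delete sequence
```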
2. Installation and Configuration
2.1 Deploying deployer
mkdir /usr/local/canal/deployer
tar -zxvf canal.deployer-1.1.4.tar.gz -C /usr/local/canal/deployer
cd /usr/local/canal/deployer
Edit conf/example/instance.properties and change the following four items:
canal.instance.master.address = 192.168.158.211:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.filter.regex=test_.*\\..*
Start it:
sh bin/startup.sh
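canal.instance.filter.regex is matched against schema.table names; the pattern above (test_.*\\..*, i.e. the regex test_.*\..* once the properties-file escaping is removed) captures every table in any schema whose name starts with test_. Plain Python re is close enough to sanity-check a simple pattern like this (an assumption — canal has its own filter implementation, so verify complex patterns against canal itself):

```python
import re

# The filter from instance.properties: in the properties file the backslash is
# doubled, so test_.*\\..* denotes the regex test_.*\..* over "schema.table".
pattern = re.compile(r"test_.*\..*")

tables = ["test_member.t_user", "test_member.t_user_detail", "mysql.user"]
matched = [t for t in tables if pattern.fullmatch(t)]
print(matched)  # the mysql.user system table is filtered out
```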
2.2 Deploying adapter
mkdir /usr/local/canal/adapter
tar -zxvf canal.adapter-1.1.4.tar.gz -C /usr/local/canal/adapter
cd /usr/local/canal/adapter
Edit conf/application.yml:
server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
  default-property-inclusion: non_null

canal.conf:
  mode: tcp # tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts: # zk address in cluster mode
  syncBatchSize: 1000 # number of records per sync batch
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111 # canal-server address
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources: # source database config
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3301/member?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
  - instance: example # canal instance name, or MQ topic name
    groups:
    - groupId: g1 # group id; used in MQ mode
      outerAdapters:
      - name: logger
      - name: es
        hosts: 127.0.0.1:9200 # ES address
        properties:
          mode: rest # transport (port 9300) or rest (port 9200)
          cluster.name: elasticsearch # ES cluster name
Add the business mapping configs.
Edit conf/es/user.yml:
dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: user_index # ES index name
  _id: id
  sql: "select id,user_name, code, status, create_time,update_time from t_user"
  etlCondition: "where create_time>={}"
  commitBatch: 3000
Edit conf/es/user_detail.yml:
dataSourceKey: defaultDS # key of the source datasource; matches srcDataSources above
destination: example # canal instance name, or MQ topic
groupId: g1 # groupId in MQ mode; only data for this groupId is synced
esMapping:
  _index: user_detail_index # ES index name
  _id: id # the ES _id; if this is not configured, a pk item must be configured below, otherwise ES assigns ids automatically
  sql: "select id,user_id, type,amount, status, remark,create_time,update_time from t_user_detail"
  etlCondition: "where create_time>={}"
  commitBatch: 3000
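Note that etlCondition only applies to manually triggered full imports (through the adapter's ETL interface), not to the ongoing binlog sync: the {} placeholder is filled with the parameter you pass, and the resulting condition is appended to sql. A sketch of that substitution (the exact mechanics are an assumption inferred from the config semantics, not adapter source):

```python
# How the full-import query is assembled from the mapping config (sketch).
sql = "select id,user_id, type,amount, status, remark,create_time,update_time from t_user_detail"
etl_condition = "where create_time>={}"

def build_etl_sql(sql, etl_condition, param):
    """Append etlCondition to the mapping sql, filling the {} placeholder."""
    return f"{sql} {etl_condition.format(param)}"

q = build_etl_sql(sql, etl_condition, "'2020-01-01 00:00:00'")
print(q)
```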
Start it:
sh bin/startup.sh