Environment
Flume workflow
- Name the agent
- Name agent.sources
- Name agent.channels
- Name agent.sinks
- Connect the source and sink through the channel
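The steps above map one-to-one onto a properties file. A minimal sketch (the agent name `a1` and the built-in netcat source / logger sink are illustrative choices, not part of this setup):

```
# name the agent's components
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# configure the source, channel, and sink
a1.sources.s1.type = netcat
a1.sources.s1.bind = localhost
a1.sources.s1.port = 44444
a1.channels.c1.type = memory
a1.sinks.k1.type = logger

# connect the source and sink through the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```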
Command format
$ flume-ng agent -n $agent_name -c $FLUME_CONF_PATH -f conf/flume-conf.properties
Download flume-ng-sql-source
$ cd /var/tmp
$ wget https://github.com/keedio/flume-ng-sql-source/archive/v1.5.2.zip
$ unzip v1.5.2.zip
Build with Maven
$ cd flume-ng-sql-source-1.5.2
$ mvn package
The build produces a target directory containing the packaged jar.
Create the plugins.d directory under Flume
$ mkdir -p /usr/local/flume/plugins.d/sql-source/lib /usr/local/flume/plugins.d/sql-source/libext
$ cp target/flume-ng-sql-source-1.5.2.jar /usr/local/flume/plugins.d/sql-source/lib
Download the MySQL JDBC driver
$ wget https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
$ tar -zxf mysql-connector-java-5.1.46.tar.gz
$ cd mysql-connector-java-5.1.46
$ # copy the driver jar into libext
$ cp mysql-connector-java-5.1.46-bin.jar /usr/local/flume/plugins.d/sql-source/libext
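After the two copy steps, the plugin directory should look like this (Flume's plugins.d convention: the plugin jar goes in lib, its dependencies in libext):

```
/usr/local/flume/plugins.d/
└── sql-source/
    ├── lib/
    │   └── flume-ng-sql-source-1.5.2.jar
    └── libext/
        └── mysql-connector-java-5.1.46-bin.jar
```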
Create the target directory on HDFS
$ sudo -u hdfs hadoop fs -mkdir -p /flume/mysql
Create a database and table in MySQL, then insert sample data
> CREATE DATABASE chenzl;
> USE chenzl;
> CREATE TABLE users (
id serial NOT NULL PRIMARY KEY,
name varchar(100),
email varchar(200),
department varchar(200),
modified timestamp default CURRENT_TIMESTAMP NOT NULL,
INDEX `modified_index` (`modified`)
);
> INSERT INTO users (name, email, department) VALUES ('alice', 'alice@abc.com', 'engineering');
> INSERT INTO users (name, email, department) VALUES ('bob', 'bob@abc.com', 'sales');
Configure Flume
Read from MySQL and write to HDFS:
$ cd /var/tmp
$ vi example.conf
# Define
a1.sources = mysql-source
a1.sinks = hdfs-sink
a1.channels = mem-ch
# Describe/configure the source
a1.sources.mysql-source.type = org.keedio.flume.source.SQLSource
a1.sources.mysql-source.hibernate.connection.url = jdbc:mysql://vps126:63751/chenzl
a1.sources.mysql-source.hibernate.connection.user = chenzl
a1.sources.mysql-source.hibernate.connection.password = chenzl
a1.sources.mysql-source.table = users
a1.sources.mysql-source.columns.to.select = *
a1.sources.mysql-source.run.query.delay = 5000
a1.sources.mysql-source.status.file.path = /tmp/flume
a1.sources.mysql-source.status.file.name = mysql-source.status
# Describe the sink
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs://vps125:8020/flume/mysql
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.writeFormat = Text
a1.sinks.hdfs-sink.hdfs.rollSize = 268435456
a1.sinks.hdfs-sink.hdfs.rollInterval = 0
a1.sinks.hdfs-sink.hdfs.rollCount = 0
# Use a channel which buffers events in memory
a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 1000
a1.channels.mem-ch.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.mysql-source.channels = mem-ch
a1.sinks.hdfs-sink.channel = mem-ch
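A few more source properties may be worth setting explicitly. The property names below come from the flume-ng-sql-source README; the values are illustrative assumptions, not part of this setup:

```
# Hibernate usually autodetects these, but pinning them avoids surprises
a1.sources.mysql-source.hibernate.connection.driver_class = com.mysql.jdbc.Driver
a1.sources.mysql-source.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect

# rows fetched per query and per channel transaction (illustrative values)
a1.sources.mysql-source.max.rows = 10000
a1.sources.mysql-source.batch.size = 100
```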
Create the source status directory, where the read offset is recorded
$ mkdir -p /tmp/flume
Start the agent
$ cd /var/tmp
$ flume-ng agent --conf /usr/local/flume/conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Check HDFS
$ sudo -u hdfs hadoop fs -ls /flume/mysql
/flume/mysql/FlumeData.1568801460335
$ sudo -u hdfs hadoop fs -cat /flume/mysql/FlumeData.1568801460335
"1","alice","alice@abc.com","engineering","2019-09-17 15:54:31.0"
"2","bob","bob@abc.com","sales","2019-09-17 15:54:32.0"
"3","chenzl","chenzl@abc.com","technology","2019-09-17 17:55:36.0"
Check the status file
$ cat /tmp/flume/mysql-source.status
{"LastIndex":"3"}
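The status file is how the source resumes: on restart it continues from LastIndex. Deleting it (with the agent stopped) should force a full re-read of the table, which also means duplicate rows in HDFS. A sketch, assuming the status path configured above:

```shell
# stop the agent first, then remove the status file;
# on the next start the source re-reads the table from the beginning
rm -f /tmp/flume/mysql-source.status
```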
Insert more data into MySQL
> INSERT INTO users (name, email, department) VALUES ('chenzl1', 'chenzl1@abc.com', 'technology');
Check HDFS again
$ sudo -u hdfs hadoop fs -ls /flume/mysql
/flume/mysql/FlumeData.1568801460335
/flume/mysql/FlumeData.1568801726927
A new file is generated containing the newly inserted row.
Drawbacks
- Newly inserted rows end up in newly created files
- Updates and deletes in MySQL cannot be synchronized
flume-ng-sql-source also supports custom query statements.
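A custom query replaces the generated SELECT via the plugin's `custom.query` property, where `$@$` is substituted with the last stored index per its README; the column list and WHERE clause below are an illustrative assumption for the `users` table above:

```
# when custom.query is set, table and columns.to.select are ignored
a1.sources.mysql-source.custom.query = SELECT id, name, email, department, modified FROM users WHERE id > $@$ ORDER BY id ASC
```

Note that this still only captures new rows by id; it does not lift the update/delete limitation listed above.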