Collecting MySQL Data with Flume and Writing It to HDFS

柯南学数据

Last modified: 2022-08-23 16:25:55

Tags: mysql flume hdfs

First published: 2022-08-22 18:08:57

Article link: https://blog.csdn.net/qq_65967263/article/details/126471141

I. Background

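1. Flume cannot connect to MySQL out of the box, so flume-ng-sql-source-1.5.3.jar and mysql-connector-java-5.1.37.jar need to be added to Flume.

2. In the flume-ng-sql-source jar, I changed the flume-ng-core version to match my Flume version; I have not tested what happens if the versions differ.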

II. Configuration file

#Declare the source, channel, and sink
a1.sources=sqlSource
a1.channels=c1
a1.sinks=s1

#Declare the source type
a1.sources.sqlSource.type=org.keedio.flume.source.SQLSource
a1.sources.sqlSource.hibernate.connection.url=jdbc:mysql://127.0.0.1:3306/<database_name>
a1.sources.sqlSource.hibernate.connection.user=root
a1.sources.sqlSource.hibernate.connection.password=******

#This parameter is important: it enables autocommit. The default is false, and unless it is set to true the query is not executed automatically
a1.sources.sqlSource.hibernate.connection.autocommit=true
#Declare the Hibernate dialect and JDBC driver for MySQL
a1.sources.sqlSource.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
a1.sources.sqlSource.hibernate.connection.driver_class=com.mysql.jdbc.Driver

#Interval between queries, in milliseconds
a1.sources.sqlSource.run.query.delay=10000

#Directory and file name where the source's status is saved
a1.sources.sqlSource.status.file.path=/var/lib/flume
a1.sources.sqlSource.status.file.name=sql-Source.status

#Starting position for the query
a1.sources.sqlSource.start.from=0

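#The SQL statement can be customized, but note: in my tests, incremental loading only works on the id field, i.e. the primary key column; this appears to be the default behavior.
#The primary key must also be selected, because without it Flume cannot record where the previous query stopped.
#$@$ stands for the incremental column's value from the previous query, which is recorded in the status file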
a1.sources.sqlSource.custom.query=select * from tablename where id > $@$

#Batch parameters
a1.sources.sqlSource.batch.size=1000
a1.sources.sqlSource.max.rows=1000

#Delimiter between fields in the query result
a1.sources.sqlSource.delimiter.entry=,

#a1.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
#a1.sources.sqlSource.hibernate.c3p0.min_size=3
#a1.sources.sqlSource.hibernate.c3p0.max_size=10

#a1.sources.sqlSource.interceptors=i1
#a1.sources.sqlSource.interceptors.i1.type=search_replace
#a1.sources.sqlSource.interceptors.i1.searchPattern="
#a1.sources.sqlSource.interceptors.i1.replaceString=

## channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/module/flume-1.9.0/checkpoint/behavior1
a1.channels.c1.dataDirs = /opt/module/flume-1.9.0/data/behavior1
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
a1.channels.c1.keep-alive = 6

a1.sinks.s1.type=HDFS
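#ns is the NameNode nameservice (namespace). It serves two purposes: it guards against a cluster failure, and this parameter only takes effect on the active NameNode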
a1.sinks.s1.hdfs.path=/flume/mysql/***
a1.sinks.s1.hdfs.round = false
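#Roll interval in seconds: how often a new file is started. Setting it to 0 disables time-based rolling, which allows all data to be written to a single file.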
a1.sinks.s1.hdfs.rollInterval= 10
#Size in bytes at which the current file is rolled and the next one is started; 128 MB, the same as the block size, is recommended
a1.sinks.s1.hdfs.rollSize=134217728
#Number of rows (events) after which the file is rolled; 0 disables count-based rolling
a1.sinks.s1.hdfs.rollCount=0
## Write the output as plain, uncompressed files.
#a1.sinks.s1.hdfs.fileType = CompressedStream
#a1.sinks.s1.hdfs.codeC = gzip
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text

a1.sources.sqlSource.channels=c1
a1.sinks.s1.channel=c1
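
The incremental behavior of `custom.query` described in the comments above can be illustrated outside Flume. The following is a minimal Python sketch, not the plugin's actual code: it uses an in-memory SQLite table as a stand-in for MySQL, and the helper names `build_query` and `poll_once` are hypothetical. It assumes the incremental column (`id`) is the first column of the result and that the last row's value is what gets recorded as the new position.

```python
import sqlite3

PLACEHOLDER = "$@$"


def build_query(custom_query, last_index):
    """Replace the $@$ placeholder with the last recorded incremental value."""
    return custom_query.replace(PLACEHOLDER, str(last_index))


def poll_once(conn, custom_query, last_index, max_rows=1000):
    """Run one incremental query and return (rows, new_last_index).

    Assumption: the incremental column is the first column of each row,
    and the value from the last fetched row becomes the new position.
    """
    cursor = conn.execute(build_query(custom_query, last_index))
    rows = cursor.fetchmany(max_rows)
    if rows:
        last_index = rows[-1][0]
    return rows, last_index


if __name__ == "__main__":
    # SQLite stands in for MySQL here; the table name mirrors the config.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tablename (id integer primary key, name text)")
    conn.executemany("insert into tablename (name) values (?)",
                     [("a",), ("b",), ("c",)])

    query = "select * from tablename where id > $@$"
    rows, last = poll_once(conn, query, 0)      # start.from=0
    print(len(rows), last)

    conn.execute("insert into tablename (name) values ('d')")
    rows, last = poll_once(conn, query, last)   # only the new row comes back
    print(rows, last)
```

In this sketch, a query that omits `id` would leave nothing meaningful to store as the position, which matches the requirement above that the primary key be selected.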

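The three HDFS sink roll settings above act as independent triggers: a file is rolled as soon as any enabled one is reached, and a value of 0 disables that trigger. The sketch below models this decision in Python under that assumption; the `RollPolicy` class is hypothetical and only mirrors the configured values, it is not Flume code.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    roll_interval: int = 10                 # seconds, hdfs.rollInterval
    roll_size: int = 128 * 1024 * 1024      # bytes, hdfs.rollSize (134217728)
    roll_count: int = 0                     # events, hdfs.rollCount

    def should_roll(self, elapsed_seconds, bytes_written, events_written):
        """Return True if any enabled trigger has been reached (0 = disabled)."""
        by_time = self.roll_interval > 0 and elapsed_seconds >= self.roll_interval
        by_size = self.roll_size > 0 and bytes_written >= self.roll_size
        by_count = self.roll_count > 0 and events_written >= self.roll_count
        return by_time or by_size or by_count


if __name__ == "__main__":
    policy = RollPolicy()
    print(policy.should_roll(elapsed_seconds=10, bytes_written=2048, events_written=50))
    print(policy.should_roll(elapsed_seconds=3, bytes_written=2048, events_written=50))
```

With the values in this configuration, the 10-second timer will usually fire long before 128 MB accumulates, so in practice the files are likely to be rolled by time rather than by size.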
III. Collecting multiple MySQL tables with one configuration file: this part has not been tested yet