Flume: reading data from MySQL and writing to HDFS

Environment

Flume usage workflow
  • Name the agent
  • Name agent.source
  • Name agent.channel
  • Name agent.sink
  • Connect the source and sink through the channel
Command format
$ flume-ng agent -n $agent_name -c $FLUME_CONF_PATH -f conf/flume-conf.properties
Download flume-ng-sql-source
$ cd /var/tmp
$ wget https://github.com/keedio/flume-ng-sql-source/archive/v1.5.2.zip
$ unzip v1.5.2.zip
Build with Maven
$ cd flume-ng-sql-source-1.5.2
$ mvn package

The build generates a target directory.

Create the Flume plugins.d directory
$ mkdir -p /usr/local/flume/plugins.d/sql-source/lib /usr/local/flume/plugins.d/sql-source/libext
$ cp target/flume-ng-sql-source-1.5.2.jar /usr/local/flume/plugins.d/sql-source/lib
Download the MySQL JDBC driver
$ wget https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
$ tar -zxf mysql-connector-java-5.1.46.tar.gz
$ cd mysql-connector-java-5.1.46

# Copy the extracted driver jar into libext
$ cp mysql-connector-java-5.1.46-bin.jar /usr/local/flume/plugins.d/sql-source/libext
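After the two copy steps, the plugin directory should follow Flume's plugins.d layout, with lib holding the plugin's own jar and libext its dependencies:

```
/usr/local/flume/plugins.d/
└── sql-source/
    ├── lib/
    │   └── flume-ng-sql-source-1.5.2.jar
    └── libext/
        └── mysql-connector-java-5.1.46-bin.jar
```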
Create the HDFS target directory
$ sudo -u hdfs hadoop fs -mkdir -p /flume/mysql
Create a MySQL database and some data
> CREATE DATABASE chenzl;
> USE chenzl;
> CREATE TABLE users (
   id serial NOT NULL PRIMARY KEY,
   name varchar(100),
   email varchar(200),
   department varchar(200),
   modified timestamp default CURRENT_TIMESTAMP NOT NULL,
   INDEX `modified_index` (`modified`)
 );
> INSERT INTO users (name, email, department) VALUES ('alice', 'alice@abc.com', 'engineering');
> INSERT INTO users (name, email, department) VALUES ('bob', 'bob@abc.com', 'sales');
Configure Flume

Read from MySQL and write to HDFS:

$ cd /var/tmp
$ vi example.conf
# Define
a1.sources = mysql-source
a1.sinks = hdfs-sink
a1.channels = mem-ch

# Describe/configure the source
a1.sources.mysql-source.type = org.keedio.flume.source.SQLSource
a1.sources.mysql-source.hibernate.connection.url = jdbc:mysql://vps126:63751/chenzl
a1.sources.mysql-source.hibernate.connection.user = chenzl
a1.sources.mysql-source.hibernate.connection.password = chenzl
a1.sources.mysql-source.table = users
a1.sources.mysql-source.columns.to.select = *
a1.sources.mysql-source.run.query.delay=5000
a1.sources.mysql-source.status.file.path = /tmp/flume
a1.sources.mysql-source.status.file.name = mysql-source.status

# Describe the sink (note: all HDFS sink properties need the hdfs. prefix)
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs://vps125:8020/flume/mysql
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.writeFormat = Text
# Roll only by file size (256 MB); disable time- and count-based rolling
a1.sinks.hdfs-sink.hdfs.rollSize = 268435456
a1.sinks.hdfs-sink.hdfs.rollInterval = 0
a1.sinks.hdfs-sink.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 1000
a1.channels.mem-ch.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.mysql-source.channels = mem-ch
a1.sinks.hdfs-sink.channel = mem-ch
Create the source status directory (it records the read offset)
$ mkdir -p /tmp/flume
Run the agent
$ cd /var/tmp
$ flume-ng agent --conf /usr/local/flume/conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Check HDFS
$ sudo -u hdfs hadoop fs -ls /flume/mysql
/flume/mysql/FlumeData.1568801460335

$ sudo -u hdfs hadoop fs -cat /flume/mysql/FlumeData.1568801460335
"1","alice","alice@abc.com","engineering","2019-09-17 15:54:31.0"
"2","bob","bob@abc.com","sales","2019-09-17 15:54:32.0"
"3","chenzl","chenzl@abc.com","technology","2019-09-17 17:55:36.0"
Check the status file
$ cat /tmp/flume/mysql-source.status
{"LastIndex":"3"}
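The status file stores the last index the source has read. A minimal sketch of pulling that value out with sed (the sample file below is created just for illustration; the real file lives at /tmp/flume/mysql-source.status):

```shell
# Create a sample status file matching the format shown above
echo '{"LastIndex":"3"}' > /tmp/mysql-source.status.sample
# Extract the numeric value of LastIndex
last_index=$(sed 's/.*"LastIndex":"\([0-9]*\)".*/\1/' /tmp/mysql-source.status.sample)
echo "$last_index"
```

Deleting this file should make the source re-read the table from the beginning on the next run, which would duplicate rows already written to HDFS.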
Insert more data into MySQL
> INSERT INTO users (name, email, department) VALUES ('chenzl1', 'chenzl1@abc.com', 'technology');
Check HDFS again
$ sudo -u hdfs hadoop fs -ls /flume/mysql
/flume/mysql/FlumeData.1568801460335
/flume/mysql/FlumeData.1568801726927

A new file is generated for the new batch.

Drawbacks
  • Each batch of newly inserted rows generates a new HDFS file
  • MySQL updates and deletes cannot be synchronized

flume-ng-sql-source also supports custom query statements.
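A sketch based on the flume-ng-sql-source README: when custom.query is set, it takes the place of the table and columns.to.select settings, and the $@$ placeholder is substituted with the last index stored in the status file. The query below is hypothetical and only illustrates the shape:

```
# Hypothetical incremental query; $@$ is replaced by the value in the status file
a1.sources.mysql-source.custom.query = SELECT id, name, email, department FROM users WHERE id > $@$ ORDER BY id ASC
```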
