Extracting MySQL Data into Hive with Flume

1. Environment

Note: the cluster runs CDH 6.0.
1.1. Use the Apache Flume distribution
In testing, the Flume bundled with CDH 6.0 does not support pulling MySQL data with flume-ng-sql-source, so the Apache distribution of Flume is used instead.
Note: the MySQL binlog is not enabled on this database (enabling it consumes significant resources), so the flume-ng-sql-source plugin is used to poll MySQL directly.
Project page: https://github.com/keedio/flume-ng-sql-source
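The plugin is not bundled with Flume, so it has to be built and dropped into Flume's lib directory together with the MySQL JDBC driver. A minimal sketch of that step follows; the Flume install path /opt/flume, the connector jar path, and the exact jar name produced by the build are assumptions and may differ per version.

#build flume-ng-sql-source and copy it, plus the MySQL JDBC driver, into flume_home/lib
git clone https://github.com/keedio/flume-ng-sql-source.git
cd flume-ng-sql-source
mvn clean package -DskipTests
cp target/flume-ng-sql-source-*.jar /opt/flume/lib/          # /opt/flume is a hypothetical flume_home
cp /path/to/mysql-connector-java-5.1.x.jar /opt/flume/lib/   # driver for com.mysql.jdbc.Driver used below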

1.2. Environment setup
Copy the following 4 jars from the Hive HCatalog directory
/opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.537114/lib/hive-hcatalog/share/hcatalog
into flume_home/lib:
hive-hcatalog-core-2.3.0.jar
hive-hcatalog-pig-adapter-2.3.0.jar
hive-hcatalog-server-extensions-2.3.0.jar
hive-hcatalog-streaming-2.3.0.jar
Then copy all jars under hive_home/lib into flume_home/lib as well.
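A sketch of the copy step, assuming the Apache Flume install lives at /opt/flume and the Hive client jars sit under the CDH parcel; adjust the paths to your layout.

#copy the HCatalog streaming jars and the Hive client jars into Flume's lib directory
HCAT_DIR=/opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.537114/lib/hive-hcatalog/share/hcatalog
HIVE_LIB=/opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.537114/lib/hive/lib   # assumed hive_home/lib under the parcel
FLUME_LIB=/opt/flume/lib                                                     # hypothetical flume_home/lib
cp $HCAT_DIR/hive-hcatalog-*.jar $FLUME_LIB/
cp $HIVE_LIB/*.jar $FLUME_LIB/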

1.3. Create the target table in Hive
Note: the table must be bucketed and stored as a transactional ORC table; the Flume Hive sink (Hive streaming) requires this.

create table flume_user(
  user_id string,
  user_name string,
  age string
)
clustered by (user_id) into 2 buckets
stored as orc
tblproperties('transactional'='true');
-- enable concurrency and the transaction manager so Hive accepts transactional/streaming writes
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
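The two SET statements are session-scoped, so run them in the same session that creates the table (or persist them in hive-site.xml). A quick check that the table really came out bucketed, ORC and transactional; the HiveServer2 URL below is an assumption.

#verify the table properties after creation
beeline -u jdbc:hive2://hadoop3:10000/default \
  -e "describe formatted flume_user;" | grep -iE 'transactional|bucket|orc'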

2. Script 1

Note: Netcat-Flume-Hive setup (data sent to a local port, used to test the Hive sink).

#netcat-flume-hive (send data to a port to test the hive sink)
#flume-ng agent --conf-file sink_hive.conf -c conf/ --name a1 -Dflume.root.logger=DEBUG,console
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=netcat
a1.sources.r1.bind=hadoop3
a1.sources.r1.port=44444

a1.sinks.s1.type=hive
a1.sinks.s1.hive.metastore=thrift://hadoop3:9083
a1.sinks.s1.hive.database=default
a1.sinks.s1.hive.table=flume_user
a1.sinks.s1.serializer=DELIMITED
#a1.sinks.s1.hive.partition=%Y-%m-%d
#a1.sinks.s1.autoCreatePartitions = false
#a1.sinks.s1.useLocalTimeStamp=false
a1.sinks.s1.serializer.delimiter="\t"
a1.sinks.s1.serializer.serdeSeparator='\t'
a1.sinks.s1.serializer.fieldnames=user_id,user_name,age

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
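To test this agent, start it with the flume-ng command shown at the top of the script, then push a few tab-delimited records into the netcat source and query the table. The sample rows below are made up; the fields must match serializer.fieldnames.

#send two tab-separated test rows to the netcat source on hadoop3:44444
printf '1\talice\t20\n2\tbob\t30\n' | nc hadoop3 44444
#then, in Hive/Beeline:  select * from flume_user;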

3. Script 2

Note: MySQL-Flume-Hive setup.

#mysql-flume-hive (test)
#flume-ng agent --conf-file sink_hive.conf -c conf/ --name a1 -Dflume.root.logger=DEBUG,console
a1.sources=sqlSource
a1.channels=c1
a1.sinks=s1

#source type and JDBC connection for flume-ng-sql-source
a1.sources.sqlSource.type=org.keedio.flume.source.SQLSource
a1.sources.sqlSource.hibernate.connection.url=jdbc:mysql://xx-xx-xx-xx:3306/test?useSSL=false
a1.sources.sqlSource.hibernate.connection.user=xxxx
a1.sources.sqlSource.hibernate.connection.password=xxxx
#important parameter: defaults to false; if it stays false the source will not poll the database automatically
a1.sources.sqlSource.hibernate.connection.autocommit=true
#Hibernate dialect for MySQL
a1.sources.sqlSource.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
#MySQL JDBC driver class
a1.sources.sqlSource.hibernate.connection.driver_class=com.mysql.jdbc.Driver
#polling interval, in milliseconds
a1.sources.sqlSource.run.query.delay=10000
#directory and file where the source keeps its incremental status
a1.sources.sqlSource.status.file.path=/var/lib/flume
a1.sources.sqlSource.status.file.name=test_mysql_hive.status
#start querying from the first row
a1.sources.sqlSource.start.from=0
#incremental query; $@$ is replaced with the last value saved in the status file
a1.sources.sqlSource.custom.query = select user_id,user_name,age from flume_hive_test where user_id > $@$ order by user_id asc
#a1.sources.sqlSource.columns.to.select = *
#a1.sources.sqlSource.incremental.column.name = user_id
#a1.sources.sqlSource.incremental.value = 0

#batching parameters
a1.sources.sqlSource.batch.size=1000
a1.sources.sqlSource.max.rows=1000

#c3p0 connection pool settings
a1.sources.sqlSource.hibernate.connection.provider_class=org.hibernate.connection.C3P0ConnectionProvider
a1.sources.sqlSource.hibernate.c3p0.min_size=1
a1.sources.sqlSource.hibernate.c3p0.max_size=10

a1.sinks.s1.type=hive
a1.sinks.s1.hive.metastore=thrift://hadoop3:9083
a1.sinks.s1.hive.database=default
a1.sinks.s1.hive.table=flume_user
a1.sinks.s1.serializer=DELIMITED
#a1.sinks.s1.hive.partition=%Y-%m-%d
#a1.sinks.s1.autoCreatePartitions = false
#a1.sinks.s1.useLocalTimeStamp=true
a1.sinks.s1.round = true
a1.sinks.s1.roundValue = 10
a1.sinks.s1.roundUnit = minute
a1.sinks.s1.serializer.delimiter="\t"
a1.sinks.s1.serializer.serdeSeparator='\t'
a1.sinks.s1.serializer.fieldnames=user_id,user_name,age

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sources.sqlSource.channels=c1
a1.sinks.s1.channel=c1
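For an end-to-end test of this agent, the MySQL side needs the flume_hive_test table referenced in custom.query. A sketch follows, using the same placeholder host and credentials as the config above; adjust them before running.

#create and populate the source table in MySQL, then start the agent and check Hive
mysql -h xx-xx-xx-xx -u xxxx -p test -e "
create table if not exists flume_hive_test(
  user_id int primary key,
  user_name varchar(64),
  age varchar(8));
insert into flume_hive_test values (1,'alice','20'),(2,'bob','30');"
#flume-ng agent --conf-file sink_hive.conf -c conf/ --name a1 -Dflume.root.logger=DEBUG,console
#then, in Hive/Beeline:  select * from flume_user;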



