Flume real-time monitoring read/write examples

Flume's main job is to monitor and read data from a server's local disk in real time and write it to HDFS, Kafka, and other destinations.


In Flume's conf directory, edit flume-env.sh (vi flume-env.sh):

Set the Java path:

export JAVA_HOME=/root/software/jdk1.8.0_221

Configure Flume's JVM heap size (10 GB recommended):

export JAVA_OPTS="-Xms10240m -Xmx10240m -Dcom.sun.management.jmxremote"

If the heap is set too small, heavy workloads tend to fail with errors about insufficient channel space.

Create a job folder under conf to hold the configuration files for each job.

Create a flumeLogs folder under the flume directory to hold the data that is read.
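A minimal sketch of those two steps (assuming Flume is installed at /root/software/flume, as in the start commands later in this post):

mkdir -p /root/software/flume/conf/job
mkdir -p /root/software/flume/flumeLogs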

Example 1: manually sending data

First, in the flume/conf/job folder, create netcat-flume-logger.conf (vi netcat-flume-logger.conf):

# name the agent's source, sink and channel
a1.sources=r1
a1.sinks=k1
a1.channels=c1

# netcat source listening on localhost:44444
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=44444

# in-memory channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

# logger sink: print each event to the agent's console/log
a1.sinks.k1.type=logger

# bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

Install netcat and telnet:
yum install -y nc

yum install telnet.* -y

To test netcat, start a listener in one terminal to receive data:
nc -lk 55555

In another terminal, connect as a client with telnet localhost 55555 and type some data; it will show up on the listener side.

Command to start the agent:

/root/software/flume/bin/flume-ng agent --name a1 --conf /root/software/flume/conf --conf-file /root/software/flume/conf/job/netcat-flume-logger.conf -Dflume.root.logger=INFO,console
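Once the agent is running, send data from another terminal to the port configured in the job file (44444 here, not the 55555 used in the netcat test above); the logger sink prints each line as an event on the agent console, roughly in this form:

telnet localhost 44444
hello flume
# agent console output looks roughly like:
# Event: { headers:{} body: 68 65 6C 6C 6F ... hello flume }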

A "port already in use" error may occur; in that case kill the process occupying the port, or define a new port in the job configuration file.
If the telnet session cannot be exited from the console, press CTRL+] to drop back to the telnet prompt, then type quit; this works every time.

Run jps to find the Flume Application process, then kill it, e.g. kill -9 38695.
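If it is unclear which process holds the port, a quick check (a sketch; assumes netstat or lsof is installed):

netstat -tlnp | grep 44444
# or: lsof -i:44444
kill -9 <pid>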

Example 2: logging data read from a local file

First, in the flume/conf/job folder, create file-flume-logger.conf (vi file-flume-logger.conf):

a1.sources=r1
a1.sinks=k1
a1.channels=c1

a1.sources.r1.type=exec
a1.sources.r1.command=tail -f /root/software/flume/flumeLogs/flumedemo.log

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Manually append to the file with a command; the agent keeps tailing the file and prints each new line:
echo "hello code">> /root/software/flume/flumeLogs/flumedemo.log
Example 3: using a regular expression to read files from a directory

First, in the flume/conf/job folder, create event-flume-logger.conf (vi event-flume-logger.conf):

a1.sources=eventsSource
a1.channels=eventsChannel
a1.sinks=eventsSink

a1.sources.eventsSource.type=spooldir
a1.sources.eventsSource.spoolDir=/root/test/event
a1.sources.eventsSource.deserializer=LINE
a1.sources.eventsSource.deserializer.maxLineLength=32000
a1.sources.eventsSource.includePattern=events_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv

a1.channels.eventsChannel.type=file
a1.channels.eventsChannel.checkpointDir=/root/software/flume/flumeLogs/checkpoint/events
a1.channels.eventsChannel.dataDirs=/root/software/flume/flumeLogs/data/events

a1.sinks.eventsSink.type=logger

a1.sources.eventsSource.channels=eventsChannel
a1.sinks.eventsSink.channel=eventsChannel

A file is picked up when its name matches four digits, then two digits, then two digits:
events_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv

spoolDir is the directory holding the files to transfer; once a file has been fully read it is renamed with a .COMPLETED suffix.
checkpointDir and dataDirs must be created manually before starting the agent.
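A minimal setup sketch before starting the agent (directory names are taken from the config above; the sample file name is hypothetical):

mkdir -p /root/test/event
mkdir -p /root/software/flume/flumeLogs/checkpoint/events
mkdir -p /root/software/flume/flumeLogs/data/events
# drop in a file whose name matches the include pattern, e.g.:
cp events.csv /root/test/event/events_2021-03-01.csv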

Example 4: writing the data read to HDFS

First, in the flume/conf/job folder, create user_friends-hdfs.conf (vi user_friends-hdfs.conf):

userFriends.sources=userfriendsSource
userFriends.channels=userfriendsChannel
userFriends.sinks=userfriendsSink

userFriends.sources.userfriendsSource.type=spooldir
# directory containing the files to read
userFriends.sources.userfriendsSource.spoolDir=/root/test/event
userFriends.sources.userfriendsSource.deserializer=LINE
# maximum length of a single line
userFriends.sources.userfriendsSource.deserializer.maxLineLength=32000
# file-name pattern to match
userFriends.sources.userfriendsSource.includePattern=userFriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv

userFriends.channels.userfriendsChannel.type=file
userFriends.channels.userfriendsChannel.checkpointDir=/root/software/flume/flumeLogs/checkpoint/userfriend
userFriends.channels.userfriendsChannel.dataDirs=/root/software/flume/flumeLogs/data/userfriend

userFriends.sinks.userfriendsSink.type=hdfs
# file format; DataStream writes plain text (compressed formats are also supported)
userFriends.sinks.userfriendsSink.hdfs.fileType=DataStream
# prefix of the uploaded files
userFriends.sinks.userfriendsSink.hdfs.filePrefix=userFriend
# suffix of the uploaded files
userFriends.sinks.userfriendsSink.hdfs.fileSuffix=.csv
userFriends.sinks.userfriendsSink.hdfs.path=hdfs://192.168.150.100:9000/kb11/userfriend/%Y-%m-%d
# use the local timestamp when substituting the escape sequences in the path
userFriends.sinks.userfriendsSink.hdfs.useLocalTimeStamp=true
# number of events accumulated before one flush to HDFS
userFriends.sinks.userfriendsSink.hdfs.batchSize=640
# 0 = file rolling does not depend on the event count
userFriends.sinks.userfriendsSink.hdfs.rollCount=0
# roll to a new file at roughly 6 MB
userFriends.sinks.userfriendsSink.hdfs.rollSize=6400000
# roll to a new file every 30 seconds
userFriends.sinks.userfriendsSink.hdfs.rollInterval=30

userFriends.sources.userfriendsSource.channels=userfriendsChannel
userFriends.sinks.userfriendsSink.channel=userfriendsChannel
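Start the agent with the agent name matching the prefix used in the config (userFriends), then check the output on HDFS; a sketch assuming the same install paths and an HDFS client on the PATH:

/root/software/flume/bin/flume-ng agent --name userFriends --conf /root/software/flume/conf --conf-file /root/software/flume/conf/job/user_friends-hdfs.conf -Dflume.root.logger=INFO,console
# after some data has been rolled, verify:
hdfs dfs -ls /kb11/userfriend/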

Example 5: sending the data Flume collects directly to Kafka

First, in the flume/conf/job folder, create user_friends-kafka.conf (vi user_friends-kafka.conf):

userfriends.sources=userfriendsSource
userfriends.channels=userfriendsChannel
userfriends.sinks=userfriendsSink

userfriends.sources.userfriendsSource.type=spooldir
userfriends.sources.userfriendsSource.spoolDir=/root/test/event
userfriends.sources.userfriendsSource.deserializer=LINE
userfriends.sources.userfriendsSource.deserializer.maxLineLength=64000
userfriends.sources.userfriendsSource.includePattern=userFriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
userfriends.sources.userfriendsSource.interceptors=head_filter
userfriends.sources.userfriendsSource.interceptors.head_filter.type=regex_filter
userfriends.sources.userfriendsSource.interceptors.head_filter.regex=^user*
userfriends.sources.userfriendsSource.interceptors.head_filter.excludeEvents=true

userfriends.channels.userfriendsChannel.type=file
userfriends.channels.userfriendsChannel.checkpointDir=/root/software/flume/flumeLogs/checkpoint/userfriend
userfriends.channels.userfriendsChannel.dataDirs=/root/software/flume/flumeLogs/data/userfriend

userfriends.sinks.userfriendsSink.type=org.apache.flume.sink.kafka.KafkaSink
userfriends.sinks.userfriendsSink.batchSize=640
userfriends.sinks.userfriendsSink.brokerList=192.168.150.100:9092
userfriends.sinks.userfriendsSink.topic=user_friends

userfriends.sources.userfriendsSource.channels=userfriendsChannel
userfriends.sinks.userfriendsSink.channel=userfriendsChannel

If lines are getting truncated because the value is too small, increase maxLineLength.
With excludeEvents=true, events matching the interceptor's regex (here, header lines starting with "user") are dropped.
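To run this job and verify that data reaches Kafka, a sketch assuming the same install paths and that the Kafka CLI tools are on the PATH:

/root/software/flume/bin/flume-ng agent --name userfriends --conf /root/software/flume/conf --conf-file /root/software/flume/conf/job/user_friends-kafka.conf -Dflume.root.logger=INFO,console
# consume the topic to check the data:
kafka-console-consumer.sh --bootstrap-server 192.168.150.100:9092 --topic user_friends --from-beginning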

Example 6: sending the data Flume collects to both Kafka and HDFS

Flume routes the data it reads through two separate channels, one feeding the Kafka sink and one feeding the HDFS sink.

First, in the flume/conf/job folder, create train-flume-kafkahdfs.conf (vi train-flume-kafkahdfs.conf):

train.sources=trainSource
train.channels=kafkaChannel hdfsChannel
train.sinks=kafkaSink hdfsSink

train.sources.trainSource.type=spooldir
train.sources.trainSource.spoolDir=/root/test/event
train.sources.trainSource.deserializer=LINE
train.sources.trainSource.deserializer.maxLineLength=64000
train.sources.trainSource.includePattern=train_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
train.sources.trainSource.interceptors=head_filter
train.sources.trainSource.interceptors.head_filter.type=regex_filter
train.sources.trainSource.interceptors.head_filter.regex=^user*
train.sources.trainSource.interceptors.head_filter.excludeEvents=true

train.channels.kafkaChannel.type=file
train.channels.kafkaChannel.checkpointDir=/root/software/flume/flumeLogs/checkpoint/train
train.channels.kafkaChannel.dataDirs=/root/software/flume/flumeLogs/data/train

train.channels.hdfsChannel.type=memory
train.channels.hdfsChannel.capacity=64000
train.channels.hdfsChannel.transactionCapacity=16000


train.sinks.kafkaSink.type=org.apache.flume.sink.kafka.KafkaSink
train.sinks.kafkaSink.batchSize=640
train.sinks.kafkaSink.brokerList=192.168.150.100:9092
train.sinks.kafkaSink.topic=train

train.sinks.hdfsSink.type=hdfs
train.sinks.hdfsSink.hdfs.fileType=DataStream
train.sinks.hdfsSink.hdfs.filePrefix=train
train.sinks.hdfsSink.hdfs.fileSuffix=.csv
train.sinks.hdfsSink.hdfs.path=hdfs://192.168.150.100:9000/kb11/train/%Y-%m-%d
train.sinks.hdfsSink.hdfs.useLocalTimeStamp=true
train.sinks.hdfsSink.hdfs.batchSize=640
train.sinks.hdfsSink.hdfs.rollCount=0
train.sinks.hdfsSink.hdfs.rollSize=6400000
train.sinks.hdfsSink.hdfs.rollInterval=30


train.sources.trainSource.channels=kafkaChannel hdfsChannel
train.sinks.kafkaSink.channel=kafkaChannel
train.sinks.hdfsSink.channel=hdfsChannel
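Start command for this job (a sketch, assuming the same paths as before; the agent name must be train):

/root/software/flume/bin/flume-ng agent --name train --conf /root/software/flume/conf --conf-file /root/software/flume/conf/job/train-flume-kafkahdfs.conf -Dflume.root.logger=INFO,console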
