Flume: monitoring files and sending them to Kafka

Due to a change in requirements, the data now needs to be transferred in real time, so the previous HDFS sink has to be swapped out for Kafka.

Here is the configuration:


a1.sources = r1 r2 r4
a1.sinks = k1 k2 k4
a1.channels = c1 c2 c4



a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/mjxt/flume_data/001/
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.channels.c1.type = memory
a1.channels.c1.capacity = 200000
a1.channels.c1.transactionCapacity = 200000

a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# No dedicated Kafka cluster available, so these brokers are just placeholders for testing
a1.sinks.k1.kafka.topic = etc_mj_001
a1.sinks.k1.kafka.bootstrap.servers = 10.42.3.56:9092,10.42.3.55:9092,10.42.3.53:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = -1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy


a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /home/mjxt/flume_data/002/
a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.channels.c2.type = memory
a1.channels.c2.capacity = 200000
a1.channels.c2.transactionCapacity = 200000

a1.sinks.k2.channel = c2
a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k2.kafka.topic = etc_mj_002
a1.sinks.k2.kafka.bootstrap.servers = 10.42.3.56:9092,10.42.3.55:9092,10.42.3.53:9092
a1.sinks.k2.kafka.flumeBatchSize = 20
a1.sinks.k2.kafka.producer.acks = -1
a1.sinks.k2.kafka.producer.linger.ms = 1
a1.sinks.k2.kafka.producer.compression.type = snappy



a1.sources.r4.type = spooldir
a1.sources.r4.spoolDir = /home/mjxt/flume_data/004/
a1.sources.r4.interceptors = i4
a1.sources.r4.interceptors.i4.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.channels.c4.type = memory
a1.channels.c4.capacity = 200000
a1.channels.c4.transactionCapacity = 200000

a1.sinks.k4.channel = c4
a1.sinks.k4.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k4.kafka.topic = etc_mj_004
a1.sinks.k4.kafka.bootstrap.servers = 10.42.3.56:9092,10.42.3.55:9092,10.42.3.53:9092
a1.sinks.k4.kafka.flumeBatchSize = 20
a1.sinks.k4.kafka.producer.acks = -1
a1.sinks.k4.kafka.producer.linger.ms = 1
a1.sinks.k4.kafka.producer.compression.type = snappy



a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel = c2
a1.sources.r4.channels = c4
a1.sinks.k4.channel = c4

Since this data is not particularly important, no special Kafka tuning was done here; the stock settings are good enough. The source is the spooling-directory (spooldir) source, which is the fastest for this kind of transfer but is also prone to losing data.
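As a quick sanity check that events are actually reaching Kafka, you can tail one of the topics with the console consumer that ships with Kafka (broker address copied from the sink config above; the Kafka install path is an assumption):

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 10.42.3.56:9092 --topic etc_mj_001 --from-beginning

If messages show up as files are dropped into /home/mjxt/flume_data/001/, the pipeline is working.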

Flume can then be restarted with a script. Since the old agent used the HDFS sink, it has to be stopped first; all that changes in the script is the app parameter (and the config file name).

#!/bin/bash

app="spool-kafka"
log_path="/home/mjxt/shell/heartbeat.log"

#health check: start the agent if it is not running
checkStatus(){
  pid=$(ps -ef |grep $app |grep -v "grep" |awk '{print $2}');
  #datetime=`date +%Y-%m-%d,%H:%M:%S`
  datetime="`date`"
  if [ -z "${pid}" ]; then
     echo "$datetime ---- 开始启动服务$APP_NAME" >> $log_path
      /home/mjxt/apache-flume-1.9.0-bin/bin/flume-ng agent -n a1 -c /home/mjxt/apache-flume-1.9.0-bin/conf -f /home/mjxt/apache-flume-1.9.0-bin/conf/spool-kafka.conf -Dflume.root.logger=INFO,console >/dev/null 2>&1 &
     
  else
     echo "$datetime ---- 项目$APP_NAME已经启动,进程pid是${pid}!" >> $log_path
  fi
}

restart(){
  #pid=$(ps -ef |grep $app |grep -v "grep" |awk '{print $2}');
  process=`ps -ef|grep spool-kafka.conf |grep -v grep|grep -v PPID|awk '{print $2}'`
  for i in $process
  do
    echo "kill the process [$i]"
    kill -9 $i
  done
  
  cd /home/mjxt/apache-flume-1.9.0-bin/;bin/flume-ng agent -n a1 -c /home/mjxt/apache-flume-1.9.0-bin/conf -f /home/mjxt/apache-flume-1.9.0-bin/conf/spool-kafka.conf  >/dev/null  2>&1 &
  #datetime=`date +%Y-%m-%d,%H:%m:%s`
  #datetime="`date`"
}

restart
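If you want the ongoing health check rather than a one-off restart, change the last line from restart to checkStatus and run the script from cron, e.g. every minute (the script path is an assumption, chosen to match log_path above):

# crontab -e
* * * * * /bin/bash /home/mjxt/shell/flume-heartbeat.sh

Also note that kill -9 gives the agent no chance to drain in-flight transactions; with a memory channel those events are simply lost, which is acceptable here given that the data is not critical.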

There are two tuning issues. First, if the channel type is memory:

a1.channels.c4.type = memory
a1.channels.c4.capacity = 200000
a1.channels.c4.transactionCapacity = 200000

the agent can OOM very easily. It is advisable to increase the last two parameters above (capacity and transactionCapacity); note that transactionCapacity must not exceed capacity. A sized-up sketch is shown below.
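For example, a more generously sized memory channel might look like this (values are illustrative, not benchmarked; byteCapacity is a standard MemoryChannel property that additionally caps the estimated in-memory event size in bytes):

a1.channels.c4.type = memory
# total events the channel can buffer; must be >= transactionCapacity
a1.channels.c4.capacity = 1000000
# events taken per source/sink transaction
a1.channels.c4.transactionCapacity = 10000
# optional hard memory cap in bytes (by default 80% of the JVM heap)
a1.channels.c4.byteCapacity = 800000000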

The second tuning point is the flume-ng launch script. Open the flume-ng file: JAVA_OPTS defaults to 20m, and it is worth increasing the heap, since this tiny default is another cause of OOM.
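A minimal sketch of the change in bin/flume-ng (the 20m default is what Flume 1.9.0 ships with; the replacement heap sizes below are just examples and should be sized to the channel capacity):

# original line in apache-flume-1.9.0-bin/bin/flume-ng
JAVA_OPTS="-Xmx20m"
# raised, for example, to
JAVA_OPTS="-Xms1024m -Xmx2048m"

The same can also be set via export JAVA_OPTS in conf/flume-env.sh, which keeps the launch script itself untouched.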

If you also need to move data into the monitored Flume spool directory in real time, see my previous post: https://blog.csdn.net/mianhuatang__/article/details/125766761?spm=1001.2014.3001.5502

The two can be used together, and the result is decent.
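One caveat when feeding a spooldir source: a file must be complete and immutable once it appears in the spool directory. A safe pattern is to write it somewhere else on the same filesystem first and then mv it in, since a rename within one filesystem is atomic (the staging path and source file below are assumptions):

#!/bin/bash
# staging dir on the same filesystem as the spool dir (assumed path)
staging=/home/mjxt/flume_data/.staging
mkdir -p "$staging"

# write or copy the new file into staging first (hypothetical source file)...
cp /path/to/new_file.dat "$staging/new_file.dat"
# ...then atomically move it into the monitored directory
mv "$staging/new_file.dat" /home/mjxt/flume_data/001/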
