Flume Installation

Pre-installation Requirements

  • Flume runs in a Linux environment
  • Flume is written in Java and requires JDK 1.6 or later; JDK 1.7 is recommended

Installing Flume

  • Download Flume (flume-ng-1.5.0-cdh5.3.6 is used as the example) from:
    http://archive.cloudera.com/cdh5/cdh/5/
  • Upload and extract the binary tarball (note: the -src package contains only source code and cannot be run directly)
    tar -xzf flume-ng-1.5.0-cdh5.3.6.tar.gz
  • Configuration
    Set up the Java environment (based on flume-env.sh.template):
    mv flume-env.sh.template flume-env.sh
    vim flume-env.sh
    Modify the file as follows:
    export JAVA_HOME=/home/hadoop/package/jdk1.7.0_67
  • Starting the agent
    The agent's configuration file is stored locally and is a Java-properties-format file.

A Simple Flume Example

Flume has only one role, the agent. An agent is made up of three components: a source, a channel, and a sink, all of which are defined in the agent's configuration file. That file lives in the local conf directory and uses the Java properties format.

Use case: use the Exec Source provided by Flume to monitor a file (/home/web/aa) in real time; whenever new data is appended to the file, write it to HDFS via the HDFS Sink. The agent's configuration file (test-conf.properties) is as follows:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = r1
a1.channels = c1
a1.sinks = k1

##define source (exec source)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/web/aa
a1.sources.r1.shell = /bin/bash -c

##define channel (memory channel)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

##define sink (hdfs sink)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://CDH-cluster-main/tmp/test_log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 10

##bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Run the agent with:
bin/flume-ng agent --conf conf --name a1 --conf-file conf/test-conf.properties -Dflume.root.logger=INFO,console
-Dflume.root.logger=INFO,console sends Flume's log output to the terminal, which is useful for debugging during testing; it can be omitted in production.

Errors encountered during testing:
(1)

22 Jul 2016 23:00:31,175 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
        at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:251)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

The cause is that the required Hadoop jars are missing from Flume's classpath. Copy them from the Hadoop installation (the jar versions must match your Hadoop distribution; 2.5.2 in this case):

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.5.2.jar $FLUME_HOME/lib/

cp $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar $FLUME_HOME/lib/

cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.5.2.jar $FLUME_HOME/lib/

cp $HADOOP_HOME/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/hadoop-hdfs-2.5.2.jar $FLUME_HOME/lib/

(2)

2016-07-23 00:01:08,091 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:471)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:388)
        ... 3 more

The cause: when writing to HDFS, a timestamp is used to partition the directory structure (the HDFS sink resolves time-based escape sequences such as %Y-%m-%d in hdfs.path against the timestamp in each event's headers). If a received event has no timestamp parameter in its headers, this error occurs. There are three ways to fix it:
1. Add a timestamp interceptor to the source, which inserts a timestamp into every event's headers (this adds a small per-event cost):
   a1.sources.r1.interceptors = t1
   a1.sources.r1.interceptors.t1.type = timestamp
2. Set the sink parameter hdfs.useLocalTimeStamp = true, so the sink falls back to the local machine's clock (if the client and the Flume cluster have inconsistent clocks, the data's timestamps will be inaccurate):
   a1.sinks.k1.hdfs.useLocalTimeStamp = true
3. When sending events to the source, add the timestamp to each event's headers yourself. The headers are a map; add an entry whose key is "timestamp" and whose value is the event time in milliseconds (recommended; see the sketch after this list).
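For the third approach, here is a minimal sketch using Flume's Java client SDK (RpcClient). It assumes an agent with an Avro source listening on localhost:41414, which the example configuration above does not define; the host, port, and class name are illustrative only:

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class TimestampedFlumeClient {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to a Flume agent's Avro source; host and port are
        // illustrative -- the exec-source config above does not define one.
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Put the event time (in milliseconds) into the headers under
            // the key "timestamp" so the HDFS sink can resolve %Y/%m/%d etc.
            Map<String, String> headers = new HashMap<String, String>();
            headers.put("timestamp", String.valueOf(System.currentTimeMillis()));

            Event event = EventBuilder.withBody("hello flume",
                    StandardCharsets.UTF_8, headers);
            client.append(event);
        } finally {
            client.close();
        }
    }
}

Since the exec source in this example is fed by tail -F rather than by your own code, there is no producer available to set headers, so methods 1 and 2 are the practical choices for this particular setup; method 3 applies when your own application sends events to an RPC-style source.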
