Collecting local data into HDFS with Flume

Original post: 2016-08-30 19:34:20

Configuration:

agent1.sources = spooldirSource
agent1.channels = fileChannel
agent1.sinks = hdfsSink

agent1.sources.spooldirSource.type=spooldir
agent1.sources.spooldirSource.spoolDir=/opt/flume
agent1.sources.spooldirSource.channels=fileChannel

agent1.sinks.hdfsSink.type=hdfs
agent1.sinks.hdfsSink.hdfs.path=hdfs://192.168.200.45:8020/flume/cys/%y-%m-%d
agent1.sinks.hdfsSink.hdfs.filePrefix=cys
agent1.sinks.hdfsSink.hdfs.round = true
# Number of seconds to wait before rolling current file (0 = never roll based on time interval)
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600
# File size to trigger roll, in bytes (0: never roll based on file size)
agent1.sinks.hdfsSink.hdfs.rollSize = 128000000
agent1.sinks.hdfsSink.hdfs.rollCount = 0
agent1.sinks.hdfsSink.hdfs.batchSize = 1000

#Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
agent1.sinks.hdfsSink.hdfs.roundValue = 1
agent1.sinks.hdfsSink.hdfs.roundUnit = minute
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.channel=fileChannel
agent1.sinks.hdfsSink.hdfs.fileType = DataStream


agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.checkpointDir=/usr/share/apache-flume-1.5.0-bin/checkpoint
agent1.channels.fileChannel.dataDirs=/usr/share/apache-flume-1.5.0-bin/dataDir
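Before starting the agent it is worth making sure the directories referenced above exist: the spooldir source fails immediately if /opt/flume is missing, and the file channel must be able to create its checkpoint and data directories. A minimal preparation sketch (the test file name is only an illustration):

mkdir -p /opt/flume
mkdir -p /usr/share/apache-flume-1.5.0-bin/checkpoint
mkdir -p /usr/share/apache-flume-1.5.0-bin/dataDir
# Files dropped here are ingested and then renamed with a .COMPLETED suffix
echo "hello flume" > /opt/flume/test.log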


Run:

[root@sdzn-cdh01 conf.dist]# flume-ng agent -f test1   -n agent1 -Dflume.root.logger=INFO,console
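The same invocation in long-option form, also pointing Flume at its configuration directory so that flume-env.sh and the log4j settings are picked up (the /etc/flume-ng/conf path and the location of the test1 file are assumptions based on a typical CDH install):

flume-ng agent --conf /etc/flume-ng/conf --conf-file test1 --name agent1 -Dflume.root.logger=INFO,console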


Exception 1:

HDFSEventSink.java:463)] HDFS IO error
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "sdzn-cdh01.zhiyoubao.com/192.168.200.45"; destination host is: "sdzn-cdh01.zhiyoubao.com":9000;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at $Proxy19.create(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:287)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at $Proxy20.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1645)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1618)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1543)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:396)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:392)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:392)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:336)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
        at org.apache.flume.sink.hdfs.HDFSDataStream.doOpen(HDFSDataStream.java:86)
        at org.apache.flume.sink.hdfs.HDFSDataStream.open(HDFSDataStream.java:113)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:273)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:706)
        at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
        at org.apache.flume.sink.hdfs.BucketWriter.access$1400(BucketWriter.java:59)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:703)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)



The cause was missing jar packages (the original post showed them in a screenshot).

Copy the jars into:

[root@sdzn-cdh01 jars]# pwd
/opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/jars
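The HDFS sink only works when the Hadoop client jars are visible on Flume's classpath. A hedged sketch of checking for and copying them, assuming a standalone Flume under /usr/share/apache-flume-1.5.0-bin alongside the CDH parcel (the exact jar set depends on the Hadoop version; hadoop-common, hadoop-hdfs, hadoop-auth and their dependencies such as protobuf-java and commons-configuration are the usual candidates):

# See which Hadoop client jars the parcel ships
ls /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/jars | grep -E 'hadoop-(common|hdfs|auth)'
# Copy (or symlink) them into the standalone Flume's lib directory
cp /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/jars/hadoop-common-*.jar /usr/share/apache-flume-1.5.0-bin/lib/
cp /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/jars/hadoop-hdfs-*.jar /usr/share/apache-flume-1.5.0-bin/lib/
cp /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/jars/hadoop-auth-*.jar /usr/share/apache-flume-1.5.0-bin/lib/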


Exception 2:


[ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:256)] FATAL: Spool Directory source r1: { spoolDir: /home/hadoop/flumeSpool-2 }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:195)

The problem is the character encoding of the input data: by default the spooldir source decodes files as UTF-8, so a file in another encoding (or a binary file) dropped into the spool directory raises MalformedInputException. Fix the files' encoding and the source recovers.
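If re-encoding every file up front is not practical, the spooldir source itself has properties for this (a sketch based on the Flume 1.5 user guide; UTF-8 is the default value of inputCharset):

# Declare the real encoding of the spooled files
agent1.sources.spooldirSource.inputCharset = UTF-8
# Skip undecodable bytes instead of killing the source thread (default is FAIL)
agent1.sources.spooldirSource.decodeErrorPolicy = IGNORE

Alternatively, convert the files before dropping them into the spool directory, e.g. iconv -f GBK -t UTF-8 in.log > out.log (GBK here is only an example source encoding).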

Reference for the fix: http://www.cnblogs.com/zhoujingyu/p/5315403.html


Copyright notice: this is an original post by the author; sharing and discussion are welcome. https://blog.csdn.net/xiaoshunzi111/article/details/52372208
