(3) Flume single-node write-to-HDFS exercise, plus a custom interceptor for log formatting

(1) Reference: http://my.oschina.net/leejun2005/blog/288136#OSC_h2_10

(2) The HDFS sink needs the relevant Hadoop jars on Flume's classpath; the CDH build of Flume already bundles them.

(3)flume_directHDFS2.conf 

# First, name the components that agent1 should activate; each one is defined below.
agent1.sources = exec-source1
agent1.channels = ch1
agent1.sinks = log-sink1



##define -- Exec Source
#type       The component type name, needs to be exec  (required)
#shell      A shell invocation used to run the command 
#command    The command to execute  (required)
#channels   (required)

agent1.sources.exec-source1.type = exec
agent1.sources.exec-source1.shell = /bin/bash -c
agent1.sources.exec-source1.command = tail -n +0 -F /usr/local/nginx/logs/vdnlog_access.log
agent1.sources.exec-source1.channels = ch1


##define -- Memory Channel called ch1 on agent1
#type			The component type name, needs to be memory (required)
#capacity		The maximum number of events stored in the channel
#transactionCapacity	The maximum number of events the channel will take from a source or give to a sink per transaction
#keep-alive		Timeout in seconds for adding or removing an event
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30
 
# Define -- Hdfs Sink
#type			The component type name, needs to be hdfs  (required)
#channel		(required)
#hdfs.path		HDFS directory path (eg hdfs://namenode/flume/webdata/) (required)
#hdfs.writeFormat       Format for sequence file records. One of “Text” or “Writable” (the default).
#hdfs.fileType		File format: currently SequenceFile, DataStream or CompressedStream (1)DataStream will not compress output file and please don’t set codeC (2)CompressedStream requires set hdfs.codeC with an available codeC
#hdfs.filePrefix	Name prefixed to files created by Flume in hdfs directory
#hdfs.fileSuffix	Suffix to append to file (eg .avro - NOTE: period is not automatically added)
#hdfs.round		Should the timestamp be rounded down
#hdfs.roundValue	Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
#
# To roll purely by time (every 10 minutes here), these three parameters must be set to 0, otherwise time-based rolling does not take effect:
#agent1.sinks.log-sink1.hdfs.rollInterval= 0
#agent1.sinks.log-sink1.hdfs.rollSize = 0
#agent1.sinks.log-sink1.hdfs.rollCount = 0
#
# Files still being written carry a .tmp suffix.
#idleTimeout=5  Timeout after which inactive files get closed
###################
agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/%y-%m-%d
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
agent1.sinks.log-sink1.hdfs.filePrefix = flume_%y-%m-%d_%H%M%S
agent1.sinks.log-sink1.hdfs.fileSuffix = .log
agent1.sinks.log-sink1.hdfs.round = true
agent1.sinks.log-sink1.hdfs.roundValue = 10
agent1.sinks.log-sink1.hdfs.roundUnit = minute
agent1.sinks.log-sink1.hdfs.rollInterval= 0
agent1.sinks.log-sink1.hdfs.rollSize = 0
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.log-sink1.hdfs.callTimeout = 20000
agent1.sinks.log-sink1.hdfs.idleTimeout=5

(Files here are rolled by time: one file every 10 minutes.)

 bin/flume-ng agent --conf conf --conf-file ./conf/flume_directHDFS2.conf --name agent1 -Dflume.root.logger=INFO,console

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

// Custom interceptor (flume_directHDFS3.properties): route events to different channels and HDFS paths by log type

# First, name the components that this agent (named "agent") should activate; each one is defined below.
agent.sources = exec-source1
agent.channels = memchannellv memchannelerr memchannelbf memchannelfs memchannelother
agent.sinks = hdfssinklv hdfssinkerr hdfssinkbf hdfssinkfs hdfssinkother



##define -- Exec Source
#type       The component type name, needs to be exec  (required)
#shell      A shell invocation used to run the command 
#command    The command to execute  (required)
#channels   (required)

agent.sources.exec-source1.type = exec
agent.sources.exec-source1.shell = /bin/bash -c
agent.sources.exec-source1.command = tail -F /usr/local/nginx/logs/vdnlog_access.log
agent.sources.exec-source1.interceptors = timestamp nginxlogformat
agent.sources.exec-source1.interceptors.nginxlogformat.type = com.cntv.bigdata.flume.interceptor.NginxInterceptor$Builder
agent.sources.exec-source1.interceptors.timestamp.type = timestamp


##sources selector
agent.sources.exec-source1.selector.type = multiplexing
agent.sources.exec-source1.selector.header = type
agent.sources.exec-source1.selector.mapping.lv = memchannellv
agent.sources.exec-source1.selector.mapping.err = memchannelerr
agent.sources.exec-source1.selector.mapping.bf = memchannelbf
agent.sources.exec-source1.selector.mapping.fs = memchannelfs
agent.sources.exec-source1.selector.default = memchannelother
agent.sources.exec-source1.channels = memchannellv memchannelerr memchannelbf memchannelfs memchannelother



##define -- Memory Channels (one per log type)
#type			The component type name, needs to be memory (required)
#capacity		The maximum number of events stored in the channel
#transactionCapacity	The maximum number of events the channel will take from a source or give to a sink per transaction
#keep-alive		Timeout in seconds for adding or removing an event


 
agent.channels.memchannellv.type = memory
agent.channels.memchannellv.capacity = 10000
agent.channels.memchannellv.transactionCapacity = 10000
agent.channels.memchannellv.keep-alive = 3

agent.channels.memchannelerr.type = memory
agent.channels.memchannelerr.capacity = 10000
agent.channels.memchannelerr.transactionCapacity = 10000
agent.channels.memchannelerr.keep-alive = 3


agent.channels.memchannelbf.type = memory
agent.channels.memchannelbf.capacity = 10000
agent.channels.memchannelbf.transactionCapacity = 10000
agent.channels.memchannelbf.keep-alive = 3

agent.channels.memchannelfs.type = memory
agent.channels.memchannelfs.capacity = 10000
agent.channels.memchannelfs.transactionCapacity = 10000
agent.channels.memchannelfs.keep-alive = 3


agent.channels.memchannelother.type = memory
agent.channels.memchannelother.capacity = 10000
agent.channels.memchannelother.transactionCapacity = 10000
agent.channels.memchannelother.keep-alive = 3



# Define -- Hdfs Sink
#type			The component type name, needs to be hdfs  (required)
#channel		(required)
#hdfs.path		HDFS directory path (eg hdfs://namenode/flume/webdata/) (required)
#hdfs.writeFormat       Format for sequence file records. One of “Text” or “Writable” (the default).
#hdfs.fileType		File format: currently SequenceFile, DataStream or CompressedStream (1)DataStream will not compress output file and please don’t set codeC (2)CompressedStream requires set hdfs.codeC with an available codeC
#hdfs.filePrefix	Name prefixed to files created by Flume in hdfs directory
#hdfs.fileSuffix	Suffix to append to file (eg .avro - NOTE: period is not automatically added)
#hdfs.round		Should the timestamp be rounded down
#hdfs.roundValue	Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
#
# To roll purely by time (every 10 minutes here), these three parameters must be set to 0, otherwise time-based rolling does not take effect:
#agent1.sinks.log-sink1.hdfs.rollInterval= 0
#agent1.sinks.log-sink1.hdfs.rollSize = 0
#agent1.sinks.log-sink1.hdfs.rollCount = 0
#
#
####################

#######lv
agent.sinks.hdfssinklv.type = hdfs
agent.sinks.hdfssinklv.hdfs.fileType = DataStream
agent.sinks.hdfssinklv.hdfs.idleTimeout = 60
agent.sinks.hdfssinklv.hdfs.round = true
agent.sinks.hdfssinklv.hdfs.roundValue = 10
agent.sinks.hdfssinklv.hdfs.roundUnit = minute
agent.sinks.hdfssinklv.hdfs.rollInterval = 0
agent.sinks.hdfssinklv.hdfs.rollSize = 0
agent.sinks.hdfssinklv.hdfs.rollCount = 0
agent.sinks.hdfssinklv.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/xxoo/lv/%y-%m-%d
agent.sinks.hdfssinklv.hdfs.filePrefix = flume_bjxd02Lv_%y-%m-%d_%H%M%S
agent.sinks.hdfssinklv.hdfs.fileSuffix = .log
agent.sinks.hdfssinklv.channel = memchannellv

#######err
agent.sinks.hdfssinkerr.type = hdfs
agent.sinks.hdfssinkerr.hdfs.fileType = DataStream
agent.sinks.hdfssinkerr.hdfs.idleTimeout = 60
agent.sinks.hdfssinkerr.hdfs.round = true
agent.sinks.hdfssinkerr.hdfs.roundValue = 10
agent.sinks.hdfssinkerr.hdfs.roundUnit = minute
agent.sinks.hdfssinkerr.hdfs.rollInterval = 0
agent.sinks.hdfssinkerr.hdfs.rollSize = 0
agent.sinks.hdfssinkerr.hdfs.rollCount = 0
agent.sinks.hdfssinkerr.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/xxoo/err/%y-%m-%d
agent.sinks.hdfssinkerr.hdfs.filePrefix = flume_bjxd02Err_%y-%m-%d_%H%M%S
agent.sinks.hdfssinkerr.hdfs.fileSuffix = .log
agent.sinks.hdfssinkerr.channel = memchannelerr

#######bf
agent.sinks.hdfssinkbf.type = hdfs
agent.sinks.hdfssinkbf.hdfs.fileType = DataStream
agent.sinks.hdfssinkbf.hdfs.idleTimeout = 60
agent.sinks.hdfssinkbf.hdfs.round = true
agent.sinks.hdfssinkbf.hdfs.roundValue = 10
agent.sinks.hdfssinkbf.hdfs.roundUnit = minute
agent.sinks.hdfssinkbf.hdfs.rollInterval = 0
agent.sinks.hdfssinkbf.hdfs.rollSize = 0
agent.sinks.hdfssinkbf.hdfs.rollCount = 0
agent.sinks.hdfssinkbf.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/xxoo/bf/%y-%m-%d
agent.sinks.hdfssinkbf.hdfs.filePrefix = flume_bjxd02Bf_%y-%m-%d_%H%M%S
agent.sinks.hdfssinkbf.hdfs.fileSuffix = .log
agent.sinks.hdfssinkbf.channel = memchannelbf


#######fs
agent.sinks.hdfssinkfs.type = hdfs
agent.sinks.hdfssinkfs.hdfs.fileType = DataStream
agent.sinks.hdfssinkfs.hdfs.idleTimeout = 60
agent.sinks.hdfssinkfs.hdfs.round = true
agent.sinks.hdfssinkfs.hdfs.roundValue = 10
agent.sinks.hdfssinkfs.hdfs.roundUnit = minute
agent.sinks.hdfssinkfs.hdfs.rollInterval = 0
agent.sinks.hdfssinkfs.hdfs.rollSize = 0
agent.sinks.hdfssinkfs.hdfs.rollCount = 0
agent.sinks.hdfssinkfs.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/xxoo/fs/%y-%m-%d
agent.sinks.hdfssinkfs.hdfs.filePrefix = flume_bjxd02Fs_%y-%m-%d_%H%M%S
agent.sinks.hdfssinkfs.hdfs.fileSuffix = .log
agent.sinks.hdfssinkfs.channel = memchannelfs

#######other
agent.sinks.hdfssinkother.type = hdfs
agent.sinks.hdfssinkother.hdfs.fileType = DataStream
agent.sinks.hdfssinkother.hdfs.idleTimeout = 60
agent.sinks.hdfssinkother.hdfs.round = true
agent.sinks.hdfssinkother.hdfs.roundValue = 10
agent.sinks.hdfssinkother.hdfs.roundUnit = minute
agent.sinks.hdfssinkother.hdfs.rollInterval = 0
agent.sinks.hdfssinkother.hdfs.rollSize = 0
agent.sinks.hdfssinkother.hdfs.rollCount = 0
agent.sinks.hdfssinkother.hdfs.path = hdfs://101.240.151.41:9000/test/pjm/xxoo/other/%y-%m-%d
agent.sinks.hdfssinkother.hdfs.filePrefix = flume_bjxd02Other_%y-%m-%d_%H%M%S
agent.sinks.hdfssinkother.hdfs.fileSuffix = .log
agent.sinks.hdfssinkother.channel = memchannelother

[xxxx@localhost flume]$ ./bin/flume-ng agent --conf conf --conf-file ./conf/flume_directHDFS3.properties --name agent -Dflume.root.logger=DEBUG,console,LOGFILE
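The interceptor class com.cntv.bigdata.flume.interceptor.NginxInterceptor referenced above is not listed in this post. Below is a minimal Java sketch of what it could look like, assuming the event type (lv / err / bf / fs) can be decided from the content of each nginx log line; the real field layout and classification rules of vdnlog_access.log are not shown here, so the classify() logic is purely illustrative. The Flume-facing parts follow the standard interceptor contract: implement org.apache.flume.interceptor.Interceptor, stamp a "type" header on every event (matching selector.header = type), and expose a static Builder, which is what the ...interceptors.nginxlogformat.type = ...$Builder line instantiates.

package com.cntv.bigdata.flume.interceptor;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

// Sketch only: sets a "type" header per event so the multiplexing selector can
// route it; the classification rule below is an assumption, not the original code.
public class NginxInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // no state to set up in this sketch
    }

    @Override
    public Event intercept(Event event) {
        String line = new String(event.getBody(), StandardCharsets.UTF_8);
        Map<String, String> headers = event.getHeaders();
        headers.put("type", classify(line));  // header key must match selector.header
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> out = new ArrayList<Event>(events.size());
        for (Event e : events) {
            out.add(intercept(e));
        }
        return out;
    }

    @Override
    public void close() {
        // nothing to release
    }

    // Assumed rule for illustration: derive the type from a marker in the request line.
    private String classify(String line) {
        if (line.contains("/lv"))  return "lv";
        if (line.contains("/err")) return "err";
        if (line.contains("/bf"))  return "bf";
        if (line.contains("/fs"))  return "fs";
        return "other";  // no mapping for "other", so the selector uses its default channel
    }

    // The $Builder referenced in the agent configuration.
    public static class Builder implements Interceptor.Builder {
        @Override
        public void configure(Context context) {
            // no custom properties needed in this sketch
        }

        @Override
        public Interceptor build() {
            return new NginxInterceptor();
        }
    }
}

Package the class into a jar and put it on the agent's classpath (for example Flume's lib/ directory or a plugins.d entry) before starting the agent. Events whose "type" header matches no selector.mapping.* entry fall through to memchannelother via selector.default.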




