FLUME02

一、Review

1、source

2、channel

3、sink

二、Flume

Collect the contents of a specified file and ship them to HDFS, partitioned by time.

1、.tmp files are generated and rolled according to the configured roll settings

2、round

Whether the event timestamp should be rounded down when it is substituted into the path. Since we want to create directories by time, set this to true.

3、roundValue and roundUnit

The rounding granularity; set it as required (the config below uses 1 minute).
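A rough sketch of what this rounding does to the path timestamp (the constants and the class below are illustrative, not Flume's actual implementation):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class RoundDemo {
    public static void main(String[] args) {
        // with round = true, roundValue = 1, roundUnit = minute, the event
        // timestamp is floored to the minute before filling in %Y%m%d%H%M
        long ts = 1560000000123L;                 // event timestamp in ms
        long rounded = (ts / 60_000L) * 60_000L;  // floor to the start of the minute
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm");
        // all events inside the same minute land in the same directory
        System.out.println("/data/flume/page_views/" + fmt.format(new Date(rounded)));
    }
}
```

Because every timestamp inside one minute floors to the same value, events are grouped into one directory per minute.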


Config file exec-memory-hdfs-partition.conf:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/data/data.log

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop002:9000/data/flume/page_views/%Y%m%d%H%M
a1.sinks.k1.hdfs.batchSize = 10
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10485760
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = page-views
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true

a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1


Command:

./flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/exec-memory-hdfs-partition.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343

三、Multi-agent services

1、One agent's output serves as another agent's input

2、Multiple agents converge into one agent's source, and that agent then writes the data to HDFS

Each hop declares the address it reads data from and the address it writes data to.

3、Why multi-tier agent aggregation is needed (think about it)
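As a sketch of the fan-in pattern described above (the hostname `collector` and port 44444 are assumptions, not from the original), each first-tier agent points an avro sink at one collector agent's avro source:

```properties
# first-tier agent (one per machine): forward events to the collector
tier1.sinks.k1.type = avro
tier1.sinks.k1.hostname = collector
tier1.sinks.k1.port = 44444

# collector agent: receive from all first-tier agents, then write to HDFS
collector.sources.r1.type = avro
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 44444
collector.sinks.k1.type = hdfs
```

The collector is the only agent that talks to HDFS, which is one answer to the "why aggregate" question: it limits the number of HDFS clients.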

4、One source mapped to multiple channels and sinks

client --> source -------------------------------> channel ----------------------------->sink

                          Flume Channel Selectors                 Flume Sink Processors

1) Flume Channel Selectors examples


Replicating Channel Selector:

a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3

Multiplexing Channel Selector:

a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
# route on the event header named "state": an event goes to whichever channel(s) its header value maps to
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
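The `state` header has to be set before the event reaches the selector. One way to do that (a sketch; the static interceptor here is just one example of stamping a fixed header value) is:

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = state
a1.sources.r1.interceptors.i1.value = CZ
```

With this in place, every event from r1 carries state=CZ and is routed to c1 by the mapping above.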

2) Flume Sink Processors examples

failover: fail over between the sinks in a group by priority

load_balance: balance the load across the sinks in a group


Failover Sink Processor

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# the higher the number, the higher the priority
a1.sinkgroups.g1.processor.maxpenalty = 10000
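For comparison, a minimal Load balancing Sink Processor sketch (the selector choice here is illustrative):

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
```

round_robin alternates events between k1 and k2; random is the other built-in selector.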

四、Functional requirement

Requirement: two machines; send data from agent1 to agent2, which prints it to the console

agent1:exec-memory-avro sink            avro-sink.conf

agent2:avro source - memory - logger    avro-source.conf


avro-sink.conf:

avro-sink-agent.sources = exec-source
avro-sink-agent.sinks = avro-sink
avro-sink-agent.channels = avro-memory-channel

avro-sink-agent.sources.exec-source.type = exec
avro-sink-agent.sources.exec-source.command = tail -F /opt/data/avro_access.log
avro-sink-agent.sources.exec-source.channels = avro-memory-channel

avro-sink-agent.channels.avro-memory-channel.type = memory

avro-sink-agent.sinks.avro-sink.type = avro
avro-sink-agent.sinks.avro-sink.channel = avro-memory-channel
# the avro sink connects out to this address; point it at the host running
# avro-source.conf (0.0.0.0 only works when both agents share one machine)
avro-sink-agent.sinks.avro-sink.hostname = 0.0.0.0
avro-sink-agent.sinks.avro-sink.port = 44444


avro-source.conf:

avro-source-agent.sources = avro-source
avro-source-agent.sinks = logger-sink
avro-source-agent.channels = avro-memory-channel

avro-source-agent.sources.avro-source.type = avro
avro-source-agent.sources.avro-source.channels = avro-memory-channel
avro-source-agent.sources.avro-source.bind = 0.0.0.0
avro-source-agent.sources.avro-source.port = 44444

avro-source-agent.channels.avro-memory-channel.type = memory

avro-source-agent.sinks.logger-sink.type = logger
avro-source-agent.sinks.logger-sink.channel = avro-memory-channel


Command 1:

flume-ng agent \
--name avro-sink-agent \
--conf $FLUME_HOME/conf \
--conf-file /opt/script/flume/avro-sink.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34344

 

Command 2:

flume-ng agent \
--name avro-source-agent \
--conf $FLUME_HOME/conf \
--conf-file /opt/script/flume/avro-source.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343

Summary:

1) Start agent 2 before agent 1 (the avro source must be listening before the avro sink connects)

2) The two agents' monitoring ports must differ


Requirement:

Build a JavaWeb service in IDEA

1、In the Scala project in IDEA, create a Directory and Mark Directory as Sources Root

2、Create a Java class

3、Log an incrementing index


package com.ruozedata.flume;

// org.apache.log4j.Logger is used here because we will later route these logs
// to Linux by editing the log4j.properties configuration
import org.apache.log4j.Logger;

public class LoggerGenerator {

    private static Logger logger = Logger.getLogger(LoggerGenerator.class.getName());

    public static void main(String[] args) throws Exception {
        int index = 0;
        while (true) {
            Thread.sleep(1000);
            logger.info("ruozeshuju" + index++);
        }
    }
}

4、Send the logs printed in IDEA to the agent on Linux

Start avro-source.conf, which prints logs arriving on this machine's port 44444 to the console

5、Log4j Appender

Appends Log4j events to a flume agent’s avro source. 

Copy its content into log4j.properties in IDEA

6、Add the dependency

A client using this appender must have the flume-ng-sdk in the classpath (eg, flume-ng-sdk-1.6.0.jar). 
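If the project uses Maven, the classpath requirement can be met with something like the following (the version is an assumption; note the `Log4jAppender` class itself ships in the `flume-ng-log4jappender` artifact):

```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-sdk</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.6.0</version>
</dependency>
```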

7、Edit the log4j.properties file


log4j.rootCategory=INFO, console, flume
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = hadoop002
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout=org.apache.log4j.PatternLayout

Note: the file must be placed in a dedicated resources folder that is marked as Resources Root, or it will not take effect

8、Start avro-source.conf, then run the code

On the server, the logged events appear in the avro-source agent's console output.

五、Flume Interceptors

client --> source -------------------------------> channel ----------------------------->sink

              Interceptors, Flume Channel Selectors                Flume Sink Processors

Interceptors sit between the source and the channel; two interceptor types are commonly used.
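As an illustration only (which two interceptors are meant is not stated above, so this pair is an assumption), the timestamp and host interceptors are configured like this:

```properties
a1.sources.r1.interceptors = i1 i2
# i1: stamp each event with a "timestamp" header (needed by time-based HDFS paths)
a1.sources.r1.interceptors.i1.type = timestamp
# i2: stamp each event with the agent host under the given header name
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.hostHeader = hostname
```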
