I. Review
1. source
2. channel
3. sink
II. Flume: collect the contents of a given file and ship it to HDFS with time-based partitions
1. .tmp files are generated according to the configured roll settings
2. round
Whether to round down the event timestamp when expanding the path. Since we want directories created by time, set this to true.
3. roundValue and roundUnit
How far to round down (e.g. 1 second or 1 minute), set per requirement; the config below rounds to 1 minute.
The conf file exec-memory-hdfs-partition.conf:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/data/data.log
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop002:9000/data/flume/page_views/%Y%m%d%H%M
a1.sinks.k1.hdfs.batchSize = 10
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10485760
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = page-views
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
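To see what round/roundValue/roundUnit do to the partition directory, here is a minimal, illustrative Java sketch (the class name and timestamps are mine, not from Flume) that floors an event timestamp to the minute the way hdfs.round = true with roundValue = 1, roundUnit = minute does before %Y%m%d%H%M is expanded:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class RoundDemo {
    // Mimics hdfs.round = true, roundValue = 1, roundUnit = minute:
    // the event timestamp is floored to the minute before the escape
    // sequences in hdfs.path are expanded.
    static String partitionDir(long eventTsMillis) {
        long floored = eventTsMillis - (eventTsMillis % 60_000L); // drop seconds/millis
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm"); // %Y%m%d%H%M
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "/data/flume/page_views/" + fmt.format(new Date(floored));
    }

    public static void main(String[] args) {
        // 10:23:45 and 10:23:59 UTC land in the same one-minute directory
        System.out.println(partitionDir(1622543025000L));
        System.out.println(partitionDir(1622543039000L));
    }
}
```

Because events within the same minute share one directory, the sink keeps writing the same .tmp file until a roll condition (here rollSize = 10 MB) triggers.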
Command:
./flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/exec-memory-hdfs-partition.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343
III. Multi-Agent Flows
1. One agent's output serves as another agent's input
2. Multiple agents converge on one agent's source, and that agent then writes the data to HDFS
The avro source side specifies which address data is pulled from; the avro sink side specifies which address data is written to.
3. Why multi-tier agent aggregation is needed (think it over)
4. One source fanning out to multiple channels and sinks
client --> source --------------------------> channel --------------------------> sink
                   Flume Channel Selectors              Flume Sink Processors
1) Flume Channel Selectors examples
Replicating Channel Selector:
a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3
Multiplexing Channel Selector:
a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
# the event header key is "state"; its value selects the channel(s) below
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
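The mapping rules above can be sketched in plain Java to make the routing explicit (the class name is illustrative; the real routing happens inside Flume's multiplexing selector):

```java
import java.util.List;
import java.util.Map;

public class MultiplexingDemo {
    // Mirrors the multiplexing selector config: the value of the
    // "state" header picks the channel list; any other value, or a
    // missing header, falls back to the default channel c4.
    static final Map<String, List<String>> MAPPING = Map.of(
            "CZ", List.of("c1"),
            "US", List.of("c2", "c3"));
    static final List<String> DEFAULT_CHANNELS = List.of("c4");

    static List<String> route(String state) {
        if (state == null) return DEFAULT_CHANNELS;
        return MAPPING.getOrDefault(state, DEFAULT_CHANNELS);
    }

    public static void main(String[] args) {
        System.out.println(route("US")); // [c2, c3]
        System.out.println(route("JP")); // [c4]
    }
}
```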
2) Flume Sink Processors examples
failover: fail over to a standby sink
load_balance: load balancing
Failover Sink Processor:
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# the higher the number, the higher the priority
a1.sinkgroups.g1.processor.maxpenalty = 10000
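load_balance is mentioned above without an example; a minimal sketch using the Load balancing Sink Processor's standard properties (the selector may be round_robin or random):

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# temporarily blacklist a sink that failed, with exponential backoff
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
```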
IV. Hands-on Requirement
Requirement: two machines; send data through agent1 to agent2 and print it to the console
agent1: exec source - memory channel - avro sink, avro-sink.conf
agent2: avro source - memory channel - logger sink, avro-source.conf
avro-sink.conf:
avro-sink-agent.sources = exec-source
avro-sink-agent.sinks = avro-sink
avro-sink-agent.channels = avro-memory-channel
avro-sink-agent.sources.exec-source.type = exec
avro-sink-agent.sources.exec-source.command = tail -F /opt/data/avro_access.log
avro-sink-agent.sources.exec-source.channels = avro-memory-channel
avro-sink-agent.channels.avro-memory-channel.type = memory
avro-sink-agent.sinks.avro-sink.type = avro
avro-sink-agent.sinks.avro-sink.channel = avro-memory-channel
avro-sink-agent.sinks.avro-sink.hostname = 0.0.0.0
avro-sink-agent.sinks.avro-sink.port = 44444
avro-source.conf:
avro-source-agent.sources = avro-source
avro-source-agent.sinks = logger-sink
avro-source-agent.channels = avro-memory-channel
avro-source-agent.sources.avro-source.type = avro
avro-source-agent.sources.avro-source.channels = avro-memory-channel
avro-source-agent.sources.avro-source.bind = 0.0.0.0
avro-source-agent.sources.avro-source.port = 44444
avro-source-agent.channels.avro-memory-channel.type = memory
avro-source-agent.sinks.logger-sink.type = logger
avro-source-agent.sinks.logger-sink.channel = avro-memory-channel
Command 1:
flume-ng agent \
--name avro-sink-agent \
--conf $FLUME_HOME/conf \
--conf-file /opt/script/flume/avro-sink.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34344
Command 2:
flume-ng agent \
--name avro-source-agent \
--conf $FLUME_HOME/conf \
--conf-file /opt/script/flume/avro-source.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343
Summary:
1) Start command 2 (the receiving avro-source agent) first, then command 1
2) The two agents use different monitoring ports (34344 vs. 34343)
Requirement:
Set up a Java service in IDEA
1. In IDEA, create a directory in the Scala project and Mark Directory as Sources Root
2. Create a Java class
3. Log an incrementing index
package com.ruozedata.flume;

import org.apache.log4j.Logger;

// We use org.apache.log4j.Logger here because later we will redirect
// the log output to the Linux machine by editing log4j.properties.
public class LoggerGenerator {

    private static Logger logger = Logger.getLogger(LoggerGenerator.class.getName());

    public static void main(String[] args) throws Exception {
        int index = 0;
        while (true) {
            Thread.sleep(1000);
            logger.info("ruozeshuju" + index++);
        }
    }
}
4. Send the logs printed from IDEA to the agent on Linux
Start avro-source.conf so that log events arriving on this machine's port 44444 are printed to the console
5. Log4j Appender
Flume's user guide says: "Appends Log4j events to a flume agent's avro source."
Copy the sample configuration from that section into IDEA's log4j.properties
6. Add the dependency
Flume's user guide notes: "A client using this appender must have the flume-ng-sdk in the classpath (eg, flume-ng-sdk-1.6.0.jar)."
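As a sketch of that dependency in Maven form (versions are assumptions and should match your Flume installation; the Log4jAppender class itself ships in the flume-ng-log4jappender artifact):

```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-sdk</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.6.0</version>
</dependency>
```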
7. Edit the log4j.properties file
log4j.rootCategory=INFO, console, flume
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = hadoop002
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout = org.apache.log4j.PatternLayout
Note: this file only takes effect if it lives in a dedicated resources folder that has been marked via Mark Directory as Resources Root
8. Start avro-source.conf, then run the code
On the server, the generated log lines should show up on the agent's console.
V. Flume Interceptors
client --> source --------------------------> channel --------------------------> sink
                   Flume Channel Selectors              Flume Sink Processors
Two commonly used interceptors
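The notes stop before naming them; which two the author intended is an assumption, but a frequently used pair is the timestamp and host interceptors. A sketch, reusing the a1/r1 names from the earlier examples:

```properties
a1.sources.r1.interceptors = i1 i2
# adds a "timestamp" header, so the HDFS sink can expand %Y%m%d%H%M
# from the event itself instead of relying on useLocalTimeStamp
a1.sources.r1.interceptors.i1.type = timestamp
# adds the agent host's address under the "hostname" header
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.hostHeader = hostname
```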