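Big Data: Flume

Introduction

1. Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
2. Flume can collect data from many kinds of sources, such as files, directories, socket packets, and Kafka, and can deliver the collected data to external storage systems (sinks) such as HDFS, HBase, Hive, and Kafka.
3. Typical collection requirements can be met with a simple Flume configuration.
4. Flume is also easy to extend with custom components for special scenarios, so it fits most day-to-day data collection needs.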

Architecture

[Figure: Flume architecture diagram]

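Examples

Listening on a Network Port

Create a new configuration file (the collection plan) in Flume's conf directory.
The agent runs a listening service on the local machine. Other clients send data to Flume in real time with netcat (nc 192.168.237.131 8888), and the events are written to the log.

# Agent name and the names of its source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: netcat listener
# Bind to all interfaces so that remote clients can reach 192.168.237.131:8888;
# binding to localhost would accept local connections only
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 8888

# Channel: in-memory buffer
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: write events to the log
a1.sinks.k1.type = logger

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1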
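
The configuration above takes effect only once an agent is started with it. As a minimal sketch, the helper below prints the flume-ng command line for a given configuration file and agent name. The file name conf/netcat-logger.conf is an assumption (the text does not name the file); the flags are the ones used in the other examples of this article.

```shell
#!/usr/bin/env bash
# flume_cmd CONF_FILE AGENT_NAME: print the command that starts one agent.
# The value passed to -n should equal the agent prefix used inside the
# configuration file (a1 in the example above); with a different name Flume
# finds no components to start.
flume_cmd() {
  local conf_file=$1 agent=$2
  printf 'bin/flume-ng agent -c conf -f %s -n %s -Dflume.root.logger=INFO,console\n' \
    "$conf_file" "$agent"
}

# Hypothetical file name; run the printed command from the Flume home directory.
flume_cmd conf/netcat-logger.conf a1
```

Once the agent is running, a client can send lines with nc 192.168.237.131 8888, and each line should show up as an event in the agent's console log.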

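Collecting a Directory into HDFS

1. Requirement
New files keep appearing in a particular directory on a server. Whenever a new file appears, it must be collected into HDFS.

2. Analysis
Based on the requirement, first define the following three elements:

  1. Source component: monitor a directory with spooldir
    1. It watches a directory and collects the contents of any new file that appears there
    2. A file that has been fully collected is renamed by the agent with the suffix .COMPLETED
    3. Files with the same name must not appear more than once in the watched directory
  2. Sink component: the HDFS file system, using the hdfs sink
  3. Channel component: either a file channel or a memory channel
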
3. Flume configuration file
cd /export/servers/apache-flume-1.8.0-bin/conf
mkdir -p /export/servers/dirfile
vim spooldir.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
## Note: never drop a file with an already used name into the watched directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /export/servers/dirfile
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/spooldir/files/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type to write: the default is SequenceFile; DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

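Channel parameters
capacity: the maximum number of events the channel can hold
transactionCapacity: the maximum number of events taken from the source or delivered to the sink in one transaction
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it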
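
The sink settings hdfs.round, hdfs.roundValue, and hdfs.roundUnit in the configuration above decide which time bucket the %H%M part of the path falls into. The sketch below approximates that rounding with plain arithmetic on Unix timestamps; the function name is made up for illustration, and it covers only the minute unit used here.

```shell
#!/usr/bin/env bash
# round_down_epoch EPOCH_SECONDS ROUND_VALUE_MINUTES
# Round a Unix timestamp down to a multiple of ROUND_VALUE_MINUTES minutes,
# approximating hdfs.round = true with hdfs.roundUnit = minute.
round_down_epoch() {
  local ts=$1
  local step=$(( $2 * 60 ))
  echo $(( ts - ts % step ))
}

# 1700000000 is 2023-11-14 22:13:20 UTC; with roundValue = 10 it falls into
# the bucket that starts at 22:10:00 UTC.
round_down_epoch 1700000000 10   # prints 1699999800
```

With roundValue = 10, every event written between 22:10:00 and 22:19:59 therefore ends up under the same .../2210/ directory.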

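4. Start Flume
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -c ./conf -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console

5. Upload files to the watched directory
cd /export/servers/dirfile

Note: copy different files into this directory, and make sure that no two files share a name.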
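
Because the spooldir source expects every file name to be unique and a file to stay unchanged once it is in the watched directory, it helps to stage a file elsewhere and then move it in under a fresh name. The helper below is a minimal sketch of that idea; the staging directory is an assumption and should sit on the same filesystem as the watched directory so that mv is a plain rename.

```shell
#!/usr/bin/env bash
# spool_put SRC_FILE SPOOL_DIR STAGING_DIR
# Copy SRC_FILE into STAGING_DIR under a unique name (mktemp appends a random
# suffix), then move the finished copy into SPOOL_DIR and print its new path.
spool_put() {
  local src=$1 spool=$2 staging=$3
  local staged
  staged=$(mktemp "$staging/$(basename "$src").XXXXXX") || return 1
  cp "$src" "$staged" || return 1
  mv "$staged" "$spool/" || return 1
  echo "$spool/$(basename "$staged")"
}

# Example call (the staging path is hypothetical):
#   spool_put /tmp/app.log /export/servers/dirfile /export/servers/dirfile_staging
```

Calling spool_put twice with the same source file yields two differently named files in the watched directory, so the duplicate-name restriction is not violated.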

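Collecting a File into HDFS

1. Requirement
A business system writes logs with log4j, for example, and the log file keeps growing. The data appended to the log file must be collected into HDFS in real time.

2. Analysis
Based on the requirement, first define the following three elements:

Source: monitor file updates with exec 'tail -F file'.
Sink: the HDFS file system, using the hdfs sink.
Channel between source and sink: either a file channel or a memory channel.

3. Create the configuration file
cd /export/servers/apache-flume-1.8.0-bin/conf
vim tail-file.conf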

agent1.sources = source1 
agent1.sinks = sink1 
agent1.channels = channel1 
 
# Describe/configure tail -F source1 
agent1.sources.source1.type = exec 
agent1.sources.source1.command = tail -F /export/servers/taillogs/access_log 
agent1.sources.source1.channels = channel1 
 
 
# Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://node01:8020/weblog/flume-collection/%y-%m-%d/%H-% 
agent1.sinks.sink1.hdfs.filePrefix = access_log 
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000 
agent1.sinks.sink1.hdfs.batchSize= 100 
agent1.sinks.sink1.hdfs.fileType = DataStream 
agent1.sinks.sink1.hdfs.writeFormat =Text 
 
agent1.sinks.sink1.hdfs.round = true 
agent1.sinks.sink1.hdfs.roundValue = 10 
agent1.sinks.sink1.hdfs.roundUnit = minute 
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true 
 
# Use a channel which buffers events in memory 
agent1.channels.channel1.type = memory 
agent1.channels.channel1.keep-alive = 120 
agent1.channels.channel1.capacity = 500000 
agent1.channels.channel1.transactionCapacity = 600 
 
# Bind the source and sink to the channel 
agent1.sources.source1.channels = channel1 
agent1.sinks.sink1.channel = channel1 

4. Start Flume
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -c conf -f conf/tail-file.conf -n agent1 -Dflume.root.logger=INFO,console

5. Write a shell script that appends to the file periodically
mkdir -p /export/servers/shells/
cd /export/servers/shells/
vim tail-file.sh

#!/bin/bash 
while true 
do  
	date >> /export/servers/taillogs/access_log;   
	sleep 0.5; 
done

6. Start the script

# Create the log directory
mkdir -p /export/servers/taillogs
# Start the script
sh /export/servers/shells/tail-file.sh

Monitoring a File

# Start command: bin/flume-ng agent -n a2 -f /home/hadoop/a2.conf -c conf -Dflume.root.logger=INFO,console
# Agent name and the names of its source, channel, and sink
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# Source: follow a log file
a2.sources.r1.type = exec
a2.sources.r1.command = tail -F /home/hadoop/a.log

# Channel: in-memory buffer
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Sink: write events to the log
a2.sinks.k1.type = logger

# Wire the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

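Chaining Agents

1. Requirement
[Figure: requirement diagram for the chained agents]
2. Analysis
The first agent collects the data from a file and sends it over the network to the second agent.
The second agent receives the data sent by the first agent and stores it in HDFS.

3. Install Flume on node02
Copy the extracted Flume directory from node03 to node02:

cd /export/servers
scp -r apache-flume-1.8.0-bin/ node02:$PWD

4. Configure Flume on node02
Create the Flume configuration on node02:
cd /export/servers/apache-flume-1.8.0-bin/conf
vim tail-avro-avro-logger.conf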

# Name the components on this agent 
a1.sources = r1 
a1.sinks = k1 
a1.channels = c1 
# Describe/configure the source 
a1.sources.r1.type = exec 
a1.sources.r1.command = tail -F /export/servers/taillogs/access_log 
a1.sources.r1.channels = c1 
# Describe the sink
## The avro sink is the sender of the data
a1.sinks.k1.type = avro 
a1.sinks.k1.channel = c1 
a1.sinks.k1.hostname = 192.168.174.120 
a1.sinks.k1.port = 4141 
a1.sinks.k1.batch-size = 10 
# Use a channel which buffers events in memory 
a1.channels.c1.type = memory 
a1.channels.c1.capacity = 1000 
a1.channels.c1.transactionCapacity = 100 
# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinks.k1.channel = c1 

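5. Prepare the script that writes data to the file
Copy the script and data directories from node03 to node02 by running the following commands on node03:
cd /export/servers
scp -r shells/ taillogs/ node02:$PWD

6. Flume configuration file on node03
Create the Flume configuration file on node03:
cd /export/servers/apache-flume-1.8.0-bin/conf
vim avro-hdfs.conf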

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
## The avro source is the receiving service
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.174.120
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/av/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type to write: the default is SequenceFile; DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
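
The two agents can only talk to each other if the avro sink on node02 points at the port the avro source on node03 listens on. As a rough sketch, the helpers below read a key from a properties-style file and compare the two ports before anything is started. The parsing is deliberately simplified (first match wins, no line continuations), and both function names are made up for this illustration.

```shell
#!/usr/bin/env bash
# get_prop FILE KEY: print the value of KEY from a properties-style file.
# Comment lines are skipped, the line is split at the first '=', and
# surrounding spaces or tabs are trimmed.
get_prop() {
  awk -F'=' -v key="$2" '
    /^[ \t]*#/ { next }
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
        exit
      }
    }' "$1"
}

# ports_match SENDER_CONF RECEIVER_CONF: succeed when the avro sink port of the
# sender equals the avro source port of the receiver (agent a1, sink k1, source r1).
ports_match() {
  [ "$(get_prop "$1" a1.sinks.k1.port)" = "$(get_prop "$2" a1.sources.r1.port)" ]
}

# Example call with the two files from this section:
#   ports_match conf/tail-avro-avro-logger.conf conf/avro-hdfs.conf && echo "ports agree"
```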

7. Start in order

Start the Flume process on node03:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1 -Dflume.root.logger=INFO,console

Start the Flume process on node02:
cd /export/servers/apache-flume-1.8.0-bin/
bin/flume-ng agent -c conf -f conf/tail-avro-avro-logger.conf -n a1 -Dflume.root.logger=INFO,console

Start the shell script on node02 to generate data:
cd /export/servers/shells
sh tail-file.sh

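High Availability

With the single-node Flume NG setup complete, the next step is to build a highly available Flume NG cluster. The architecture is as follows:
[Figure: high-availability Flume NG cluster architecture]

Installing and Configuring Node01

Copy the Flume installation and the two directories used for generating files from node03 to node01.

Run the following commands on node03:

cd /export/servers
scp -r apache-flume-1.8.0-bin/ node01:$PWD
scp -r shells/ taillogs/ node01:$PWD

Create the agent configuration file on node01:

cd /export/servers/apache-flume-1.8.0-bin/conf
vim agent.conf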

# agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2

# set sink group
agent1.sinkgroups = g1

# set source
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /export/servers/taillogs/access_log

# set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

# set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = node02
agent1.sinks.k1.port = 52020

# set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = node03
agent1.sinks.k2.port = 52020

# assign the sinks to the group
agent1.sinkgroups.g1.sinks = k1 k2

# set failover: the sink with the larger priority value is preferred
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000

Configuring the Flume Collectors on Node02 and Node03

Edit the configuration file on node02:

cd /export/servers/apache-flume-1.8.0-bin/conf
vim collector.conf

# set agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# set source: receive events from the agent over avro
a1.sources.r1.type = avro
a1.sources.r1.bind = node02
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

# set sink to hdfs
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = TEXT
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d
# The date escapes in filePrefix need a timestamp; no timestamp interceptor is
# configured, so use the collector's local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true

Edit the configuration file on node03:

cd /export/servers/apache-flume-1.8.0-bin/conf
vim collector.conf

# set agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# set source: receive events from the agent over avro
a1.sources.r1.type = avro
a1.sources.r1.bind = node03
a1.sources.r1.port = 52020
a1.sources.r1.channels = c1

# set sink to hdfs
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = TEXT
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d
# The date escapes in filePrefix need a timestamp; no timestamp interceptor is
# configured, so use the collector's local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true

Start in Order

Start Flume on node03:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/collector.conf -Dflume.root.logger=DEBUG,console

Start Flume on node02:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/collector.conf -Dflume.root.logger=DEBUG,console

Start Flume on node01:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n agent1 -c conf -f conf/agent.conf -Dflume.root.logger=DEBUG,console

Start the file generation script on node01:
cd /export/servers/shells
sh tail-file.sh

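Failover Test

Next, test the high availability (failover) of the Flume NG cluster. The scenario is as follows. Files are uploaded on the Agent1 node. Because Collector1 is configured with a higher priority than Collector2, Collector1 collects the data and uploads it to the storage system first. Then Collector1 is killed, and Collector2 takes over collecting and uploading the logs. Finally, the Flume service on Collector1 is restored manually and files are uploaded on Agent1 again; Collector1 resumes its role as the preferred collector. The steps are:

Collector1 uploads first.
Preview the uploaded log content in the HDFS cluster.
Collector1 goes down, and Collector2 takes over the uploads.
Restart the Collector1 service; Collector1 regains priority for uploads.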
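
The behavior observed in this test follows from one rule: among the sinks that are currently alive, the failover processor uses the one with the largest priority value. The sketch below simulates only that selection rule; the name:priority:state argument format and the function name are invented for the illustration and are not part of Flume.

```shell
#!/usr/bin/env bash
# pick_sink "name:priority:state" ...
# Print the name of the live sink ("up") with the largest priority value.
# Prints nothing and returns 1 when every sink is down.
pick_sink() {
  local best="" best_prio=-1 entry name rest prio state
  for entry in "$@"; do
    name=${entry%%:*}
    rest=${entry#*:}
    prio=${rest%%:*}
    state=${rest#*:}
    if [ "$state" = "up" ] && [ "$prio" -gt "$best_prio" ]; then
      best=$name
      best_prio=$prio
    fi
  done
  [ -n "$best" ] && echo "$best"
}

pick_sink k1:10:up   k2:1:up   # prints k1 (both collectors alive)
pick_sink k1:10:down k2:1:up   # prints k2 (Collector1 killed)
pick_sink k1:10:up   k2:1:up   # prints k1 (Collector1 restarted)
```

The priorities 10 and 1 are the values from agent.conf, where k1 points at node02 and k2 at node03.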

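Flume Load Balancing

Load balancing addresses the case where one machine (one process) cannot handle all requests. The Load balancing Sink Processor provides this capability. In the figure, Agent1 is a routing node that spreads the events buffered in the channel evenly across several sinks, and each sink connects to a separate agent. An example configuration follows:
[Figure: load-balancing topology, with Agent1 routing events to several agents]
Here three machines are used to simulate Flume load balancing.
They are planned as follows:

node01: collects data and sends it to node02 and node03
node02: receives part of the data from node01
node03: receives part of the data from node01

Flume configuration for node01

Configuration on node01:

cd /export/servers/apache-flume-1.8.0-bin/conf
vim load_banlancer_client.conf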

# agent name 
a1.channels = c1 
a1.sources = r1 
a1.sinks = k1 k2 

# set group
a1.sinkgroups = g1 
 
# set channel 
a1.channels.c1.type = memory 
a1.channels.c1.capacity = 1000 
a1.channels.c1.transactionCapacity = 100 
a1.sources.r1.channels = c1 
a1.sources.r1.type = exec 
a1.sources.r1.command = tail -F /export/servers/taillogs/access_log 
 
# set sink1 
a1.sinks.k1.channel = c1 
a1.sinks.k1.type = avro 
a1.sinks.k1.hostname = node02 
a1.sinks.k1.port = 52020 
 
# set sink2 
a1.sinks.k2.channel = c1 
a1.sinks.k2.type = avro 
a1.sinks.k2.hostname = node03 
a1.sinks.k2.port = 52020 
 
# set sink group 
a1.sinkgroups.g1.sinks = k1 k2 
 
# set load balancing
a1.sinkgroups.g1.processor.type = load_balance 
a1.sinkgroups.g1.processor.backoff = true 
a1.sinkgroups.g1.processor.selector = round_robin 
a1.sinkgroups.g1.processor.selector.maxTimeOut=10000 
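
With processor.selector = round_robin, the sinks in the group take turns. The sketch below shows the resulting order for a few consecutive rounds when every sink is healthy; it ignores backoff and is only meant to illustrate the selection order, with a made-up function name.

```shell
#!/usr/bin/env bash
# round_robin COUNT SINK...
# Print the sink chosen in each of COUNT consecutive rounds when the sinks
# are visited in turn.
round_robin() {
  local count=$1 i=0 sink
  shift
  [ "$#" -gt 0 ] || return 1
  while [ "$i" -lt "$count" ]; do
    for sink in "$@"; do
      [ "$i" -lt "$count" ] || break
      echo "$sink"
      i=$(( i + 1 ))
    done
  done
}

round_robin 4 k1 k2   # prints k1, k2, k1, k2 on separate lines
```

In the setup above k1 delivers to node02 and k2 to node03, so both collectors should log roughly the same number of events.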

Flume configuration for node02

cd /export/servers/apache-flume-1.8.0-bin/conf
vim load_banlancer_server.conf

# Name the components on this agent 
a1.sources = r1 
a1.sinks = k1 
a1.channels = c1 
 
# Describe/configure the source 
a1.sources.r1.type = avro 
a1.sources.r1.channels = c1 
a1.sources.r1.bind = node02 
a1.sources.r1.port = 52020 
 
# Describe the sink 
a1.sinks.k1.type = logger 
  
# Use a channel which buffers events in memory 
a1.channels.c1.type = memory 
a1.channels.c1.capacity = 1000 
a1.channels.c1.transactionCapacity = 100 
 
# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinks.k1.channel = c1 

Flume configuration for node03

Configuration on node03:

cd /export/servers/apache-flume-1.8.0-bin/conf
vim load_banlancer_server.conf

# Name the components on this agent
a1.sources = r1 
a1.sinks = k1 
a1.channels = c1 

# Describe/configure the source 
a1.sources.r1.type = avro 
a1.sources.r1.channels = c1 
a1.sources.r1.bind = node03 
a1.sources.r1.port = 52020 

# Describe the sink 
a1.sinks.k1.type = logger 

# Use a channel which buffers events in memory 
a1.channels.c1.type = memory 
a1.channels.c1.capacity = 1000 
a1.channels.c1.transactionCapacity = 100 

# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinks.k1.channel = c1 

Starting the Flume Services

Start the Flume service on node03:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_server.conf -Dflume.root.logger=INFO,console

Start the Flume service on node02:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_server.conf -Dflume.root.logger=INFO,console

Start the Flume service on node01:
cd /export/servers/apache-flume-1.8.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_client.conf -Dflume.root.logger=INFO,console

Run the script on node01 to generate data:

cd /export/servers/shells
sh tail-file.sh

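Flume Case Study

Scenario

Two log servers, A and B, produce logs in real time. The main log types are access.log, nginx.log, and web.log.
The requirement:

Collect access.log, nginx.log, and web.log from machines A and B onto machine C, and from there write them to HDFS. The directories in HDFS must be:

/source/logs/access/20180101/**
/source/logs/nginx/20180101/**
/source/logs/web/20180101/**

Scenario Analysis

[Figure: scenario analysis diagram]

Data Flow Analysis

[Figure: data flow diagram]

Implementation

Server A has the IP 192.168.174.100
Server B has the IP 192.168.174.110
Server C has the IP 192.168.174.120

Configuration on the collecting side
Create the Flume configuration file on node01 and node02: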

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim exec_source_avro_sink.conf

# Name the components on this agent
a1.sources = r1 r2 r3
a1.sinks = k1
a1.channels = c1

# Describe/configure the sources
## The static interceptor inserts a user-defined key-value pair into the header of every collected event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/servers/taillogs/access.log
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access

a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /export/servers/taillogs/nginx.log
a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = static
a1.sources.r2.interceptors.i2.key = type
a1.sources.r2.interceptors.i2.value = nginx

a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /export/servers/taillogs/web.log
a1.sources.r3.interceptors = i3
a1.sources.r3.interceptors.i3.type = static
a1.sources.r3.interceptors.i3.key = type
a1.sources.r3.interceptors.i3.value = web

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node03
a1.sinks.k1.port = 41414

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000

# Bind the sources and sink to the channel
a1.sources.r1.channels = c1
a1.sources.r2.channels = c1
a1.sources.r3.channels = c1
a1.sinks.k1.channel = c1

Configuration on the server side
Create the Flume configuration file on node03:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim avro_source_hdfs_sink.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Define the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.174.120
a1.sources.r1.port = 41414

# Add the timestamp interceptor
a1.sources.r1.interceptors = i1 
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$ 

# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000

# Define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.174.100:8020/source/logs/%{type}/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Use the local time for the date escapes in the path
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Do not roll files by event count
a1.sinks.k1.hdfs.rollCount = 0

# Roll files by time (every 30 seconds)
a1.sinks.k1.hdfs.rollInterval = 30

# Roll files by size (10485760 bytes = 10 MB)
a1.sinks.k1.hdfs.rollSize = 10485760

# Number of events written to HDFS per batch
a1.sinks.k1.hdfs.batchSize = 10000

# Number of threads Flume uses for HDFS operations (create, write, and so on)
a1.sinks.k1.hdfs.threadsPoolSize = 10

# Timeout for HDFS operations, in milliseconds
a1.sinks.k1.hdfs.callTimeout = 30000

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
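
The sink path /source/logs/%{type}/%Y%m%d is what produces the required directory layout: %{type} is replaced by the type header that the static interceptors set on the collecting agents, and %Y%m%d by the event date. The sketch below reproduces that mapping for a given header value and date; the function name is made up for the illustration.

```shell
#!/usr/bin/env bash
# target_dir TYPE [YYYYMMDD]
# Print the HDFS directory an event lands in: the value of the "type" header
# followed by the date. The date defaults to today, in the spirit of
# hdfs.useLocalTimeStamp = true.
target_dir() {
  local type=$1
  local day=${2:-$(date +%Y%m%d)}
  printf '/source/logs/%s/%s\n' "$type" "$day"
}

target_dir access 20180101   # prints /source/logs/access/20180101
target_dir nginx  20180101   # prints /source/logs/nginx/20180101
target_dir web    20180101   # prints /source/logs/web/20180101
```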

Log generation script on the collecting side
Write a shell script on node01 and node02 to simulate data generation:

cd /export/servers/shells
vim server.sh

#!/bin/bash
while true
do
  date >> /export/servers/taillogs/access.log
  date >> /export/servers/taillogs/web.log
  date >> /export/servers/taillogs/nginx.log
  sleep 0.5
done

Start the services in order
Start Flume on node03 to receive the data:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/avro_source_hdfs_sink.conf -n a1 -Dflume.root.logger=INFO,console

Start Flume on node01 and node02 to monitor the logs:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/exec_source_avro_sink.conf -n a1 -Dflume.root.logger=INFO,console

Start the log generation script on node01 and node02:

cd /export/servers/shells
sh server.sh
