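Log collection framework: Flume
webServer (source side) -> Flume -> HDFS (destination)
Core components of the Flume framework
source: where the logs come from
channel: the pipeline that buffers data between source and sink
sink: the storage destination (where the data lands)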
JDK download and installation
Download: jdk-8-linux-x64.tar.gz
Upload: rz
Extract: tar -zvxf jdk-8-linux-x64.tar.gz -C ~/soft_install/
Edit the profile: vi ~/.bash_profile
export JAVA_HOME=/root/soft_install/jdk1.8.0
export PATH=$JAVA_HOME/bin:$PATH
source ~/.bash_profile
Verify: java -version
Flume download and installation
Step 1:
Download: http://archive.cloudera.com/cdh5/cdh/5/
Upload: rz
Extract: tar -zvxf flume-ng-1.6.0-cdh5.7.0.tar.gz -C ~/soft_install/
Edit the profile: vi ~/.bash_profile
export FLUME_HOME=/root/soft_install/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
source ~/.bash_profile
Step 2:
Edit the configuration under $FLUME_HOME/conf:
cp flume-env.sh.template flume-env.sh
vi flume-env.sh, then add: export JAVA_HOME=/root/soft_install/jdk1.8.0
Verify:
flume-ng version
Starting Flume with a configuration file
flume-ng agent \
--name avro-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exampleB.conf \
-Dflume.root.logger=INFO,console
Event
Event: { headers:{} body: 69 20 6C 6F 76 65 20 6C 69 66 08 6E 66 65 69 66 i love lif.nfeif }
An Event is the basic unit of data transfer in Flume
Event = optional headers + byte array
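The sample Event line above shows the body as hex bytes followed by a text rendering. As a rough sketch of how the two relate, the snippet below decodes the hex bytes back into text. Rendering non-printable bytes as a dot is an assumption inferred from the sample itself (byte 08 appears as "."), and the variable names are illustrative only.

```shell
#!/bin/sh
# Hex bytes copied from the sample Event line.
hex="69 20 6C 6F 76 65 20 6C 69 66 08 6E 66 65 69 66"
text=""
for byte in $hex; do
  code=$((0x$byte))   # hex pair -> decimal value
  if [ "$code" -ge 32 ] && [ "$code" -le 126 ]; then
    # Printable ASCII: emit the character through an octal escape.
    text="$text$(printf "\\$(printf '%03o' "$code")")"
  else
    # Assumption: non-printable bytes are shown as a dot.
    text="$text."
  fi
done
printf '%s\n' "$text"
```

Run as-is, this prints the same text that follows the hex bytes in the sample line: i love lif.nfeif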
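The heart of Flume is the configuration file: create a new one that defines the agent and its source, channel, and sink
The key decision is which source, channel, and sink types to use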
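Every property in a Flume configuration file is prefixed with the agent name, so the value passed to --name has to match that prefix; otherwise Flume finds no components to start for that agent. The helper below is a small sketch for checking this before launching. The function name agent_names is hypothetical (it is not part of Flume), and it simply looks for lines of the form "<agent>.sources = ...".

```shell
#!/bin/sh
# agent_names FILE: print each agent name declared as "<agent>.sources = ..."
# Comment lines and deeper keys such as "a1.sources.r1.type" are skipped.
agent_names() {
  sed -n 's/^[[:space:]]*\([^#.[:space:]][^.]*\)\.sources[[:space:]]*=.*$/\1/p' "$1" | sort -u
}
```

For example, agent_names $FLUME_HOME/conf/exampleB.conf should list avro-memory-logger for the exampleB.conf shown in these notes, which is the name to pass to --name.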
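Exercise 1: collect log data from a given network port and print it to the console
Technology choice: netcat source + memory channel + logger sink
Step 1: vi example.conf (see the configuration file listing)
Step 2: start the agent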
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console
Step 3: test
In another window: telnet 192.168.145.128 44444, then type some text and check that the original window prints the log events
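Exercise 2: monitor a file in real time for newly appended content
Technology choice: exec source + memory channel + logger sink
Step 1: vi example2.conf (see the configuration file listing)
Step 2: start the agent (the last line prints INFO-level logs to the console)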
flume-ng agent \
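--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example2.conf \
-Dflume.root.logger=INFO,console
Step 3: test
In another window, append a line to the monitored file, e.g. echo hello >> /root/data/example2.txt, and check that the original window prints the log event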
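Because the exec source in example2.conf runs tail -F on a file, new events only show up when lines are appended to that file. A minimal sketch of a test helper follows; the function name append_lines and the text of the test lines are made up for illustration.

```shell
#!/bin/sh
# append_lines FILE COUNT: append COUNT numbered test lines to FILE,
# one per write, so each line can be picked up as a separate event.
append_lines() {
  file=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'test line %d\n' "$i" >> "$file"
    i=$((i + 1))
  done
}
```

With the agent running, calling append_lines /root/data/example2.txt 5 from a second window should make the corresponding events appear on the agent's console.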
Exercise 2, advanced: offline processing
Save the collected log data to HDFS
Technology choice: exec source + memory channel + hdfs sink
Configuration: example3.conf
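Log collection process
Machine A monitors a file and sends the events through an avro sink to another node
Machine B receives the data sent by machine A's sink through an avro source
Machine B can print the data to the console with a logger sink, save it, or forward it (for example to Kafka)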
example.conf (Exercise 1)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop01
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
example2.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/example2.txt
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
example3.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/example2.txt
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.145.128:8020
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
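With only hdfs.path set, the HDFS sink falls back to its defaults, which write SequenceFile output and roll to a new file very frequently. If plain-text log files are wanted, a few extra sink properties are commonly set. The sketch below writes them to a separate fragment file for review; the file name example3-extra.conf and the chosen values are assumptions for illustration and are not part of the original example3.conf.

```shell
#!/bin/sh
# Hypothetical fragment file; merge these lines into example3.conf by hand.
fragment="${TMPDIR:-/tmp}/example3-extra.conf"
cat > "$fragment" <<'EOF'
# Write events as plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Roll to a new file every 60 seconds; 0 turns off size- and count-based rolling
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
EOF
cat "$fragment"
```

The roll settings are a trade-off: longer intervals give fewer, larger files on HDFS, at the cost of data staying in an open file for longer.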
exampleA.conf
# example exec-memory-avro
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /root/data/exampleA.txt
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 192.168.145.128
exec-memory-avro.sinks.avro-sink.port = 44444
# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
exampleB.conf
# example avro-memory-logger
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 192.168.145.128
avro-memory-logger.sources.avro-source.port = 44444
# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger
# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel