Flume Log Collection
Introduction to Apache Flume
- Flume streams log data from many kinds of sources into Hadoop or other destinations
- A reliable, available, and efficient distributed data-collection service
- Flume has a simple, flexible architecture based on streaming data flows, with support for fault tolerance, failover, and recovery
- Donated to Apache by Cloudera in 2009; now a top-level Apache project
Flume Architecture
- Client: where the data is produced, such as a web server
- Event: a single unit of data transferred through an Agent; for log data, one event typically corresponds to one line
- Agent: an independent JVM process
- Flume is deployed as one or more Agents
- An Agent contains three components:
- Source
- Channel
- Sink
- How it works
At its core, Flume collects data from a data source (Source) and delivers it to a specified destination (Sink). To make delivery reliable, events are first buffered in a Channel before being sent to the Sink; only after the data has actually reached the Sink's destination does Flume delete its buffered copy.
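The buffer-then-delete handoff can be sketched as a toy model (hypothetical class and method names, not Flume's real API; the channel is modeled as a bounded in-memory queue, and an event is removed only after the sink has accepted it):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the Source -> Channel -> Sink handoff (not Flume's actual API).
public class MiniAgent {
    private final Queue<String> channel = new ArrayDeque<>(); // buffers events
    private final List<String> delivered = new ArrayList<>(); // stands in for the sink's destination

    // Source side: the event always goes into the channel first.
    public void sourcePut(String event) {
        channel.add(event);
    }

    // Sink side: the buffered copy is deleted only after delivery succeeds.
    public void sinkDrain() {
        while (!channel.isEmpty()) {
            String event = channel.peek(); // read without removing
            delivered.add(event);          // "deliver" to the destination
            channel.poll();                // only now drop it from the buffer
        }
    }

    public List<String> delivered() { return delivered; }
    public int buffered() { return channel.size(); }
}
```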
Flume Installation and Deployment
- Installation steps
# Unpack
[root@jzy1 opt]# tar -zxf flume-ng-1.6.0-cdh5.14.2.tar.gz
# Move and rename
[root@jzy1 opt]# mv apache-flume-1.6.0-cdh5.14.2-bin/ soft/flume160
# Copy the config template and edit it
[root@jzy1 conf]# cp flume-env.sh.template flume-env.sh
[root@jzy1 conf]# vi flume-env.sh
# Add the JDK path
export JAVA_HOME=/opt/soft/jdk180
# Configure environment variables: vi /etc/profile and append
export FLUME_HOME=/opt/soft/flume160
export PATH=$PATH:$FLUME_HOME/bin
# Reload the profile, then verify the installation
[root@jzy1 ~]# source /etc/profile
[root@jzy1 ~]# flume-ng version
Basic Flume Agent Configuration
agent.sources = s1
agent.channels = c1
agent.sinks = sk1
# Source: netcat listening on localhost:5678, feeding channel c1
agent.sources.s1.type = netcat
agent.sources.s1.bind = localhost
agent.sources.s1.port = 5678
agent.sources.s1.channels = c1
# Sink: logger mode, reading from channel c1
agent.sinks.sk1.type = logger
agent.sinks.sk1.channel = c1
# Channel: in-memory buffer
agent.channels.c1.type = memory
# Start the agent and log to the console
flume-ng agent --name agent -f h0.conf -Dflume.root.logger=INFO,console
Worked Examples
netcat source
- Create a new configuration file with the following content
[root@jzy1 flumeconf]# vi conf_0804.properties
# Component aliases
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.56.21
a1.sources.r1.port = 6666
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Maximum number of events the channel can hold
a1.channels.c1.capacity = 1000
# Maximum number of events taken from the source or given to the sink per transaction
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
- Install the nc utility
yum install nmap-ncat.x86_64 -y
- Run the agent, logging to the console
flume-ng agent -n a1 -c conf -f /opt/flumeconf/conf_0804.properties -Dflume.root.logger=INFO,console
- Start nc and send some lines
nc 192.168.56.21 6666
- The lines sent through nc appear in full in the agent's console output
Example: monitoring a directory with the Spooling Directory source
- Example: read the customer.csv file placed in /opt/datas and output it through the logger sink
a2.channels=c2
a2.sources=s2
a2.sinks=k2
a2.sources.s2.type=spooldir
a2.sources.s2.spoolDir=/opt/datas
a2.channels.c2.type=memory
a2.channels.c2.capacity=10000
a2.channels.c2.transactionCapacity=1000
a2.sinks.k2.type=logger
a2.sinks.k2.channel=c2
a2.sources.s2.channels=c2
# Run
flume-ng agent -n a2 -c conf -f /opt/flumeconf/conf_0805_readfile.properties -Dflume.root.logger=INFO,console
- After a file has been ingested, a suffix is appended to its name. The directory is monitored continuously, so any new file of the same format dropped into datas is picked up automatically (events.csv was likewise renamed after ingestion). To have such a file ingested again, stop the agent and remove the appended suffix before restarting.
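The rename step can be illustrated with plain java.nio (the spooling-directory source appends .COMPLETED by default; this sketch only mimics that rename, it is not Flume code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Mimics how the spooling-directory source marks a consumed file:
// it renames foo.csv to foo.csv.COMPLETED so the file is not re-read.
public class CompletedSuffixDemo {
    public static Path markCompleted(Path file) throws IOException {
        Path done = file.resolveSibling(file.getFileName() + ".COMPLETED");
        return Files.move(file, done); // rename in place, same directory
    }
}
```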
exec source
Not demonstrated in detail here; the configuration is as follows. (Note that the exec source does not track its read position, so events can be lost or duplicated if the agent restarts mid-stream.)
a1.sources = s1
a1.channels = c1
a1.sinks = sk1
# Set the source type to exec
a1.sources.s1.type = exec
a1.sources.s1.command = tail -f /opt/datas/exectest.txt
# Connect the source to the channel
a1.sources.s1.channels = c1
a1.channels.c1.type = memory
# Specify the sink
a1.sinks.sk1.type = logger
# Connect the sink to the channel
a1.sinks.sk1.channel = c1
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
http source
a1.sources = s1
a1.channels = c1
a1.sinks = sk1
a1.sources.s1.type = http
a1.sources.s1.port = 5140
# Connect the source to the channel
a1.sources.s1.channels = c1
a1.channels.c1.type = memory
# Specify the sink
a1.sinks.sk1.type = logger
# Connect the sink to the channel
a1.sinks.sk1.channel = c1
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Test it:
curl -XPOST localhost:5140 -d '[{"headers":{"h1":"v1","h2":"v2"},"body":"hello flume"}]'
taildir source
The taildir source records the read position of each tailed file in a position file, so tailing resumes where it left off after a restart.
The configuration is as follows
a1.sources = s1
a1.channels = c1
a1.sinks = sk1
a1.sources.s1.type = TAILDIR
a1.sources.s1.filegroups = f1 f2
# Filegroup f1: a single file; filegroup f2: all files matching a regex
a1.sources.s1.filegroups.f1 = /opt/datas/tail_1/example.log
a1.sources.s1.filegroups.f2 = /opt/datas/tail_2/.*log.*
# Where to record read positions
a1.sources.s1.positionFile = /opt/datas/tail_position/taildir_position.json
# Static headers attached per filegroup
a1.sources.s1.headers.f1.headerKey1 = value1
a1.sources.s1.headers.f2.headerKey1 = value2
a1.sources.s1.headers.f2.headerKey2 = value3
a1.sources.s1.fileHeader = true
# Connect the source to the channel
a1.sources.s1.channels = c1
a1.channels.c1.type = memory
# Specify the sink
a1.sinks.sk1.type = logger
# Connect the sink to the channel
a1.sinks.sk1.channel = c1
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
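For reference, the position file is a JSON array tracking each tailed file's inode, byte offset, and path; the concrete values below are made-up examples:

```json
[{"inode":2496001,"pos":1024,"file":"/opt/datas/tail_1/example.log"},
 {"inode":2496002,"pos":512,"file":"/opt/datas/tail_2/app.log.1"}]
```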
hdfs sink example: monitor a directory and upload to HDFS
a2.channels=c2
a2.sources=s2
a2.sinks=k2
a2.sources.s2.type=spooldir
a2.sources.s2.spoolDir=/opt/datas
a2.sources.s2.fileHeader=false
a2.channels.c2.type=memory
a2.channels.c2.capacity=10000
a2.channels.c2.transactionCapacity=1000
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path=hdfs://jzy1:9000/data/customers
# Roll a new HDFS file after 5000 events or 600000 bytes, whichever comes first
a2.sinks.k2.hdfs.rollCount=5000
a2.sinks.k2.hdfs.rollSize=600000
# Number of events written to HDFS per flush
a2.sinks.k2.hdfs.batchSize=500
a2.sinks.k2.channel=c2
a2.sources.s2.channels=c2
# Run with this configuration file
flume-ng agent -n a2 -c conf -f /opt/flumeconf/conf_0805_readfile.properties
# Create an external Hive table mapped onto the HDFS path
hive> create external table xxx(id string,fname string,lname string,email string,gender string,address string,lan string,job string, ct string,cr string)
> row format delimited fields terminated by ','
> stored as sequencefile
> location '/data/customers'
> tblproperties("skip.header.line.count"="1");
OK
Time taken: 0.064 seconds
# A query confirms the data was uploaded successfully
hive> select * from xxx limit 3;
OK
1 Spencer Raffeorty sraffeorty0@dropbox.com Male 9274 Lyons Court China KhmerSafety Technician III jcb
2 Cherye Poynor cpoynor1@51.la Female 1377 Anzinger Avenue China Czech Research Nursinstapayment
3 Natasha Abendroth nabendroth2@scribd.com Female 2913 Evergreen Lane China Yiddish Budget/Accounting Analyst IV visa
Time taken: 0.049 seconds, Fetched: 3 row(s)
Flume interceptors: match-and-filter
Use a regular expression to drop unwanted events.
The configuration is as follows
a3.channels=c3
a3.sources=s3
a3.sinks=k3
a3.sources.s3.type=spooldir
a3.sources.s3.spoolDir=/opt/datas
a3.sources.s3.interceptors=userid_filter
a3.sources.s3.interceptors.userid_filter.type=regex_filter
a3.sources.s3.interceptors.userid_filter.regex=userid.*
a3.sources.s3.interceptors.userid_filter.excludeEvents=true
a3.channels.c3.type=memory
a3.sinks.k3.type=logger
a3.sources.s3.channels=c3
a3.sinks.k3.channel=c3
# Start command (worth writing out a few times so you don't forget it)
flume-ng agent -n a3 -c conf -f /opt/flumeconf/conf_0805_interceptor.properties -Dflume.root.logger=INFO,console
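Which events get dropped can be checked with plain java.util.regex (regex_filter with excludeEvents=true discards events whose body matches the pattern; this sketch assumes find()-style matching and is an approximation, not Flume's internal code):

```java
import java.util.regex.Pattern;

// Approximates regex_filter with excludeEvents=true:
// an event is kept only when its body does NOT match the pattern.
public class RegexFilterDemo {
    private static final Pattern USERID = Pattern.compile("userid.*");

    public static boolean keep(String body) {
        return !USERID.matcher(body).find(); // matching events are excluded
    }
}
```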
Using a custom interceptor
The file content is as follows:
张三,男,20
李四,女,18
王五,男,28
Requirement: convert the gender field to a numeric code, e.g. 男 (male) -> 1, 女 (female) -> 2, unknown -> 0
- Write a custom interceptor
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class CustomInterceptor implements Interceptor {
    @Override
    public void initialize() {
    }

    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        String line = new String(body); // e.g. 张三,男,20
        String[] sps = line.split(",");
        // Map the gender field (index 1) to its numeric code
        switch (sps[1]) {
            case "男":
                sps[1] = "1";
                break;
            case "女":
                sps[1] = "2";
                break;
            default:
                sps[1] = "0";
        }
        String newStr = sps[0] + "," + sps[1] + "," + sps[2];
        event.setBody(newStr.getBytes());
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> list) {
        for (Event event : list) {
            intercept(event);
        }
        return list;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
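The mapping rule itself can be sanity-checked in isolation (a standalone copy of the switch logic, not the interceptor class):

```java
// Standalone copy of the interceptor's gender-mapping rule: 男->1, 女->2, anything else->0.
public class GenderMapDemo {
    public static String mapGender(String g) {
        switch (g) {
            case "男": return "1";
            case "女": return "2";
            default:   return "0";
        }
    }
}
```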
- Package the finished custom interceptor and upload the jar into Flume's lib directory
- Write the configuration file
a4.channels=c4
a4.sources=s4
a4.sinks=k4
a4.sources.s4.type=spooldir
a4.sources.s4.spoolDir=/opt/datas
a4.sources.s4.interceptors=myintec
a4.sources.s4.interceptors.myintec.type=com.jstd.myinterceptors.CustomInterceptor$Builder
a4.channels.c4.type=memory
a4.sinks.k4.type=logger
a4.sources.s4.channels=c4
a4.sinks.k4.channel=c4
# Run with this configuration file
flume-ng agent -n a4 -c conf -f /opt/flumeconf/conf_0806_custconf.properties -Dflume.root.logger=INFO,console