Using Flume
Case 1: Monitoring Port Data
http://flume.apache.org/FlumeUserGuide.html#a-simple-example
- Create a directory dedicated to Flume configuration files
mkdir -p /opt/bdp/apache-flume-1.6.0-bin/options
- Create the configuration file
vim example.conf
## Add the following content
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Install telnet
yum install telnet
-
Send data to port 44444
telnet localhost 44444
-
Quit: in the window where the agent is running, press
Ctrl + C
Note: Memory Channel configuration
capacity: the maximum number of events the channel can hold (default 100)
transactionCapacity: the maximum number of events taken from a source or given to a sink per transaction (default 100)
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it
byteCapacity: the limit on the total bytes of events in the channel, counting only the event body
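As a sketch, the keep-alive and byteCapacity knobs described above can be set alongside the channel definition already shown; the values here are illustrative, not recommendations:

```properties
# Memory channel with the tuning knobs described above (illustrative values)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Seconds to wait for space when adding / for an event when removing
a1.channels.c1.keep-alive = 3
# Upper bound, in bytes, on the sum of event bodies held in the channel
a1.channels.c1.byteCapacity = 800000
```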
Case 2: A Two-Agent Flume Chain
-
On node01, the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Tail a local file and forward events to node02 over Avro
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/bdp/flume.txt
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node02
a1.sinks.k1.port = 45454
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
On node02, install Flume (steps omitted) and create the configuration file:
# Receive events over Avro and log them to the console
a2.sources = r1
a2.sinks = k1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.bind = node02
a2.sources.r1.port = 45454
a2.sinks.k1.type = logger
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
-
Start Flume on node02 first
flume-ng agent -n a2 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Then start Flume on node01
flume-ng agent -n a1 -c options/ -f example.conf2
-
To test, append data to /opt/bdp/flume.txt on node01 (the file tailed by the exec source) and watch the events appear on node02's console
Case 3: Exec Source
http://flume.apache.org/FlumeUserGuide.html#exec-source
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/bdp/flume.exec.log
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Create an empty file for the demo (touch flume.exec.log), then append data in a loop:
for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done
Or generate a continuous stream into a log file:
ping www.baidu.com >> baidu.log
Case 4: Spooling Directory Source
http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
-
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sinks.k1.type = logger
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Copy a file into the spool directory to demonstrate:
mkdir logs
cp flume.exec.log logs/
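Once the agent ingests a file from the spool directory it marks it as done by renaming it; the suffix (and whether to delete the file instead) is configurable. A fragment showing the documented defaults:

```properties
# Spooling-directory housekeeping (defaults shown)
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.deletePolicy = never
```

With these defaults, flume.exec.log becomes flume.exec.log.COMPLETED after ingestion; setting deletePolicy to immediate removes the file instead.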
Case 5: HDFS Sink
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
-
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hdfs-bdp/flume/%Y-%m-%d/%H%M
## Roll a new file every 60 s, or once it reaches 10 KB (rollSize is in bytes)
## Number of events per file before rolling; 0 = do not roll based on event count
a1.sinks.k1.hdfs.rollCount = 0
## Seconds before rolling the current file; 0 = do not roll based on time
a1.sinks.k1.hdfs.rollInterval = 60
## File size in bytes before rolling; 0 = do not roll based on size
a1.sinks.k1.hdfs.rollSize = 10240
## If no data is written to the open temporary file for this many seconds, close it and rename it to the target file
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
## Create a new directory every five minutes:
## Whether to round the event timestamp down; if enabled, this affects every time escape except %t
a1.sinks.k1.hdfs.round = true
## The value to round down to
a1.sinks.k1.hdfs.roundValue = 5
## The unit of the round-down value: second, minute or hour
a1.sinks.k1.hdfs.roundUnit = minute
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
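To illustrate the round/roundValue/roundUnit settings above: with roundValue=5 and roundUnit=minute, an event timestamped 10:23 is written under the %H%M directory for 10:20. A small shell sketch of that bucketing (the arithmetic only; the hour and minute values are made up):

```shell
hour=10; min=23                          # hypothetical event time 10:23
bucket=$(( min / 5 * 5 ))                # integer division floors to the 5-minute mark
dir=$(printf '%02d%02d' "$hour" "$bucket")
echo "events land in .../flume/$dir"     # -> .../flume/1020
```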
-
Create the HDFS directory
hadoop fs -mkdir /flume
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
List the files on HDFS
hadoop fs -ls /flume/*