Flume Installation and Configuration

笙念&

Last modified 2022-12-07 19:38:59

Category: Big Data Platform Setup; tags: flume, linux, java

First published 2022-07-28 21:26:07

Article link: https://blog.csdn.net/lclchong/article/details/126044333

2.1.2 Flume installation and deployment

(1) Upload apache-flume-1.9.0-bin.tar.gz to the /export/software directory on the Linux host.

(2) Extract apache-flume-1.9.0-bin.tar.gz into /export/server/.

[atguigu@hadoop102 software]$ tar -zxvf /export/software/apache-flume-1.9.0-bin.tar.gz -C /export/server/

(3) Rename apache-flume-1.9.0-bin to flume.

[atguigu@hadoop102 module]$ mv /export/server/apache-flume-1.9.0-bin /export/server/flume

(4) Delete guava-11.0.2.jar from the lib folder for compatibility with Hadoop 3.1.3.

[atguigu@hadoop102 lib]$ rm /export/server/flume/lib/guava-11.0.2.jar
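Steps (2) to (4) can be collected into one small function. This is only a sketch: install_flume is a hypothetical helper name, and it assumes the tarball unpacks to a top-level apache-flume-1.9.0-bin directory, as the commands above imply.

```shell
# Hypothetical helper: extract the tarball, rename the directory, drop the old guava jar.
install_flume() {
    tarball=$1   # e.g. /export/software/apache-flume-1.9.0-bin.tar.gz
    dest=$2      # e.g. /export/server
    tar -zxf "$tarball" -C "$dest" || return 1
    mv "$dest/apache-flume-1.9.0-bin" "$dest/flume" || return 1
    # -f keeps the call quiet if the jar has already been removed
    rm -f "$dest/flume/lib/guava-11.0.2.jar"
}
```

It would be called as install_flume /export/software/apache-flume-1.9.0-bin.tar.gz /export/server.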

(5) Configure environment variables.

vim /etc/profile

After saving and exiting, reload the profile:

source /etc/profile
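The post does not list the lines that go into /etc/profile. A typical pair, assuming Flume was installed to /export/server/flume as in step (3) and using the conventional variable name FLUME_HOME (an assumption, not taken from the original), might look like this:

```shell
# Assumed additions to /etc/profile; adjust the path if Flume lives elsewhere.
export FLUME_HOME=/export/server/flume
# Append the bin directory so flume-ng can be run from any working directory.
export PATH="$PATH:$FLUME_HOME/bin"
```

With these in place, flume-ng version should work without first changing into the bin directory.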

(6) Verify the version.

# Run the version check from the bin directory under flume

[root@hadoop bin]# ./flume-ng version

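Output like the following means the installation succeeded. (The sample below was captured on a Flume 1.6.0 installation; with the 1.9.0 tarball used above, the first line reads Flume 1.9.0 and the revision, build, and checksum lines differ.)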

Flume 1.6.0

Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git

Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080

Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015

From source with checksum b29e416802ce9ece3269d34233baf43f

(7) Edit the configuration files.

In the flume/conf directory:
1. Rename flume-env.sh.template to flume-env.sh:

mv flume-env.sh.template flume-env.sh

2. Open flume-env.sh in an editor:

vi flume-env.sh

Set the Java environment variable to:
export JAVA_HOME=/export/server/jdk1.8.0_212

source flume-env.sh
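The same step can be scripted. The sketch below uses a hypothetical helper, configure_flume_env, and copies the template instead of renaming it so the original stays available for reference; that is a design choice here, not what the post does.

```shell
# Hypothetical helper: create flume-env.sh from the template and set JAVA_HOME in it.
configure_flume_env() {
    conf_dir=$1    # e.g. /export/server/flume/conf
    java_home=$2   # e.g. /export/server/jdk1.8.0_212
    cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh" || return 1
    # Append the export so it takes effect when the file is sourced
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$conf_dir/flume-env.sh"
}
```

It would be called as configure_flume_env /export/server/flume/conf /export/server/jdk1.8.0_212.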

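(8) Verify that Flume is listening.

If the agent reports that its port is already in use, run jps to find the process ID of the earlier agent, then stop that process with kill -9.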

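1. Check whether telnet is installed:

rpm -qa telnet

2. Any output means it is installed; no output means it is not, in which case install it with yum install telnet.

3. Check whether telnet-server is installed:

rpm -qa telnet-server

Any output means it is installed; no output means it is not, in which case install it with yum install telnet-server.

4. If you have just installed telnet-server successfully, restart xinetd, because the telnet service is managed by the xinetd daemon:

service xinetd restart

5. If this reports: Redirecting to /bin/systemctl restart xinetd.service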

1) First, check whether tftp-server is already installed on the server
        Command: rpm -qa | grep tftp-server
        Any installed tftp package is listed here
    2) Install tftp-server and xinetd
        Use the following commands to install the services:
        $yum -y install tftp-server
        $yum -y install xinetd
    3) Edit the tftp configuration file
    Open the configuration file with:
        $vi /etc/xinetd.d/tftp
        service tftp
        {
            socket_type        = dgram
            protocol        = udp
            wait            = yes
            user            = root
            server        = /usr/sbin/in.tftpd
            server_args        = -s /var/lib/tftpboot
            # disable is the line to change; its initial value is yes
            disable        = no
            per_source        = 11
            cps            = 100 2
            flags            = IPv4
        }
    4) Restart the service
        Restart the service with:
        $/bin/systemctl restart xinetd.service
        If that has no effect, use:
        $/bin/systemctl enable xinetd.service   # enable the service
        $/bin/systemctl start xinetd.service    # start the service
        Check whether the service is running:
        $ps aux | grep xinetd, or $ps -ef|grep xinetd, or ps -a | grep tftp
    5) Possible problems
        5.1) Starting xinetd.service prints
            Redirecting to /bin/systemctl restart xinetd.service
            Failed to issue method call: Unit xinetd.service failed to load: No such file or directory.
            This means xinetd is not installed; install it with yum -y install xinetd
        5.2) Starting xinetd.service prints:
            Redirecting to /bin/systemctl restart xinetd.service
            In this case the command to use may be systemctl restart xinetd.service
            These are the steps I followed to install tftp and the problems I ran into. Readers may hit other issues during their own installation, but they should be minor.

6) Once xinetd has started successfully, check that it is running:

netstat -tnlp

Create a Flume configuration file

$ cd flume/

$ mkdir example

$ cp conf/flume-conf.properties.template example/netcat.conf

Go to /home/hadoop/flume/example/ and edit the netcat.conf file there:

$ cd /home/hadoop/flume/example/

$ vi netcat.conf

Change it as follows (this netcat.conf makes the agent pick up, in real time, data typed in another terminal):

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel that buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Run the Flume agent, which listens on port 44444 of the local machine:

flume-ng agent -c conf -f netcat.conf -n a1 -Dflume.root.logger=INFO,console

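The agent prints its startup log to the console.

Open another terminal, connect to port 44444 on localhost with telnet, and type some test data:

$ telnet localhost 44444

A successful connection shows that the agent has started.

Back in the agent's terminal, check the events that Flume has collected.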

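Simple Flume examples

Collecting a directory into HDFS

Requirement: new files keep appearing in a particular directory on a server, and each new file must be collected into HDFS as soon as it appears.

Based on this requirement, first define the following three elements:

Source: monitors the file directory: spooldir
Sink: the HDFS file system: hdfs sink
Channel: the path between source and sink; either a file channel or a memory channel can be used

Configuration file: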

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

## Note: never drop a file into the monitored directory under a name that has already been used there

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /root/logs

a1.sources.r1.fileHeader = true

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/

a1.sinks.k1.hdfs.filePrefix = events-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollInterval = 3

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File type to generate. The default is SequenceFile; DataStream produces plain text

a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
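Because the spooldir source must never see the same file name twice, it helps to finish writing a file elsewhere and then move it into the monitored directory under a unique name. A minimal sketch, assuming the source file sits on the same filesystem as the spool directory (so that mv is a rename); drop_into_spool is a hypothetical helper name:

```shell
# Hypothetical helper: move a finished file into the spool directory under a unique name.
drop_into_spool() {
    src=$1
    spool_dir=$2
    # Suffix with epoch seconds and the shell PID so repeated drops get distinct names
    target="$spool_dir/$(basename "$src").$(date +%s).$$"
    # Refuse to overwrite if the name somehow already exists
    [ -e "$target" ] && return 1
    mv "$src" "$target" && printf '%s\n' "$target"
}
```

For the configuration above it would be called as drop_into_spool /tmp/app.log /root/logs, where /tmp/app.log is only an illustrative path.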

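Channel parameters:

capacity: the maximum number of events that can be stored in the channel

transactionCapacity: the maximum number of events taken from a source or delivered to a sink in one transaction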
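The memory channel expects transactionCapacity to be no larger than capacity. A quick pre-flight check of a configuration file could look like the sketch below; check_channel_capacity is a hypothetical helper, and it only looks at the a1/c1 names used in this post.

```shell
# Hypothetical helper: succeed only if transactionCapacity <= capacity for channel c1 of agent a1.
check_channel_capacity() {
    conf=$1
    # Take the first matching value for each key; spaces around "=" are tolerated
    cap=$(sed -n 's/^a1\.channels\.c1\.capacity *= *\([0-9][0-9]*\).*/\1/p' "$conf" | head -n 1)
    tx=$(sed -n 's/^a1\.channels\.c1\.transactionCapacity *= *\([0-9][0-9]*\).*/\1/p' "$conf" | head -n 1)
    [ -n "$cap" ] && [ -n "$tx" ] && [ "$tx" -le "$cap" ]
}
```

With the values above (1000 and 100) the check passes; swapping them would make it fail.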

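Collecting a file into HDFS

Requirement: a business system writes logs with log4j, for example, and the log keeps growing; the data appended to the log file must be collected into HDFS in real time.

Based on this requirement, first define the following three elements:

Source: monitors updates to the file's content: exec 'tail -F file'
Sink: the HDFS file system: hdfs sink
Channel: the path between source and sink; either a file channel or a memory channel can be used

Configuration file: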

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /root/logs/test.log

a1.sources.r1.channels = c1

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H%M/

a1.sinks.k1.hdfs.filePrefix = events-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollInterval = 3

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File type to generate. The default is SequenceFile; DataStream produces plain text

a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
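To see the exec source pick up appended data, something has to keep writing to the log. The sketch below appends numbered, timestamped lines to a file; append_test_lines is a hypothetical helper, not part of Flume.

```shell
# Hypothetical helper: append COUNT timestamped lines to LOG_FILE.
append_test_lines() {
    log_file=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # One line per iteration, e.g. "2022-07-28 21:26:07 line 1"
        printf '%s line %d\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$i" >> "$log_file"
        i=$((i + 1))
    done
}
```

Running append_test_lines /root/logs/test.log 20 while the agent is up should cause new files to appear under /flume/tailout/ in HDFS.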

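Parameter reference:

rollInterval

Default: 30

How long the hdfs sink waits before rolling the temporary file into the final target file, in seconds;

if set to 0, files are not rolled based on time.

Note: rolling means that the hdfs sink renames the temporary file to the final target file and opens a new temporary file for writing.

rollSize

Default: 1024

When the temporary file reaches this size (in bytes), it is rolled into the target file;

if set to 0, files are not rolled based on size.

rollCount

Default: 10

When the number of events written reaches this count, the temporary file is rolled into the target file;

if set to 0, files are not rolled based on the number of events.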
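The three roll settings are independent triggers, and setting one to 0 switches that trigger off. The demo values used earlier in this post (3 seconds, 20 bytes, 5 events) roll very often and so produce many tiny files. As an illustration only, the sketch below writes a fragment that rolls on time or size and disables the event-count trigger; the numbers are assumptions chosen for the example, not recommendations from the original post.

```shell
# Write an illustrative fragment to a scratch file; merge it into the agent's .conf by hand.
roll_conf="$(mktemp)"
cat > "$roll_conf" <<'EOF'
# Roll every 60 seconds or at 128 MB (134217728 bytes), whichever comes first
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling by event count
a1.sinks.k1.hdfs.rollCount = 0
EOF
cat "$roll_conf"
```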

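round

Default: false

Whether to round the timestamp down. Unlike ordinary rounding to the nearest value, this always rounds down, to the most recent multiple of roundValue.

roundValue

Default: 1

The multiple to which the timestamp is rounded down.

roundUnit

Default: second

The unit used for rounding down: second, minute, or hour.

Example: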

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

At 2015-10-16 17:38:59, hdfs.path still resolves to:

/flume/events/15-10-16/1730/00

Because the timestamp is rounded down to a 10-minute boundary, a new directory is created every 10 minutes.
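The rounding is plain integer arithmetic: divide by the step, drop the remainder, multiply back. The sketch below reproduces the minute field for the 17:38:59 example with the pattern %y-%m-%d/%H%M/%S; round_down is a hypothetical helper that expects plain integers without leading zeros.

```shell
# Hypothetical helper: round VALUE down to the nearest multiple of STEP.
round_down() {
    # Integer division drops the remainder
    echo $(( $1 / $2 * $2 ))
}

minute=$(round_down 38 10)
# Seconds become 00 because everything below the rounded unit is zeroed
printf '/flume/events/15-10-16/17%02d/00\n' "$minute"
# prints /flume/events/15-10-16/1730/00
```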
