Using Flume as a Spark Streaming Data Source

Mongolenlerche

Tags: spark, big data, flume

Article link: https://blog.csdn.net/weixin_45607431/article/details/111302556

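This article describes how to use Flume 1.7.0 as a data source for Spark Streaming 2.4.7 on CentOS. The steps cover installing Flume, testing it with a netcat source, configuring Flume as a Spark Streaming data source, and preparing the Spark environment (for example, downloading the dependency JARs). Finally, a Spark program is written, compiled and packaged with sbt, and run to test the complete workflow.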

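I. Objectives

(1) Learn how to install and use Flume, a log collection tool;

(2) Learn how to program with Flume as a Spark Streaming data source.

II. Platform

Operating system: CentOS

Spark version: 2.4.7

Flume version: 1.7.0

III. Tasks and Requirements

1. Install Flume

Extract the installation package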

tar -zxvf apache-flume-1.7.0-bin.tar.gz -C /opt
ln -sv /opt/apache-flume-1.7.0-bin /opt/apache-flume

Configure the environment variables

vi ~/.bash_profile

Add the following lines:

export FLUME_HOME=/opt/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin

Then reload the profile so that the variables take effect:

source ~/.bash_profile
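
If the setup is repeated (for example when a container is rebuilt), appending the exports by hand can leave duplicate lines in the profile. A minimal sketch of an idempotent alternative follows. The helper name add_flume_env is hypothetical, and the demo writes to a temporary file rather than the real ~/.bash_profile.

```shell
# add_flume_env PROFILE FLUME_HOME
# Hypothetical helper (not part of Flume): appends the two export lines
# to PROFILE only if FLUME_HOME is not exported there yet.
add_flume_env() {
    profile="$1"      # for example "$HOME/.bash_profile"
    flume_home="$2"   # for example /opt/apache-flume-1.7.0-bin
    touch "$profile"
    # Skip the append when an earlier run already added the export.
    if ! grep -q '^export FLUME_HOME=' "$profile"; then
        {
            printf 'export FLUME_HOME=%s\n' "$flume_home"
            printf 'export PATH=$PATH:$FLUME_HOME/bin\n'
        } >> "$profile"
    fi
}

# Demo on a throwaway file so the real profile is not touched.
demo_profile="$(mktemp)"
add_flume_env "$demo_profile" /opt/apache-flume-1.7.0-bin
add_flume_env "$demo_profile" /opt/apache-flume-1.7.0-bin   # second call changes nothing
cat "$demo_profile"
```

To apply it for real, pass "$HOME/.bash_profile" as the first argument and reload the profile afterwards.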

Configure the flume-env.sh file

cd $FLUME_HOME/conf
cp flume-env.sh.template flume-env.sh
vi flume-env.sh

Set the following variables:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.272.b10-1.el8_2.x86_64
export HADOOP_HOME=/opt/hadoop-2.7.4
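
The JAVA_HOME value above belongs to one specific OpenJDK build, so it will differ between machines. As a rough sketch, the installation directory can be derived from the java binary by resolving its symlinks. The helper name java_home_from_bin is hypothetical, and the sketch assumes a readlink that supports -f (as GNU coreutils does).

```shell
# java_home_from_bin JAVA_BINARY
# Hypothetical helper: prints the directory two levels above the real java binary.
java_home_from_bin() {
    # Follow symlinks (for example /usr/bin/java -> /etc/alternatives/java -> JDK path).
    real_bin="$(readlink -f "$1")" || return 1
    # Drop the trailing /bin/java to get the installation directory.
    dirname "$(dirname "$real_bin")"
}

# Print a candidate JAVA_HOME if java is on the PATH.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(command -v java)"
else
    echo "java not found on PATH"
fi
```

With some OpenJDK 8 packages the printed path may end in /jre. In that case its parent directory corresponds to the kind of path used in this article.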

Verify the installation by printing the version:

flume-ng version

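2. Testing Flume with a netcat source

Create a Flume configuration file

cd $FLUME_HOME
mkdir example
cp conf/flume-conf.properties.template example/netcat.conf

Edit netcat.conf so that the agent reads, in real time, the data typed in another terminal

vi example/netcat.conf

Replace its contents with the following: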

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel that buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
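
The last two lines are what actually connect the pieces: a source lists its channels (plural) and a sink names a single channel, and both must refer to a channel declared in a1.channels. A typo here is easy to make, so below is a small sketch of a consistency check. The helper check_channel_binding is hypothetical (it is not a Flume tool) and only looks at these three kinds of keys.

```shell
# check_channel_binding CONF AGENT
# Hypothetical helper (not part of Flume): exits non-zero if a source or sink
# of AGENT refers to a channel that is not listed in AGENT.channels.
check_channel_binding() {
    awk -v agent="$2" '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim both sides
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            n = split(val, names, /[ \t]+/)
            if (key == (agent ".channels")) {
                for (i = 1; i <= n; i++) declared[names[i]] = 1
            } else if (key ~ ("^" agent "\\.sources\\.[^.]+\\.channels$") ||
                       key ~ ("^" agent "\\.sinks\\.[^.]+\\.channel$")) {
                for (i = 1; i <= n; i++) used[names[i]] = key
            }
        }
        END {
            bad = 0
            for (c in used) {
                if (!(c in declared)) {
                    printf "%s refers to undeclared channel %s\n", used[c], c
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}

# Demo: a trimmed copy of the netcat.conf shown above, written to a scratch file.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

if check_channel_binding "$demo_conf" a1; then
    echo "channel wiring looks consistent"
fi
```

The same function can be pointed at example/netcat.conf with agent name a1 before starting the agent.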

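In one Linux terminal (called the "Flume terminal" here), start Flume from the $FLUME_HOME directory, since the paths in the command are relative: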

flume-ng agent -c conf -f example/netcat.conf -n a1 -Dflume.root.logger=INFO,console

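In another terminal (called the "Telnet terminal" here), run "telnet localhost 44444". If it fails with:

bash: telnet: command not found

then telnet is not installed in the container and has to be installed first:

yum list telnet*              # list the telnet-related packages
yum install telnet-server      # install the telnet server
yum install telnet.*           # install the telnet client

Run "ss -tnl" to list the listening TCP ports and check that port 44444 is among them.

Then run the command again: telnet localhost 44444

Now type any characters in the Telnet terminal; they should show up in the Flume terminal.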

[Screenshot: Telnet terminal]

[Screenshot: Flume terminal]

Success!

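3. Using Flume as a Spark Streaming data source

Flume is a widely used log collection system and can serve as an advanced data source for Spark Streaming. In this task, set the Flume source type to netcat and keep sending messages to it from a terminal. Flume forwards the messages to a sink of type avro, the sink pushes them to Spark Streaming, and a Spark Streaming application that you write yourself processes them.

Configure the Flume data source

Run the following commands to create a new Flume configuration file, flume-to-spark.conf:

cd $FLUME_HOME/conf
vi flume-to-spark.conf

with the following contents: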

# flume-to-spark.conf: A single-node Flume configuration
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = netcat
a2.sources.r2.bind = localhost
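
The listing above stops partway through. Going by the description earlier in this section (a netcat source feeding an avro sink that pushes to Spark Streaming), a complete file might look roughly like the sketch below. Everything after the bind line is an assumption rather than the author's text: the source port 33333, the sink address localhost:55555, and the memory channel sizes (copied from the netcat test) are placeholders. The sketch writes the file to a temporary directory so that it can be inspected before being copied into $FLUME_HOME/conf.

```shell
# Write a candidate flume-to-spark.conf into a scratch directory.
conf_dir="$(mktemp -d)"
cat > "$conf_dir/flume-to-spark.conf" <<'EOF'
# flume-to-spark.conf: A single-node Flume configuration
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = netcat
a2.sources.r2.bind = localhost
# ASSUMPTION: the port is a placeholder, not taken from the article
a2.sources.r2.port = 33333

# Describe the sink (ASSUMPTION: hostname and port are placeholders)
a2.sinks.k2.type = avro
a2.sinks.k2.hostname = localhost
a2.sinks.k2.port = 55555

# Use a channel that buffers events in memory (ASSUMPTION: sizes reused from netcat.conf)
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
EOF

cat "$conf_dir/flume-to-spark.conf"
```

Whatever values are chosen, the avro sink's hostname and port have to match the address on which the Spark Streaming application receives Flume events.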
