flume-1.7.0: Basic Usage

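This article uses hands-on examples to introduce the basics of Flume: writing the configuration file, starting an agent, and sending messages. It also shows how to use Flume to load data into HDFS and Kafka.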

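In the previous post we installed flume-ng; in this one we take it for a quick spin.

According to the official documentation, we need to provide a configuration file and give the agent a name; then we can start it with the flume-ng command.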

1 Writing the configuration file

Let's first run the example from the official site, using the file example.conf:

[root@master conf]# pwd
/usr/hadoop/flume-1.7.0-bin/conf
[root@master conf]# vi example.conf 

# example.conf: A single-node Flume configuration

# Name the components on this agent
# Declare the components of the agent

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Configure the source

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
# Configure the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
# Define the channel

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
# Connect the source and the sink through the channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

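I added a few short comments of my own here. They are not translations of the English ones, so don't let them mislead you; for the precise concepts, please refer to the official documentation.
There are four concepts involved: agent, source, channel, and sink.
The official site gives a schematic diagram, which we can read like this:
A web server wants to send something to HDFS, so it turns to Flume, the agent. The agent takes the data in through its source and moves it along by its own means (the channel, think of it as the "logistics network") until it reaches the "parcel sorting depot" closest to HDFS (the sink), which hands it over to HDFS.
The diagram is only a schematic, not an explanation of example.conf, so don't read too much into it.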
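
To make the wiring rules concrete, here is a minimal Java sketch that loads the wiring-related lines of example.conf with java.util.Properties and checks that every source and sink points at a declared channel. The class AgentWiringCheck and its methods are made up for this illustration; this is not how Flume itself validates a configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Flume.
public class AgentWiringCheck {

    // The wiring-related lines of example.conf.
    static final String EXAMPLE = String.join("\n",
        "a1.sources = r1",
        "a1.sinks = k1",
        "a1.channels = c1",
        "a1.sources.r1.type = netcat",
        "a1.sinks.k1.type = logger",
        "a1.channels.c1.type = memory",
        "a1.sources.r1.channels = c1",
        "a1.sinks.k1.channel = c1");

    // Splits a space-separated property value into names; a missing key gives an empty set.
    static Set<String> names(Properties p, String key) {
        String value = p.getProperty(key, "").trim();
        if (value.isEmpty()) {
            return new LinkedHashSet<String>();
        }
        return new LinkedHashSet<String>(Arrays.asList(value.split("\\s+")));
    }

    // Returns the wiring problems found for the given agent; an empty list means consistent.
    static List<String> check(String conf, String agent) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> channels = names(p, agent + ".channels");
        List<String> problems = new ArrayList<String>();
        // A source may feed several channels, hence the plural "channels" key.
        for (String source : names(p, agent + ".sources")) {
            Set<String> targets = names(p, agent + ".sources." + source + ".channels");
            if (targets.isEmpty()) {
                problems.add("source " + source + " has no channel");
            }
            for (String ch : targets) {
                if (!channels.contains(ch)) {
                    problems.add("source " + source + " -> undeclared channel " + ch);
                }
            }
        }
        // A sink drains exactly one channel, hence the singular "channel" key.
        for (String sink : names(p, agent + ".sinks")) {
            String ch = p.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
            if (!channels.contains(ch)) {
                problems.add("sink " + sink + " -> undeclared channel '" + ch + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(EXAMPLE, "a1")); // prints [] for the example above
    }
}
```

The point to take away is the asymmetry in the last two lines of the configuration: a source uses the plural key channels, a sink the singular key channel.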

1.1 source

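Don't take my analogy too seriously; I made it up only to make things easier to picture.
In example.conf we define an agent named a1 and configure its source r1 as type netcat. The official documentation clearly lists what a netcat source needs.

The properties shown in bold there are required; the rest are optional.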

1.2 channel

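Next comes the channel, that is, which kind of "logistics" we choose. Here we use memory; its properties are listed in the official documentation under the memory channel.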

1.3 sink

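Then our "sorting depot": the sink is configured as logger. Its properties are listed in the official documentation under the logger sink.

You may have been misled by me by now, so for the real explanations go to the official site, form your own understanding, and translate these concepts into terms that make sense to you.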

2 Starting the agent

2.1 Start

As mentioned above, the agent is started with the flume-ng command; its parameters are described in the official documentation.

[root@master conf]# pwd
/usr/hadoop/flume-1.7.0-bin/conf

[root@master conf]# flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Info: Including Hadoop libraries found via (/usr/hadoop/hadoop-2.6.4/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/usr/hadoop/hbase-1.2.3/bin/hbase) for HBASE access
...

16/11/18 19:34:27 INFO node.Application: Starting Channel c1
16/11/18 19:34:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/11/18 19:34:27 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/11/18 19:34:27 INFO node.Application: Starting Sink k1
16/11/18 19:34:27 INFO node.Application: Starting Source r1
16/11/18 19:34:27 INFO source.NetcatSource: Source starting
16/11/18 19:34:27 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

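As you may have noticed, I ran the command from the conf directory. That is because --conf-file is given as a bare file name here; from any other directory you would pass the file's full path. Output like the above means the agent named a1 started successfully. Because the sink is logger and we passed -Dflume.root.logger=INFO,console, the events are printed to the console.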

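2.2 Sending messages

Next, open another terminal and use telnet to connect to the port the source listens on (localhost, port 44444 in example.conf).

On some Linux systems you will get "command not found"; installing telnet fixes that. You may also see a "Connection refused" first. In /etc/hosts, localhost also maps to an IPv6 address of the form "::1…"; telnet tries that one first, and nothing is listening there because the source bound to 127.0.0.1 (see the serverSocket line in the log above). It then tries "127.0.0.1" and connects. The details belong to networking, so I won't go into them here.

Now we can type something in this session.

Then go back to the terminal where the agent is running.

You will see a new line containing the event. Sometimes it shows less than what we typed. That does not mean the data was lost: the logger sink only prints the first bytes of the event body (16 by default, controlled by its maxBytesToLog property) and cuts off the rest.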
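
The truncation can be pictured with a small Java sketch. It only imitates the general shape of what the logger sink prints (hex bytes followed by printable text); the class LoggerPreview, its output format, and the constant below are assumptions made for illustration, not Flume's actual code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why long lines look cut off; not Flume's implementation.
public class LoggerPreview {

    // Assumed to mirror the logger sink's default maxBytesToLog.
    static final int DEFAULT_MAX_BYTES = 16;

    // Renders at most maxBytes bytes of the line as hex followed by printable text.
    static String preview(String line, int maxBytes) {
        byte[] body = line.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(body.length, maxBytes);
        StringBuilder hex = new StringBuilder();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = body[i] & 0xFF;
            hex.append(String.format("%02X ", b));
            // Non-printable bytes are shown as a dot.
            text.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return "body: " + hex + text;
    }

    public static void main(String[] args) {
        // Short input fits completely.
        System.out.println(preview("hello", DEFAULT_MAX_BYTES));
        // Long input: only the first 16 bytes appear, the rest is still in the event.
        System.out.println(preview("Hello Flume, this is a long line", DEFAULT_MAX_BYTES));
    }
}
```

The full body still travels through the channel; only the console rendering is shortened.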

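By now you should have a feel for it: Flume is a log collection system with "a whole string of adjectives" in front of it (the official site describes it as distributed, reliable, and available).

Let's write a few more examples to get a better feel for it.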

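3 avro ⇒ hdfs

Our source is avro and our sink is hdfs. (Strictly speaking this way of putting it is not quite proper, but I don't know how else to say it so that it is understandable, so let's go with it for now.)
So, let's first look up what an avro source needs.

3.1 Configuring the avro source

Four properties are required (channels, type, bind, and port), and the documentation also gives an example.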

[root@master conf]# vi AvroHDFS.conf
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = master
a1.sources.r1.port = 4141

3.2 Configuring the channel

We use memory again.

We append the channel settings to AvroHDFS.conf, which now looks like this:

[root@master conf]# vi AvroHDFS.conf
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = master
a1.sources.r1.port = 4141

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
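
The last two settings work together: as I understand the documentation, byteCapacity counts only event bodies, and byteCapacityBufferPercentage reserves part of it as headroom for headers. A small worked calculation in Java, under that assumption (the class and method names are made up, and Flume's internal accounting may differ in detail):

```java
// Hypothetical arithmetic sketch; not Flume's implementation.
public class MemoryChannelBudget {

    // Bytes left for event bodies after reserving the buffer percentage.
    static long bodyByteBudget(long byteCapacity, int bufferPercentage) {
        return byteCapacity * (100 - bufferPercentage) / 100;
    }

    public static void main(String[] args) {
        long budget = bodyByteBudget(800000, 20);
        System.out.println(budget);         // prints 640000
        // With capacity = 10000 events, the average body size at which both limits meet:
        System.out.println(budget / 10000); // prints 64
    }
}
```

So with these values, events whose bodies average more than about 64 bytes would hit the byte limit before the 10000-event limit.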

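3.3 Configuring the sink

We are loading data into HDFS, so we use the hdfs sink.

The property table for this sink is too long to reproduce legibly here, so I only kept the example; please check the official documentation for how to configure it.

We continue appending to AvroHDFS.conf, which now looks like this: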

[root@master conf]# vi AvroHDFS.conf
a1.sinks = k1
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = master
a1.sources.r1.port = 4141

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /fromflume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true

That gives us a configuration file that goes from avro to hdfs, and we can start the agent.
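
Before starting, it helps to see what hdfs.path turns into. The sketch below expands the six escapes used above (%y, %m, %d, %H, %M, %S) after rounding the timestamp down to a multiple of 10 minutes, as round, roundValue, and roundUnit request. The class HdfsPathPreview is made up for illustration, handles only these escapes, and is not the sink's real code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical preview of the bucket directory; not Flume's implementation.
public class HdfsPathPreview {

    // Rounds down to a multiple of roundValue minutes and zeroes the seconds.
    static LocalDateTime roundDownMinutes(LocalDateTime t, int roundValue) {
        int minute = t.getMinute() / roundValue * roundValue;
        return t.truncatedTo(ChronoUnit.HOURS).plusMinutes(minute);
    }

    // Formats one field of the timestamp with the given java.time pattern.
    static String field(LocalDateTime t, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).format(t);
    }

    // Replaces the escapes used in this article with values from the timestamp.
    static String expand(String path, LocalDateTime t) {
        return path
            .replace("%y", field(t, "yy"))   // two-digit year
            .replace("%m", field(t, "MM"))   // month
            .replace("%d", field(t, "dd"))   // day of month
            .replace("%H", field(t, "HH"))   // hour, 00-23
            .replace("%M", field(t, "mm"))   // minute
            .replace("%S", field(t, "ss"));  // second
    }

    public static void main(String[] args) {
        LocalDateTime eventTime = LocalDateTime.of(2016, 11, 18, 20, 46, 23);
        String path = expand("/fromflume/events/%y-%m-%d/%H%M/%S",
                             roundDownMinutes(eventTime, 10));
        System.out.println(path); // prints /fromflume/events/16-11-18/2040/00
    }
}
```

Because of the rounding, the %S level is always 00 here, so every 10-minute window ends up in one directory.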

3.4 Starting the agent

This time the Hadoop cluster must be running first, otherwise the agent will fail.

[root@master conf]# flume-ng agent --conf conf --conf-file AvroHDFS.conf --name a1 -Dflume.root.logger=INFO,console

Info: Including Hadoop libraries found via (/usr/hadoop/hadoop-2.6.4/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/usr/hadoop/hbase-1.2.3/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/usr/hadoop/apache-hive-2.1.0-bin) for Hive access
...

16/11/18 20:46:13 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/11/18 20:46:13 INFO source.AvroSource: Avro source r1 started.

That means the agent started successfully. Next we need to use the API; the Java program is as follows:

import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeDemo {
    private String hostname;
    private int port;
    private RpcClient client;

    public FlumeDemo(String hostname,int port) {
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
    }

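    public void sendMessage(String data){
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Delivery failed: report it, then replace the broken client so later sends can succeed
            e.printStackTrace();
            client.close();
            client = RpcClientFactory.getDefaultInstance(hostname, port);
        }
    }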

    public void cleanUp(){
        client.close();
    }

    public static void main(String[] args) {
        FlumeDemo rpcClient = new FlumeDemo("master", 4141);

        String data = "testing ";

        for(int i=0;i<10;i++){
            rpcClient.sendMessage(data + i);
        }

        rpcClient.cleanUp();
    }
}

Then run the Java program and watch what the agent does:

...
16/11/18 20:46:13 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/11/18 20:46:13 INFO source.AvroSource: Avro source r1 started.

16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 => /192.168.38.129:4141] OPEN
16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 => /192.168.38.129:4141] BOUND: /192.168.38.129:4141
16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 => /192.168.38.129:4141] CONNECTED: /192.168.38.1:64375
16/11/18 20:46:23 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 :> /192.168.38.129:4141] DISCONNECTED
16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 :> /192.168.38.129:4141] UNBOUND
16/11/18 20:46:23 INFO ipc.NettyServer: [id: 0x33dfe52f, /192.168.38.1:64375 :> /192.168.38.129:4141] CLOSED
16/11/18 20:46:23 INFO ipc.NettyServer: Connection to /192.168.38.1:64375 disconnected.

16/11/18 20:46:24 INFO hdfs.BucketWriter: Creating /fromflume/events/16-11-18/2040/00/events..1479473183696.tmp
...

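At this point we can see that data is being written to HDFS. We can check through the web UI (open http://master:50070 in a browser) whether a /fromflume/… directory has appeared on HDFS.

Sure enough, the directory has been created. Drilling down level by level, we find the file.

The file was written successfully. Viewing it directly shows mostly unreadable bytes, because the hdfs sink writes SequenceFiles by default (note the HDFSSequenceFile line in the log above); setting hdfs.fileType = DataStream would write the event bodies as plain data instead. Either way, the data did arrive.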

4 avro ⇒ kafka

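Building on what I wrote yesterday, let's put together an example that combines Flume and Kafka. The two are somewhat alike: both act as a "proxy/middleman".

After the two exercises above, I won't go into as much detail this time.

That said, there really is quite a lot to say about the kafka sink: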

4.1 sinks.type

This must be set to "org.apache.flume.sink.kafka.KafkaSink". Oddly enough, the property has no default value, yet the documentation says it "Must be set to org.apache.flume.sink.kafka.KafkaSink".

4.2 kafka.topic

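The documentation has a note on this property: it names the Kafka topic that events are published to, but if an event's headers contain a "topic" field, that event is published to the topic named in the header, overriding the topic configured here.
We don't need this yet, so it is fine if it doesn't fully click; just remember that it exists.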
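
As described in the Flume user guide, a "topic" header on an event takes precedence over the kafka.topic property. A minimal Java sketch of that rule follows; the class TopicResolution, the method resolveTopic, and the topic name "audit" are made up for illustration and this is not the KafkaSink source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the precedence rule; not Flume's implementation.
public class TopicResolution {

    // The header wins when present; otherwise the configured topic is used.
    static String resolveTopic(Map<String, String> headers, String configuredTopic) {
        String fromHeader = headers.get("topic");
        return fromHeader != null ? fromHeader : configuredTopic;
    }

    public static void main(String[] args) {
        // No header: the event goes to the topic from the sink configuration.
        System.out.println(resolveTopic(Collections.<String, String>emptyMap(), "fromFlume")); // prints fromFlume

        // Header present: the event goes to the topic named in the header.
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("topic", "audit"); // "audit" is a made-up topic name
        System.out.println(resolveTopic(headers, "fromFlume")); // prints audit
    }
}
```

The events sent later in this article carry no headers, so they all land in the configured topic.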

OK, let's write the sink:

[root@master conf]# vi AvroKafka.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = master
a1.sources.r1.port = 44444

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092,localhost:9093
a1.sinks.k1.kafka.topic = fromFlume

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

To combine Flume and Kafka, this time we let a Kafka consumer read the data that Flume sends out.
So we need to start the Kafka service and create a consumer. For the demo, we simply start two brokers on master.

[root@master config]# pwd
/usr/hadoop/kafka_2.11-0.10.1.0/config
[root@master config]# kafka-server-start.sh server.properties &
[1] 10693
...
[root@master config]# kafka-server-start.sh server1.properties &
[2] 10962
...
[root@master config]# jps
11233 Jps
2593 ResourceManager
10962 Kafka
2692 NodeManager
10693 Kafka
3034 QuorumPeerMain
2171 NameNode
2269 DataNode
2446 SecondaryNameNode
[root@master config]#

[root@master config]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fromFlume --from-beginning

Now we can start our Flume agent:

[root@master conf]# flume-ng agent --conf conf --conf-file AvroKafka.conf --name a1 -Dflume.root.logger=INFO,console

...

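I don't know whether your head is spinning by now; mine certainly did for quite a while when I first learned this. ^8^

Next we reuse the earlier Java program.
The port number has to change, though: remember, it must match the port configured in our *.conf file.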


import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeDemo {
    private String hostname;
    private int port;
    private RpcClient client;

    public FlumeDemo(String hostname,int port) {
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
    }

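    public void sendMessage(String data){
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Delivery failed: report it, then replace the broken client so later sends can succeed
            e.printStackTrace();
            client.close();
            client = RpcClientFactory.getDefaultInstance(hostname, port);
        }
    }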

    public void cleanUp(){
        client.close();
    }

    public static void main(String[] args) {
        FlumeDemo rpcClient = new FlumeDemo("master", 44444);

        String data = "Hello World! ";

        for(int i=0;i<10;i++){
            rpcClient.sendMessage(data + i);
        }

        rpcClient.cleanUp();
    }
}

Run the program, then switch to the terminal where the Kafka consumer was started; we will see:

[2016-11-18 21:46:54,915] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2016-11-18 21:47:08,073] INFO [Group Metadata Manager on Broker 1]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
Hello World! 0
Hello World! 1
Hello World! 2
Hello World! 3
Hello World! 4
Hello World! 5
Hello World! 6
Hello World! 7
Hello World! 8
Hello World! 9