Flume Study Notes (2): Sending Data to Flume from a Custom Client Built with flume-sdk


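Reference: the Flume Developer Guide, http://flume.apache.org/FlumeDeveloperGuide.html#rpc-client-interface
If you have read the first article in this series, you already know how to configure Flume on CDH5. This article goes straight to a simple setup for custom data collection in RPC-client mode.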

1. Write an RPC-client data collection program with flume-sdk

The jars the program depends on are not provided here. With Maven it is simple:
just add the following dependency to the pom

<dependency>
  <groupId>org.apache.flume</groupId>
  <artifactId>flume-ng-sdk</artifactId>
  <version>1.6.0</version>
</dependency>

The client code is as follows:

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.Charset;

public class MyApp {
  public static void main(String[] args) {
    MyRpcClientFacade client = new MyRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    client.init("host.example.org", 41415);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured to listen with an AvroSource.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MyRpcClientFacade {
  private RpcClient client;
  private String hostname;
  private int port;

  public void init(String hostname, int port) {
    // Setup the RPC connection
    this.hostname = hostname;
    this.port = port;
    this.client = RpcClientFactory.getDefaultInstance(hostname, port);
    // Use the following method to create a thrift client (instead of the above line):
    // this.client = RpcClientFactory.getThriftInstance(hostname, port);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // Clean up and recreate the client. Note that the failed event is
      // not resent here, so it is dropped.
      client.close();
      client = RpcClientFactory.getDefaultInstance(hostname, port);
      // Use the following method to create a thrift client (instead of the above line):
      // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }

}
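
The catch block in sendDataToFlume recreates the client but does not resend the event that failed. One possible way to add a bounded retry is sketched below. It is written against a hypothetical Sender interface so that it runs without a Flume agent; in the real client, send would wrap client.append(event) and the reconnect hook would wrap the close() / getDefaultInstance(...) pair. The names Sender, RetryingSender, and maxAttempts are illustrative assumptions and are not part of flume-ng-sdk.

```java
// Hypothetical abstraction over "send one record". In the real client this
// would wrap RpcClient.append(event).
interface Sender {
  void send(String data) throws Exception;
}

// Retries a failed send a bounded number of times, calling a reconnect hook
// between attempts (the real hook would close and recreate the RpcClient).
class RetryingSender {
  private final Sender sender;
  private final Runnable reconnect;
  private final int maxAttempts;

  RetryingSender(Sender sender, Runnable reconnect, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.sender = sender;
    this.reconnect = reconnect;
    this.maxAttempts = maxAttempts;
  }

  // Returns true if the data was delivered, false if every attempt failed.
  boolean sendWithRetry(String data) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        sender.send(data);
        return true;
      } catch (Exception e) {
        // Reconnect only if another attempt will follow.
        if (attempt < maxAttempts) {
          reconnect.run();
        }
      }
    }
    return false;
  }
}
```

Returning a boolean lets the caller decide what to do with an event that still cannot be delivered, for example buffering it locally, instead of losing it silently.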

2. Change the port to be safe, and test connectivity

  • 1> To be safe, do not use port 41414: it is Flume's default HTTP port in CDH. If the source is bound to it, the log /var/log/flume-ng/flume-cmf-flume-AGENT-cdh4.log shows errors such as:
    org.apache.flume.FlumeException: Failed to set up server socket
    Caused by: java.net.BindException: Address already in use

So we pick 41415 instead. Make sure the port number in the program above is set to 41415 as well.

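  • 2> Test whether port 41415 on the host running Flume is reachable
    On Windows, open a command prompt and run: telnet hostName/ip port. For example, for my host: telnet 192.168.1.100 41415. If telnet connects without an error, the port is reachable.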

If it cannot connect, check the firewall and related settings and open access to the port.
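
Both checks in this section can also be done from Java, which is handy when telnet is not installed. The following is a minimal sketch using only the JDK, not part of the original setup. The class name PortChecks and the 2000 ms timeout are assumptions made for illustration. isFree is meant to be run on the Flume host before the agent starts, and canConnect on the client machine once the agent is running.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PortChecks {
  // Run on the Flume host: true if a server socket can be bound to the port,
  // i.e. binding would not fail with "Address already in use".
  static boolean isFree(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      // java.net.BindException is a subclass of IOException.
      return false;
    }
  }

  // Run on the client machine: true if a TCP connection to host:port opens
  // within the timeout, which is what a successful telnet shows.
  static boolean canConnect(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Refused, timed out, or unknown host.
      return false;
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "192.168.1.100";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 41415;
    System.out.println("can bind locally: " + isFree(port));
    System.out.println("reachable at " + host + ": " + canConnect(host, port, 2000));
  }
}
```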

3. Configure the conf file in CDH

Write the conf file for your agent name, i.e. the name given in the "Agent default group" configuration item (if you have trouble with this step, see Flume Study Notes (1)).

  • Configuration file contents
# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = avro-source1
tier1.channels = channel1
tier1.sinks    = sink1


# For each source, channel, and sink, set standard properties.


# Define an Avro source called avro-source1 on tier1 and tell it
# to bind to cdh4:41415. Connect it to channel channel1.
tier1.sources.avro-source1.type     = avro
tier1.sources.avro-source1.bind     = cdh4
tier1.sources.avro-source1.port     = 41415
tier1.sources.avro-source1.threads = 5

# Describe the channel
tier1.channels.channel1.type   = memory
tier1.sources.avro-source1.channels = channel1
tier1.sinks.sink1.channel = channel1

# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /user/hdfs/test/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix=test_flume
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.rollInterval=0


# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
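
A common mistake in a file like this is a source or sink that points at a channel name that was never declared. Because the Flume configuration uses the Java properties format, the wiring can be checked with java.util.Properties before the file is pasted into CDH. This is a minimal sketch under that assumption. The class FlumeConfCheck and its method names are hypothetical helpers, not part of Flume, and the check covers only channel wiring, not the other properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class FlumeConfCheck {
  // Returns a list of wiring problems. An empty list means every source and
  // sink of the agent points at a channel declared in "<agent>.channels".
  static List<String> findWiringProblems(String confText, String agent) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(confText));
    } catch (IOException e) {
      throw new IllegalArgumentException("cannot parse configuration", e);
    }
    List<String> declared = names(props.getProperty(agent + ".channels"));
    List<String> problems = new ArrayList<>();
    // A source may feed several channels (key "channels", plural).
    for (String source : names(props.getProperty(agent + ".sources"))) {
      String key = agent + ".sources." + source + ".channels";
      check(key, names(props.getProperty(key)), declared, problems);
    }
    // A sink drains exactly one channel (key "channel", singular).
    for (String sink : names(props.getProperty(agent + ".sinks"))) {
      String key = agent + ".sinks." + sink + ".channel";
      check(key, names(props.getProperty(key)), declared, problems);
    }
    return problems;
  }

  private static void check(String key, List<String> used,
                            List<String> declared, List<String> problems) {
    if (used.isEmpty()) {
      problems.add(key + " is not set");
    }
    for (String channel : used) {
      if (!declared.contains(channel)) {
        problems.add(key + " refers to undeclared channel " + channel);
      }
    }
  }

  // Splits a whitespace-separated list of component names.
  private static List<String> names(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(value.trim().split("\\s+"));
  }
}
```

For the configuration above, calling findWiringProblems with the file contents and the agent name tier1 should return an empty list, since both avro-source1 and sink1 use channel1.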

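Important note:

tier1.sources.avro-source1.bind = cdh4
The host bound here must be the hostname mapped in the /etc/hosts file. The hostname resolves to the internal or the external IP as appropriate, so it does not cause errors.
If your cluster is on an internal network and shares a subnet with your development machine and the machine holding the data source, writing an IP address will probably work. Otherwise, be sure to use the hostname mapped in the hosts file of the Linux machine running Flume.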
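
To see which address a hostname resolves to on a given machine, a short lookup can be run on both the Flume host and the client machine and the results compared. This is a minimal sketch using only the JDK. The class name HostLookup is an assumption for illustration, and cdh4 is simply the hostname used in the configuration above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostLookup {
  // Returns the IP address the local resolver (including the hosts file)
  // gives for the name, or null if the name cannot be resolved here.
  static String resolve(String hostname) {
    try {
      return InetAddress.getByName(hostname).getHostAddress();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String hostname = args.length > 0 ? args[0] : "cdh4";
    String address = resolve(hostname);
    System.out.println(hostname + " -> " + (address != null ? address : "cannot be resolved on this machine"));
  }
}
```

If the client machine cannot resolve the name, the hostname passed to client.init(...) will not work there until the name is added to that machine's hosts file or DNS.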

4. Run the Flume client program written in Java

Then open the /user/hdfs/test/ directory in HDFS and check whether any files have appeared.
