Client Programming for Flume

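This article describes how to program clients for Flume, covering two approaches: sending data directly to an existing source, and writing a custom source that communicates with your client over RPC. It explains how to configure and use the Avro RPC client, including failover and load balancing. It also mentions Flume's embedded agent API, which lets you embed a lightweight Flume agent in an application, although not every source, sink, and channel is supported.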

In Flume, a source is the entry point for data. There are two ways of delivering data to a source. The first option is to create a custom client that communicates with one of Flume's existing Sources like AvroSource or SyslogTcpSource. Here the client should convert its data into messages understood by these Flume Sources. The other option is to write a custom Flume Source that directly talks with your existing client application using some IPC or RPC protocol, and then converts the client data into Flume Events to be sent downstream. Note that all events stored within the Channel of a Flume agent must exist as Flume Events.
Approach 1: write a client that sends data directly to an existing source.
Approach 2: write your own source that exchanges data with your own client directly over RPC.

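Client 1:
As of Flume 1.4.0, Avro is the default RPC protocol, and Thrift is also supported. The type configured for the source in the agent's configuration file must match the transport protocol the application uses to obtain its client. For example, if the source type is avro, obtain the RpcClient with RpcClientFactory.getDefaultInstance(hostname, port); if the source type is thrift, use RpcClientFactory.getThriftInstance(hostname, port) instead. A client and a source that use different protocols cannot talk to each other.
Example: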
import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FirstRPCClient {

    private static RpcClient client = null;
    // Host name and port of the remote Flume agent
    private static final String ip = "worker1";
    private static final int port = 41414;

    public static void main(String[] args) {
        // Initialize client with the remote Flume agent's host and port
        client = RpcClientFactory.getDefaultInstance(ip, port);
        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume!";
        for (int i = 0; i < 10; i++) {
            sendDataToFlume(sampleData);
        }
        client.close();
    }

    private static void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, StandardCharsets.UTF_8);
        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Clean up and recreate the client with the same protocol as before
            client.close();
            client = null;
            client = RpcClientFactory.getDefaultInstance(ip, port);
            // For a thrift source, use this instead of the line above:
            // client = RpcClientFactory.getThriftInstance(ip, port);
        }
    }

}
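Configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
# Listen on all interfaces; binding to localhost accepts loopback connections only
a1.sources.r1.bind = 0.0.0.0
# Must match the port the client connects to (41414 in the example above)
a1.sources.r1.port = 41414
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1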

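Client 2: failover
This class wraps the default Avro RPC client to give clients failover handling. It takes a whitespace-separated list of host:port entries, each representing a Flume agent, which together form a failover group. The failover RPC client currently does not support Thrift. If there is a communication error with the currently selected host agent, the failover client automatically fails over to the next host in the group.
The code below sets up three hosts for failover; as a shortcut, three ports on the same machine stand in for three hosts. It sends 100 events, "hello flume0" through "hello flume99", one per second, to the first Flume agent. When that agent fails, events go to the second agent, and failover proceeds in list order.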

The agents reuse the earlier configuration. Copy the configuration file twice:

cp avro_client_case20.conf avro_client_case21.conf

cp avro_client_case20.conf avro_client_case22.conf

In avro_client_case21.conf and avro_client_case22.conf, change the source port so that each agent listens on one of the ports the client is configured with:

a1.sources.r1.port = 41415 and a1.sources.r1.port = 41416

Then start the three agents:

flume-ng agent -c conf -f conf/avro_client_case20.conf -n a1 -Dflume.root.logger=INFO,console

flume-ng agent -c conf -f conf/avro_client_case21.conf -n a1 -Dflume.root.logger=INFO,console

flume-ng agent -c conf -f conf/avro_client_case22.conf -n a1 -Dflume.root.logger=INFO,console

The full implementation is shown below:
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class SecondRPCClient {

    private static RpcClient client = null;
    private static Properties props = null;

    public static void main(String[] args) throws Exception {
        // Setup properties for the failover
        props = new Properties();
        props.put("client.type", "default_failover");
        // List of hosts (space-separated list of user-chosen host aliases)
        props.put("hosts", "h1 h2 h3");

        // host/port pair for each host alias
        String host1 = "worker1:41414";
        String host2 = "worker1:41415";
        String host3 = "worker1:41416";
        props.put("hosts.h1", host1);
        props.put("hosts.h2", host2);
        props.put("hosts.h3", host3);

        // create the client with failover properties
        client = RpcClientFactory.getInstance(props);
        // Send one event per second so that an agent can be stopped mid-run
        for (int i = 0; i < 100; i++) {
            Thread.sleep(1000);
            sendDataToFlume("hello flume" + i);
        }
        client.close();
    }

    private static void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, StandardCharsets.UTF_8);

        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Clean up and recreate the failover client from the same properties
            client.close();
            client = null;
            client = RpcClientFactory.getInstance(props);
        }
    }

}
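
The "try the hosts in list order" idea behind a failover group can be illustrated without a running agent. The following is a minimal sketch that uses only the Java standard library. It is not Flume's implementation: the class FailoverSketch, the method pickHost, and the reachability predicate are hypothetical names introduced here for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustration only (hypothetical class, not part of Flume):
// selects a host from a failover group by walking the list in order.
class FailoverSketch {

    // Returns the first host, in list order, that the predicate reports as reachable.
    static Optional<String> pickHost(List<String> hosts, Predicate<String> isReachable) {
        for (String host : hosts) {
            if (isReachable.test(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("worker1:41414", "worker1:41415", "worker1:41416");
        // Assumption for the demo: the first agent is down, the others are up.
        Predicate<String> isReachable = host -> !host.equals("worker1:41414");
        System.out.println(pickHost(hosts, isReachable).orElse("no host available"));
    }
}
```

To observe the real behavior, stop the first agent while SecondRPCClient is running and watch which agent's console logs the following events.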

Client 3: LoadBalancing RPC client
Point 1: The LoadBalancing RPC Client currently does not support Thrift.
Point 2: If backoff is enabled, the client temporarily blacklists hosts that fail, excluding them from selection as a failover host until a given timeout has elapsed.
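
As a rough sketch of how such a client might be configured, the helper below builds the properties for a load-balancing client with backoff enabled. The property names (client.type, hosts, host-selector, backoff, maxBackoff) are taken from the Flume 1.6.0 Developer Guide; the class LoadBalancingProps and its build method are hypothetical, and the host:port values are the ones assumed elsewhere in this article. Only the standard library is used, so the call that actually creates the client appears as a comment.

```java
import java.util.Properties;

// Hypothetical helper: assembles properties for a load-balancing RPC client.
class LoadBalancingProps {

    // Creates one alias (h1, h2, ...) per host:port entry and enables backoff.
    static Properties build(String... hostPorts) {
        Properties props = new Properties();
        props.setProperty("client.type", "default_loadbalance");
        StringBuilder aliases = new StringBuilder();
        for (int i = 0; i < hostPorts.length; i++) {
            String alias = "h" + (i + 1);
            if (i > 0) {
                aliases.append(' ');
            }
            aliases.append(alias);
            props.setProperty("hosts." + alias, hostPorts[i]);
        }
        props.setProperty("hosts", aliases.toString());
        props.setProperty("host-selector", "round_robin"); // or "random"
        props.setProperty("backoff", "true");               // backoff is disabled by default
        props.setProperty("maxBackoff", "10000");           // upper bound in milliseconds
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("worker1:41414", "worker1:41415", "worker1:41416");
        // With flume-ng-sdk on the classpath, the client would be created with:
        // RpcClient client = RpcClientFactory.getInstance(props);
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```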
Client 4: Secure RPC client - Thrift

For the implementation of clients 3 and 4, see the Flume Developer Guide:
http://flume.apache.org/releases/content/1.6.0/FlumeDeveloperGuide.html

Embedded agent
An agent can be embedded in a client application, but with restrictions: not every source, channel, and sink is allowed.

Flume has an embedded agent API which allows users to embed an agent in their application. This agent is meant to be lightweight, and as such not all sources, sinks, and channels are allowed. Specifically, the source used is a special embedded source, and events should be sent to the source via the put and putAll methods on the EmbeddedAgent object. Only File Channel and Memory Channel are allowed as channels, while Avro Sink is the only supported sink. Interceptors are also supported by the embedded agent.
Note: The embedded agent has a dependency on hadoop-core.jar.
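
A minimal sketch of an embedded agent configuration follows, modeled on the example in the Flume Developer Guide: a memory channel plus Avro sinks behind a load_balance sink processor. The class EmbeddedAgentConfig and its build method are hypothetical, and the host name and ports are the ones assumed elsewhere in this article. The block only builds the property map with the standard library; the EmbeddedAgent calls are shown as comments because they need the Flume embedded agent module on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the property map passed to an embedded agent.
class EmbeddedAgentConfig {

    // One Avro sink (sink1, sink2, ...) is created per port on the given host.
    static Map<String, String> build(String hostname, int... ports) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("channel.type", "memory");   // only memory and file are allowed
        properties.put("channel.capacity", "200");
        StringBuilder sinks = new StringBuilder();
        for (int i = 0; i < ports.length; i++) {
            String sink = "sink" + (i + 1);
            if (i > 0) {
                sinks.append(' ');
            }
            sinks.append(sink);
            properties.put(sink + ".type", "avro");  // avro is the only supported sink
            properties.put(sink + ".hostname", hostname);
            properties.put(sink + ".port", String.valueOf(ports[i]));
        }
        properties.put("sinks", sinks.toString());
        properties.put("processor.type", "load_balance"); // or "failover"
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> properties = build("worker1", 41414, 41415);
        // With the embedded agent module available, usage would look like:
        // EmbeddedAgent agent = new EmbeddedAgent("myagent");
        // agent.configure(properties);
        // agent.start();
        // agent.put(event);   // or agent.putAll(events)
        // agent.stop();
        properties.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```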

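Beyond the components that ship with Flume, you can also implement your own sources, channels, and sinks in code. Custom components that put events into or take events from a channel must do so inside a transaction.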
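
The transaction pattern can be sketched with a toy in-memory channel. This is an illustration only: ToyChannel and TransactionSketch are hypothetical stand-ins written with the standard library, not Flume classes. In Flume itself the corresponding calls are Channel.getTransaction() followed by begin(), commit() or rollback(), and close() on the returned Transaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustration only: a toy channel whose puts become visible only on commit.
class TransactionSketch {

    static class ToyChannel {
        private final Deque<String> committed = new ArrayDeque<>();
        private List<String> staged; // null while no transaction is open

        void begin() { staged = new ArrayList<>(); }

        void put(String event) {
            if (staged == null) {
                throw new IllegalStateException("put outside a transaction");
            }
            staged.add(event);
        }

        void commit() { committed.addAll(staged); staged = null; }

        void rollback() { staged = null; }

        int size() { return committed.size(); }
    }

    // Puts a batch in one transaction: either every event is stored or none is.
    static boolean putAll(ToyChannel channel, List<String> events) {
        channel.begin();
        try {
            for (String event : events) {
                if (event == null) {
                    throw new IllegalArgumentException("null event");
                }
                channel.put(event);
            }
            channel.commit();
            return true;
        } catch (RuntimeException e) {
            channel.rollback(); // discard the partial batch
            return false;
        }
    }

    public static void main(String[] args) {
        ToyChannel channel = new ToyChannel();
        System.out.println(putAll(channel, Arrays.asList("e1", "e2")));
        System.out.println(putAll(channel, Arrays.asList("e3", null)));
        System.out.println(channel.size());
    }
}
```

The second batch contains an invalid event, so the whole batch is rolled back and the channel keeps only the two events from the first batch.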
