Reference: the Flume official developer guide: http://flume.apache.org/FlumeDeveloperGuide.html#rpc-client-interface
If you have read my first article, you already know how to configure Flume on CDH5, so here we go straight to a simple setup for custom data collection in RPC-client mode.
1. Write an RPC-client data-collection program with the flume-ng-sdk
The jars referenced by the program are not provided here; with Maven this is trivial:
just add the following dependency to your pom:
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-sdk</artifactId>
    <version>1.6.0</version>
</dependency>
The client code is as follows:
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.Charset;

public class MyApp {
    public static void main(String[] args) {
        MyRpcClientFacade client = new MyRpcClientFacade();
        // Initialize client with the remote Flume agent's host and port
        client.init("host.example.org", 41415);

        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume!";
        for (int i = 0; i < 10; i++) {
            client.sendDataToFlume(sampleData);
        }

        client.cleanUp();
    }
}

class MyRpcClientFacade {
    private RpcClient client;
    private String hostname;
    private int port;

    public void init(String hostname, int port) {
        // Setup the RPC connection
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
        // Use the following method to create a thrift client (instead of the above line):
        // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // clean up and recreate the client
            client.close();
            client = null;
            client = RpcClientFactory.getDefaultInstance(hostname, port);
            // Use the following method to create a thrift client (instead of the above line):
            // this.client = RpcClientFactory.getThriftInstance(hostname, port);
        }
    }

    public void cleanUp() {
        // Close the RPC connection
        client.close();
    }
}
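One thing worth noticing about the facade above: when `append` fails, the official example closes and recreates the client, but the failed event itself is dropped. The close-and-recreate pattern, extended with a single retry so the event is not lost, can be sketched with the plain JDK; the `Sender` interface and `FlakySender` below are illustrative stand-ins for the Flume SDK, not part of its API:

```java
import java.util.ArrayList;
import java.util.List;

public class ReconnectDemo {
    // Illustrative stand-in for RpcClient: a sender that may fail.
    interface Sender {
        void send(String data) throws Exception;
        void close();
    }

    // Fails on the first call, succeeds afterwards -- simulates a dropped connection.
    static class FlakySender implements Sender {
        private boolean failNext;
        private final List<String> delivered = new ArrayList<>();
        FlakySender(boolean failNext) { this.failNext = failNext; }
        public void send(String data) throws Exception {
            if (failNext) { failNext = false; throw new Exception("delivery failed"); }
            delivered.add(data);
        }
        public void close() {}
        List<String> delivered() { return delivered; }
    }

    static FlakySender client = new FlakySender(true);

    // Same shape as MyRpcClientFacade.sendDataToFlume: close and recreate on
    // failure, then retry the event once so it is not silently dropped.
    static void sendWithRecovery(String data) {
        try {
            client.send(data);
        } catch (Exception e) {
            client.close();
            client = new FlakySender(false); // recreate; a real client would reconnect
            try {
                client.send(data);           // retry the failed event once
            } catch (Exception retry) {
                // give up; a real application would log and/or buffer here
            }
        }
    }

    public static void main(String[] args) {
        sendWithRecovery("Hello Flume!");
        System.out.println(client.delivered()); // [Hello Flume!]
    }
}
```

Whether a retry is appropriate depends on your delivery guarantees; for at-most-once semantics the official behavior (drop and move on) is fine.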
2. Change the port for safety and test connectivity
- 1> For safety's sake: Flume's default HTTP port in CDH is 41414, and we cannot reuse it. If we do, the log /var/log/flume-ng/flume-cmf-flume-AGENT-cdh4.log shows the error:
org.apache.flume.FlumeException: Failed to set up server socket
Caused by: java.net.BindException: Address already in use
So let's pick 41415 instead; note that the port in the program above must be changed to 41415 accordingly.
- 2> Test whether the host running the Flume agent can be reached on port 41415.
On Windows, open a command prompt and run: telnet hostName/ip port. For example, connecting to mine: telnet 192.168.1.100 41415. If it looks like the screenshot below, the connection works.
If it cannot connect, check your firewall and other settings and open up access to the port yourself.
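On Linux or macOS, `nc -vz host port` does the same job as telnet. You can also run the check from Java itself with a plain socket, which is convenient since the client will open a TCP connection in exactly the same way; the host and port below are the illustrative values used in this article:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Substitute your own agent host; 192.168.1.100 is the example from the text.
        System.out.println(isReachable("192.168.1.100", 41415, 3000)
                ? "port 41415 reachable" : "port 41415 NOT reachable");
    }
}
```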
3. Configure the conf file in CDH
Configure the conf file according to the agent name, i.e., the name given in the Agent Default Group configuration item (if you run into trouble here, see Flume study notes, part 1).
- Configuration file contents
# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = avro-source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set standard properties.
# Define an Avro source called avro-source1 on tier1 and tell it
# to bind to 0.0.0.0:41415. Connect it to channel channel1.
tier1.sources.avro-source1.type = avro
tier1.sources.avro-source1.bind = cdh4
tier1.sources.avro-source1.port = 41415
tier1.sources.avro-source1.threads = 5
# Describe the channel
tier1.channels.channel1.type = memory
tier1.sources.avro-source1.channels = channel1
tier1.sinks.sink1.channel = channel1
# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /user/hdfs/test/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix = test_flume
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.rollInterval = 0
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
Note:
tier1.sources.avro-source1.bind = cdh4
The host bound here must be the hostname defined in the /etc/hosts file. A hostname can switch between the internal and external IPs it maps to, so it will not break.
If your cluster is an internal-network cluster on the same subnet as your development machine and the machines holding the data sources, writing a raw IP may happen to work; if not, be sure to use the hostname bound in the hosts file of the Linux machine running Flume.
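For reference, assuming the agent host is named cdh4, the relevant hosts entry would look like the following (the IP is a placeholder; use your own cluster address):

```
# /etc/hosts on the machine running the Flume agent
192.168.1.100   cdh4
```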
4. Run the Java Flume client program
Then open the /user/hdfs/test/ directory in HDFS (for example with hdfs dfs -ls /user/hdfs/test/) and check whether files have been written.