在agent启动时,会启动Channel,SourceRunner,SinkRunner,比如在org.apache.flume.agent.embedded.EmbeddedAgent类的doStart方法中:
private void doStart() {
boolean error = true;
try {
channel.start(); //调用Channel.start启动Channel
sinkRunner.start(); //调用SinkRunner.start启动SinkRunner
sourceRunner.start(); //调用SourceRunner.start启动SourceRunner
supervisor.supervise(channel,
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
supervisor.supervise(sinkRunner,
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
supervisor.supervise(sourceRunner,
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
error = false;
.....
SourceRunner目前有两大类PollableSourceRunner和EventDrivenSourceRunner,分别对应PollableSource和EventDrivenSource,PollableSource相关类需要外部驱动来确定source中是否有消息可以使用,而EventDrivenSource相关类不需要外部驱动,自己实现了事件驱动机制,目前常见的Source类都属于EventDrivenSource类型。在org.apache.flume.conf.source.SourceType中定义了常见的Source类:
OTHER(null),
SEQ("org.apache.flume.source.SequenceGeneratorSource"),
NETCAT("org.apache.flume.source.NetcatSource"),
EXEC("org.apache.flume.source.ExecSource"),
AVRO("org.apache.flume.source.AvroSource"),
SYSLOGTCP("org.apache.flume.source.SyslogTcpSource"),
MULTIPORT_SYSLOGTCP("org.apache.flume.source.MultiportSyslogTCPSource"),
SYSLOGUDP("org.apache.flume.source.SyslogUDPSource"),
SPOOLDIR("org.apache.flume.source.SpoolDirectorySource"),
HTTP("org.apache.flume.source.http.HTTPSource");
可以由org.apache.flume.source.DefaultSourceFactory获取其对应的实例.
SourceRunner则负责启动Source,比如在org.apache.flume.source.EventDrivenSourceRunner的start方法:
public void start() {
Source source = getSource(); //通过getSource获取Source对象
ChannelProcessor cp = source.getChannelProcessor(); //获取ChannelProcessor 对象
cp.initialize(); //调用ChannelProcessor.initialize方法
source.start(); //调用Source.start方法
lifecycleState = LifecycleState. START;
}
这里以execsource为例,配置source的类型为EXEC时,使用org.apache.flume.source.ExecSource.
主要原理是通过启动操作系统中的进程把每一行文本信息转换为Event,exec source不提供事务的特性(不能保证数据被成功发送到channel中),execsource关心的是标准输出,标准错误输出会被忽略(虽然可以打印到log中,但不会作为Event发送),可以设置restart的特性让source在退出时自动重启
主要方法分析:
1.configure方法用来设置相关参数
参数项:
command 执行的命令
restart 在命令退出后是否自动重启,默认是false
restartThrottle 重启命令的等待时间,默认是10s
logStderr 是否记录错误日志(只会放到日志中,不会做为Event),默认是false
batchSize 一次向channel发送的event的数量,批量发送的功能,默认是20
charset 字符编码设置,默认是UTF-8
2.start方法,SourceRunner的start方法会调用这个start方法:
public void start() {
logger.info( "Exec source starting with command:{}" , command );
executor = Executors. newSingleThreadExecutor(); // 初始化一个单线程的线程池,private ExecutorService executor;
counterGroup = new CounterGroup(); //初始化性能计数器
runner = new ExecRunnable( command, getChannelProcessor(), counterGroup,
restart, restartThrottle, logStderr , bufferCount , charset ); //实例化一个ExecRunnable线程
// FIXME : Use a callback-like executor / future to signal us upon failure.
runnerFuture = executor.submit( runner); //private Future<?> runnerFuture;
super.start(); //设置状态为start,lifecycleState = LifecycleState.START;
logger.debug( "Exec source started");
}
3.核心是一个实现Runnable接口的线程类ExecRunnable,用于启动读取数据的进程(在一个do...while循环中)
public void run() {
do {
String exitCode = "unknown";
BufferedReader reader = null;
try {
String[] commandArgs = command.split("\\s+"); //由命令生成字符串数组
process = new ProcessBuilder(commandArgs).start(); //生成一个Process对象,启动命令进程
reader = new BufferedReader(
new InputStreamReader(process.getInputStream(), charset)); //读取标准输出至缓冲区对象
// StderrLogger dies as soon as the input stream is invalid
StderrReader stderrReader = new StderrReader(new BufferedReader(
new InputStreamReader(process.getErrorStream(), charset)), logStderr); //StderrReader线程用于记录标准错误日志
stderrReader.setName("StderrReader-[" + command + "]");
stderrReader.setDaemon(true);
stderrReader.start();
/*
while((line = input.readLine()) != null) {
if(logStderr) {
logger.info("StderrLogger[{}] = '{}'", ++i, line); //日志只会写到log中,不会转换为Event
}
}
*/
String line = null;
List<Event> eventList = new ArrayList<Event>();
while ((line = reader.readLine()) != null) { //循环读取缓冲区中的内容
counterGroup.incrementAndGet("exec.lines.read");
eventList.add(EventBuilder.withBody(line.getBytes(charset))); //使用EventBuilder.withBody转换成Event并放入eventList中
if(eventList.size() >= bufferCount) { //达到batchSize的设置时调用ChannelProcessor.processEventBatch处理Event
channelProcessor.processEventBatch(eventList);
eventList.clear();
}
}
if(!eventList.isEmpty()) {
channelProcessor.processEventBatch(eventList);
}
}
....
} while(restart); //如果设置了restart,会在process退出后重新运行do代码块,否则退出
}
转载于:https://blog.51cto.com/caiguangguang/1618373