Apache Flume 1.9.0 Source Code Analysis: Design Approach (Part 2)

amosteeranmazz

Tags: flume, system architecture, java

Article link: https://blog.csdn.net/weixin_44417529/article/details/126878413

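Part 1 covered the design of the three interfaces Source, Channel, and Sink. A source's main job is to get data from outside the system, and as Part 1 noted, it hands that data on through the ChannelProcessor class. This part covers ChannelProcessor.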

ChannelProcessor Class

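Basic Analysis

ChannelProcessor is configured from user-supplied settings when it is loaded, so it must be configurable. In Flume, a component becomes configurable by implementing the Configurable interface, which is described later.

Configuration aside, the rest of this section analyzes what ChannelProcessor does.

The official Flume documentation contains this description: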

A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source.

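From this description, one design goal of ChannelProcessor is to bridge the external source and the Flume source: for each event the source receives, it selects the channel or channels the event should be written to. This is one major function of ChannelProcessor. The official documentation also says: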

Flume has the capability to modify/drop events in-flight. This is done with the help of interceptors. An interceptor can modify or even drop events based on any criteria chosen by the developer of the interceptor. Flume supports chaining of interceptors. This is made possible through by specifying the list of interceptor builder class names in the configuration.

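From this we see that, besides matching events to channels, events must be preprocessed by interceptors, which calls for a separate data structure.

The following explains the ChannelSelector and InterceptorChain used by ChannelProcessor.

In Flume, channel selection is left entirely to the user. An InterceptorChain, by contrast, always processes data the same way: it passes each event through the interceptors one by one, and only the individual interceptor implementations are left to the user. This settles how the two parts are designed: ChannelSelector is an interface, the InterceptorChain logic is a concrete class, and Interceptor is an interface that users implement. The corresponding interfaces and classes are described below.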

Configurable Interface

This interface is implemented by any component that needs configuration; context carries the concrete configuration values.

public void configure(Context context);

ChannelSelector Interface

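The reason ChannelSelector is an interface was given above.

Because a ChannelSelector's configuration is user-defined, the interface extends both Configurable and NamedComponent.

Its public methods include getRequiredChannels, getOptionalChannels, and getAllChannels, which are covered in detail later.

Its methods consist of one setter and three getters: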

public void setChannels(List<Channel> channels);
public List<Channel> getRequiredChannels(Event event);
public List<Channel> getOptionalChannels(Event event);
public List<Channel> getAllChannels();
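
To make the required/optional split concrete, here is a minimal sketch of a selector in the spirit of this interface. It does not use Flume's classes: channels are plain names, an event is just a header map, and HeaderSelectorSketch with its "priority" header rule is a hypothetical example, not one of Flume's built-in selectors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a channel selector: channels are names, events are header maps.
class HeaderSelectorSketch {
  private List<String> channels = new ArrayList<>();

  // Mirrors setChannels: the full set of channels attached to the source.
  void setChannels(List<String> channels) {
    this.channels = new ArrayList<>(channels);
  }

  // Required channels: a failed put on one of these fails the event.
  // Assumed rule for this sketch: the first channel is always required.
  List<String> getRequiredChannels(Map<String, String> headers) {
    if (channels.isEmpty()) {
      return Collections.emptyList();
    }
    return new ArrayList<>(channels.subList(0, 1));
  }

  // Optional channels: a failed put is only logged.
  // Assumed rule: the remaining channels only receive events whose "priority" header is "high".
  List<String> getOptionalChannels(Map<String, String> headers) {
    if (channels.size() > 1 && "high".equals(headers.get("priority"))) {
      return new ArrayList<>(channels.subList(1, channels.size()));
    }
    return Collections.emptyList();
  }

  List<String> getAllChannels() {
    return Collections.unmodifiableList(channels);
  }

  public static void main(String[] args) {
    HeaderSelectorSketch selector = new HeaderSelectorSketch();
    List<String> attached = new ArrayList<>();
    attached.add("c1");
    attached.add("c2");
    selector.setChannels(attached);

    Map<String, String> headers = new HashMap<>();
    headers.put("priority", "high");
    System.out.println("required: " + selector.getRequiredChannels(headers));
    System.out.println("optional: " + selector.getOptionalChannels(headers));
  }
}
```

The point of the per-event getters is that the decision can depend on the event itself, while getAllChannels ignores the event.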

Interceptor Interface

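There are many interceptor types; you can write your own or use the implementations that ship with Flume. Interceptors operate mainly on events.

An interceptor's methods cover initialization, shutdown, and interception of a single event or a list of events.

The Interceptor interface declares the following methods: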

public Event intercept(Event event);
public List<Event> intercept(List<Event> events);
public void initialize();
public void close();

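Beyond initialization, interception, and shutdown, each interceptor is chosen by the user, and its configuration differs from one interceptor type to another. Configuration is therefore delegated to the user through the following interface nested in Interceptor: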

public interface Builder extends Configurable {
  public Interceptor build();
}
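
As an illustration of how an interceptor and its Builder fit together, the sketch below defines simplified stand-ins for the two interfaces (an event is a header map, a context is a String map) and one hypothetical interceptor, TagInterceptor, that stamps a fixed header on every event. None of these types are Flume's own; only the shape (a nested Builder that is configured first and then builds the interceptor) follows the interface above.

```java
import java.util.HashMap;
import java.util.Map;

class InterceptorBuilderSketch {
  // Simplified stand-in for Interceptor: events are header maps.
  interface Interceptor {
    void initialize();
    Map<String, String> intercept(Map<String, String> event);
    void close();

    // Simplified stand-in for Interceptor.Builder: configured first, then asked to build.
    interface Builder {
      void configure(Map<String, String> context);
      Interceptor build();
    }
  }

  // Hypothetical interceptor that adds one fixed header to every event.
  static class TagInterceptor implements Interceptor {
    private final String key;
    private final String value;

    // Private: instances are only created through the Builder.
    private TagInterceptor(String key, String value) {
      this.key = key;
      this.value = value;
    }

    public void initialize() { }

    public Map<String, String> intercept(Map<String, String> event) {
      event.put(key, value);
      return event;
    }

    public void close() { }

    static class Builder implements Interceptor.Builder {
      private String key = "tag";
      private String value = "";

      // Reads the user-supplied settings; "key" and "value" are assumed property names.
      public void configure(Map<String, String> context) {
        if (context.containsKey("key")) {
          key = context.get("key");
        }
        if (context.containsKey("value")) {
          value = context.get("value");
        }
      }

      public Interceptor build() {
        return new TagInterceptor(key, value);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> context = new HashMap<>();
    context.put("key", "datacenter");
    context.put("value", "east");

    Interceptor.Builder builder = new TagInterceptor.Builder();
    builder.configure(context);
    Interceptor interceptor = builder.build();

    interceptor.initialize();
    System.out.println(interceptor.intercept(new HashMap<String, String>()));
    interceptor.close();
  }
}
```

Keeping configuration in the Builder means the interceptor itself can hold its settings in final fields.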

InterceptorChain Class

Why InterceptorChain is a class was explained above. Its main job is to tie multiple interceptors together. It contains the following members:

private List<Interceptor> interceptors;
public InterceptorChain() {
  interceptors = Lists.newLinkedList();
}

public void setInterceptors(List<Interceptor> interceptors) {
  this.interceptors = interceptors;
}

To coordinate multiple interceptors, InterceptorChain provides initialize, close, and intercept methods.

The single-event intercept method is:

public Event intercept(Event event) {
  for (Interceptor interceptor : interceptors) {
    if (event == null) {
      return null;
    }
    event = interceptor.intercept(event);
  }
  return event;
}

The intercept method that takes and returns a list of events is:

public List<Event> intercept(List<Event> events) {
  for (Interceptor interceptor : interceptors) {
    if (events.isEmpty()) {
      return events;
    }
    events = interceptor.intercept(events);
    Preconditions.checkNotNull(events,
        "Event list returned null from interceptor %s", interceptor);
  }
  return events;
}
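
The chaining rule in the single-event method (apply each interceptor in order, stop as soon as one returns null) can be tried out in isolation. The sketch below is an assumption-laden miniature: events are plain strings, interceptors are UnaryOperator<String> functions, and ChainSketch and its three sample stages are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class ChainSketch {
  // Counts how often the last stage runs, to show that dropped events never reach it.
  static int upperCaseCalls = 0;

  // Three hypothetical stages: trim, drop blank events, upper-case.
  static List<UnaryOperator<String>> sampleChain() {
    UnaryOperator<String> trim = s -> s.trim();
    UnaryOperator<String> dropBlank = s -> s.isEmpty() ? null : s;
    UnaryOperator<String> upper = s -> {
      upperCaseCalls++;
      return s.toUpperCase();
    };
    return Arrays.asList(trim, dropBlank, upper);
  }

  // Same control flow as the single-event intercept: a null result ends the chain.
  static String intercept(List<UnaryOperator<String>> interceptors, String event) {
    for (UnaryOperator<String> interceptor : interceptors) {
      if (event == null) {
        return null;
      }
      event = interceptor.apply(event);
    }
    return event;
  }

  public static void main(String[] args) {
    List<UnaryOperator<String>> chain = sampleChain();
    System.out.println(intercept(chain, "  hello "));
    System.out.println(intercept(chain, "    "));
  }
}
```

The list variant follows the same idea, except that "dropped" means an empty list and a null list is treated as a programming error.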

The initialize and close methods are:

public void initialize() {
  Iterator<Interceptor> iter = interceptors.iterator();
  while (iter.hasNext()) {
    Interceptor interceptor = iter.next();
    interceptor.initialize();
  }
}

public void close() {
  Iterator<Interceptor> iter = interceptors.iterator();
  while (iter.hasNext()) {
    Interceptor interceptor = iter.next();
    interceptor.close();
  }
}

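That covers the ChannelSelector interface and InterceptorChain. Now we turn to the concrete implementation of ChannelProcessor.

ChannelProcessor's goal is to pick the appropriate channels, filter the data through the interceptors, and finally deliver the data through the selected channels. Given these requirements and the ChannelSelector and InterceptorChain structures, ChannelProcessor has two functions.

First, implement Configurable, and within configure set up the InterceptorChain with the configured interceptors.

Second, process the events coming from the external source, then put them into channels.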

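In configure, the interceptors have to be created, which requires both the type to create and the configuration for that interceptor. For this, Flume provides a factory for builders together with a way to instantiate them.

An interceptor's configuration comes from the user. The Builder interface nested in Interceptor extends Configurable and declares a method that returns an Interceptor, which matches this need.

The factory for interceptor builders is as follows.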

InterceptorBuilderFactory Class

The factory's purpose is to create an interceptor Builder, given the builder type.

public static Builder newInstance(String name)
    throws ClassNotFoundException, InstantiationException,
    IllegalAccessException {

  Class<? extends Builder> clazz = lookup(name);
  if (clazz == null) {
    clazz = (Class<? extends Builder>) Class.forName(name);
  }
  return clazz.newInstance();
}

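private static Class<? extends Builder> lookup(String name) {
  try {
    // Upper-case with a fixed locale so the alias lookup does not depend on the default locale.
    return InterceptorType.valueOf(name.toUpperCase(Locale.ENGLISH)).getBuilderClass();
  } catch (IllegalArgumentException e) {
    // Not a built-in alias; the caller falls back to Class.forName(name).
    return null;
  }
}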
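
The lookup strategy here (try a short alias first, fall back to a fully qualified class name) is easy to reproduce outside Flume. The sketch below is a stand-alone illustration under stated assumptions: the Alias enum and its two entries are hypothetical and map to JDK collection classes rather than interceptor builders, and it instantiates through getDeclaredConstructor().newInstance() because Class.newInstance() has been deprecated since Java 9.

```java
import java.util.Locale;

class BuilderLookupSketch {
  // Hypothetical alias table standing in for InterceptorType.
  enum Alias {
    LIST(java.util.ArrayList.class),
    SET(java.util.HashSet.class);

    private final Class<?> target;

    Alias(Class<?> target) {
      this.target = target;
    }

    Class<?> getTarget() {
      return target;
    }
  }

  // Returns the aliased class, or null when the name is not a known alias.
  static Class<?> lookup(String name) {
    try {
      // Locale.ENGLISH keeps the result stable under locales with special casing rules.
      return Alias.valueOf(name.toUpperCase(Locale.ENGLISH)).getTarget();
    } catch (IllegalArgumentException e) {
      return null;
    }
  }

  // Alias first, then fully qualified class name.
  static Object newInstance(String name) throws ReflectiveOperationException {
    Class<?> clazz = lookup(name);
    if (clazz == null) {
      clazz = Class.forName(name);
    }
    return clazz.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    System.out.println(newInstance("list").getClass().getName());
    System.out.println(newInstance("java.util.TreeMap").getClass().getName());
  }
}
```

This is why a configuration can name a built-in interceptor by a short type such as timestamp while a custom one is named by its builder's class name.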

InterceptorType Enum

InterceptorType, which maps each built-in type name to its builder class, is defined as follows:

public enum InterceptorType {
  TIMESTAMP(org.apache.flume.interceptor.TimestampInterceptor.Builder.class),
  HOST(org.apache.flume.interceptor.HostInterceptor.Builder.class),
  STATIC(org.apache.flume.interceptor.StaticInterceptor.Builder.class),
  REGEX_FILTER(
      org.apache.flume.interceptor.RegexFilteringInterceptor.Builder.class),
  REGEX_EXTRACTOR(org.apache.flume.interceptor.RegexExtractorInterceptor.Builder.class),
  REMOVE_HEADER(org.apache.flume.interceptor.RemoveHeaderInterceptor.Builder.class),
  SEARCH_REPLACE(org.apache.flume.interceptor.SearchAndReplaceInterceptor.Builder.class);

  private final Class<? extends Interceptor.Builder> builder;

  // Enum constructors cannot be public; they are implicitly private.
  private InterceptorType(Class<? extends Interceptor.Builder> builder) {
    this.builder = builder;
  }

  public Class<? extends Interceptor.Builder> getBuilderClass() {
    return this.builder;
  }
}

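With the interceptor builder factory and the Builder's Configurable contract in place, ChannelProcessor's own Configurable implementation can be written. Before doing so, we look at the Context class's data structure and public methods.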

Context Class

Flume parses its configuration file into Context objects. The following describes Context's data structure and public methods.

Context stores configuration as key-value pairs:

private Map<String, String> parameters;

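Context offers get and set methods. For thread safety, the backing map is wrapped with Collections.synchronizedMap. So that callers cannot modify what they get back, getParameters returns an ImmutableMap copy, and the copy is made inside a synchronized block on the map.

The constructors and set methods are: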

public Context() {
  parameters = Collections.synchronizedMap(new HashMap<String, String>());
}

public Context(Map<String, String> paramters) {
  this();
  this.putAll(paramters);
}

public void putAll(Map<String, String> map) {
  parameters.putAll(map);
}

public void put(String key, String value) {
  parameters.put(key, value);
}

public void clear() {
  parameters.clear();
}

The get methods are as follows.

Getting all parameters:

public ImmutableMap<String, String> getParameters() {
  synchronized (parameters) {
    return ImmutableMap.copyOf(parameters);
  }
}

The private getters for a single key are:

private String get(String key, String defaultValue) {
  String result = parameters.get(key);
  if (result != null) {
    return result;
  }
  return defaultValue;
}

private String get(String key) {
  return get(key, null);
}

The public getters are getBoolean, getInteger, getLong, getString, getFloat, and getDouble.

public Boolean getBoolean(String key, Boolean defaultValue) {
  String value = get(key);
  if (value != null) {
    return Boolean.valueOf(Boolean.parseBoolean(value.trim()));
  }
  return defaultValue;
}

public Boolean getBoolean(String key) {
  return getBoolean(key, null);
}

public Integer getInteger(String key, Integer defaultValue) {
  String value = get(key);
  if (value != null) {
    return Integer.valueOf(Integer.parseInt(value.trim()));
  }
  return defaultValue;
}

public Integer getInteger(String key) {
  return getInteger(key, null);
}

public Long getLong(String key, Long defaultValue) {
  String value = get(key);
  if (value != null) {
    return Long.valueOf(Long.parseLong(value.trim()));
  }
  return defaultValue;
}

public Long getLong(String key) {
  return getLong(key, null);
}

public String getString(String key, String defaultValue) {
  return get(key, defaultValue);
}

public String getString(String key) {
  return get(key);
}

public Float getFloat(String key, Float defaultValue) {
  String value = get(key);
  if (value != null) {
    return Float.parseFloat(value.trim());
  }
  return defaultValue;
}

public Float getFloat(String key) {
  return getFloat(key, null);
}

public Double getDouble(String key, Double defaultValue) {
  String value = get(key);
  if (value != null) {
    return Double.parseDouble(value.trim());
  }
  return defaultValue;
}

public Double getDouble(String key) {
  return getDouble(key, null);
}
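
All of the typed getters share one pattern: look up the raw string, return the default when the key is absent, otherwise trim and parse. The sketch below isolates that pattern for integers over a plain Map; TypedGetterSketch and the key name batchSize are assumptions made for the example, not part of Context.

```java
import java.util.HashMap;
import java.util.Map;

class TypedGetterSketch {
  // Absent key: return the default. Present key: trim and parse.
  // A value that is present but not numeric throws NumberFormatException.
  static Integer getInteger(Map<String, String> params, String key, Integer defaultValue) {
    String value = params.get(key);
    if (value != null) {
      return Integer.valueOf(value.trim());
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("batchSize", " 100 ");
    System.out.println(getInteger(params, "batchSize", 10));
    System.out.println(getInteger(params, "missing", 10));
  }
}
```

Note that the default only covers a missing key; a malformed value still surfaces as an exception rather than silently falling back.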

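Besides these get and set methods, Context also has a getter for sub-contexts, which returns the entries whose keys start with a given prefix, with the prefix stripped: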

public ImmutableMap<String, String> getSubProperties(String prefix) {
  Preconditions.checkArgument(prefix.endsWith("."),
      "The given prefix does not end with a period (" + prefix + ")");
  Map<String, String> result = Maps.newHashMap();
  synchronized (parameters) {
    for (Entry<String, String> entry : parameters.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(prefix)) {
        String name = key.substring(prefix.length());
        result.put(name, entry.getValue());
      }
    }
  }
  return ImmutableMap.copyOf(result);
}

That completes the analysis of Context's data structure and getters. Next is ChannelProcessor's configure implementation.

public void configure(Context context) {
  configureInterceptors(context);
}

private void configureInterceptors(Context context) {

  List<Interceptor> interceptors = Lists.newLinkedList();

  String interceptorListStr = context.getString("interceptors", "");
  if (interceptorListStr.isEmpty()) {
    return;
  }
  String[] interceptorNames = interceptorListStr.split("\\s+");

  Context interceptorContexts =
      new Context(context.getSubProperties("interceptors."));

  // run through and instantiate all the interceptors specified in the Context
  InterceptorBuilderFactory factory = new InterceptorBuilderFactory();
  for (String interceptorName : interceptorNames) {
    Context interceptorContext = new Context(
        interceptorContexts.getSubProperties(interceptorName + "."));
    String type = interceptorContext.getString("type");
    if (type == null) {
      LOG.error("Type not specified for interceptor " + interceptorName);
      throw new FlumeException("Interceptor.Type not specified for " +
          interceptorName);
    }
    try {
      Interceptor.Builder builder = factory.newInstance(type);
      builder.configure(interceptorContext);
      interceptors.add(builder.build());
    } catch (ClassNotFoundException e) {
      LOG.error("Builder class not found. Exception follows.", e);
      throw new FlumeException("Interceptor.Builder not found.", e);
    } catch (InstantiationException e) {
      LOG.error("Could not instantiate Builder. Exception follows.", e);
      throw new FlumeException("Interceptor.Builder not constructable.", e);
    } catch (IllegalAccessException e) {
      LOG.error("Unable to access Builder. Exception follows.", e);
      throw new FlumeException("Unable to access Interceptor.Builder.", e);
    }
  }

  interceptorChain.setInterceptors(interceptors);
}

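configureInterceptors first reads the list of interceptor names and their properties and creates a builder factory. For each interceptor name it resolves the type and sub-properties, creates a builder, passes it the corresponding configuration (the hook exposed to the user), calls build to create the interceptor, and adds it to the list that is finally set on the InterceptorChain.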
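
The two-level prefix stripping in configureInterceptors is easier to follow with concrete keys. The sketch below re-creates just that part over a plain Map, as an illustration rather than Flume's code: InterceptorConfigSketch, subProperties, and resolveTypes are hypothetical names, and the keys are assumed to be already relative to the source (for example interceptors.i1.type).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InterceptorConfigSketch {
  // Same idea as getSubProperties: keep keys under the prefix and strip the prefix.
  static Map<String, String> subProperties(Map<String, String> params, String prefix) {
    if (!prefix.endsWith(".")) {
      throw new IllegalArgumentException("Prefix must end with a period: " + prefix);
    }
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : params.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  // Returns interceptor name -> type, in the order the names are listed.
  static Map<String, String> resolveTypes(Map<String, String> sourceContext) {
    Map<String, String> types = new LinkedHashMap<>();
    String names = sourceContext.getOrDefault("interceptors", "");
    if (names.isEmpty()) {
      return types;
    }
    // First level: everything under "interceptors."
    Map<String, String> all = subProperties(sourceContext, "interceptors.");
    for (String name : names.split("\\s+")) {
      // Second level: everything under "<name>."
      Map<String, String> own = subProperties(all, name + ".");
      String type = own.get("type");
      if (type == null) {
        throw new IllegalStateException("Type not specified for interceptor " + name);
      }
      types.put(name, type);
    }
    return types;
  }

  public static void main(String[] args) {
    Map<String, String> context = new LinkedHashMap<>();
    context.put("interceptors", "i1 i2");
    context.put("interceptors.i1.type", "timestamp");
    context.put("interceptors.i2.type", "host");
    context.put("interceptors.i2.useIP", "false");
    System.out.println(resolveTypes(context));
  }
}
```

After the second level of stripping, each builder sees only its own keys, such as type and useIP, with no knowledge of the name it was registered under.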

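That completes ChannelProcessor's first function.

Next is the second function: processing the events that come from the external source and putting them into channels.

Note first that this second job (the put) is started from a thread. The source therefore has to expose a getter for its ChannelProcessor, and that getter is called inside a run method (the threading flow is explained later). The ChannelProcessor methods are explained next.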

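processEvent Method (void)

ChannelProcessor first passes the event through the InterceptorChain for filtering or modification, then uses the user-defined ChannelSelector to determine the channels, and finally delivers the event to each channel inside a transaction.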

event = interceptorChain.intercept(event);
if (event == null) {
  return;
}

List<Channel> requiredChannels = selector.getRequiredChannels(event);
for (Channel reqChannel : requiredChannels) {
  Transaction tx = reqChannel.getTransaction();
  Preconditions.checkNotNull(tx, "Transaction object must not be null");
  try {
    tx.begin();

    reqChannel.put(event);

    tx.commit();
  } catch (Throwable t) {
    tx.rollback();
    if (t instanceof Error) {
      LOG.error("Error while writing to required channel: " + reqChannel, t);
      throw (Error) t;
    } else if (t instanceof ChannelException) {
      throw (ChannelException) t;
    } else {
      throw new ChannelException("Unable to put event on required " +
          "channel: " + reqChannel, t);
    }
  } finally {
    if (tx != null) {
      tx.close();
    }
  }
}

List<Channel> optionalChannels = selector.getOptionalChannels(event);
for (Channel optChannel : optionalChannels) {
  Transaction tx = null;
  try {
    tx = optChannel.getTransaction();
    tx.begin();

    optChannel.put(event);

    tx.commit();
  } catch (Throwable t) {
    tx.rollback();
    LOG.error("Unable to put event on optional channel: " + optChannel, t);
    if (t instanceof Error) {
      throw (Error) t;
    }
  } finally {
    if (tx != null) {
      tx.close();
    }
  }
}

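That is how a single event is handled. The event first passes through the InterceptorChain to yield the final event, and the selector then returns that event's required channels. For each required channel the transaction flow is: obtain the transaction and check that it is not null, begin, put the event on the channel, and commit; if an exception occurs along the way, roll back; finally, close. This flow is explained in detail later. Optional channels follow the same flow, except that a failure is logged and only an Error is rethrown.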
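
The begin/put/commit/rollback/close sequence, and the difference between required and optional channels, can be sketched with a toy channel. Everything here is an assumption for illustration: MemoryChannel is a hypothetical in-memory channel whose transaction buffers puts until commit, events are strings, and the capacity check exists only to provoke a failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TransactionSketch {
  // Hypothetical channel: puts are buffered and only become visible on commit.
  static class MemoryChannel {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final int capacity;

    MemoryChannel(int capacity) {
      this.capacity = capacity;
    }

    void begin() {
      pending.clear();
    }

    void put(String event) {
      if (committed.size() + pending.size() >= capacity) {
        throw new IllegalStateException("channel full");
      }
      pending.add(event);
    }

    void commit() {
      committed.addAll(pending);
      pending.clear();
    }

    void rollback() {
      pending.clear();
    }

    void close() { }
  }

  // Required-channel flow: any failure rolls back and is rethrown to the caller.
  static void putRequired(MemoryChannel channel, List<String> events) {
    try {
      channel.begin();
      for (String event : events) {
        channel.put(event);
      }
      channel.commit();
    } catch (RuntimeException e) {
      channel.rollback();
      throw e;
    } finally {
      channel.close();
    }
  }

  // Optional-channel flow: a failure rolls back, is reported, and is not rethrown.
  static boolean putOptional(MemoryChannel channel, List<String> events) {
    try {
      putRequired(channel, events);
      return true;
    } catch (RuntimeException e) {
      System.err.println("Unable to put on optional channel: " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    MemoryChannel small = new MemoryChannel(2);
    System.out.println(putOptional(small, Arrays.asList("a", "b", "c")));
    System.out.println(small.committed);
  }
}
```

Because nothing is visible before commit, a failed put leaves the channel exactly as it was, which is what makes the rollback branch safe.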

Besides single-event handling, ChannelProcessor provides a method for handling a batch of events:

processEventBatch Method (void)

Preconditions.checkNotNull(events, "Event list must not be null");

events = interceptorChain.intercept(events);

Map<Channel, List<Event>> reqChannelQueue =
    new LinkedHashMap<Channel, List<Event>>();

Map<Channel, List<Event>> optChannelQueue =
    new LinkedHashMap<Channel, List<Event>>();

for (Event event : events) {
  List<Channel> reqChannels = selector.getRequiredChannels(event);

  for (Channel ch : reqChannels) {
    List<Event> eventQueue = reqChannelQueue.get(ch);
    if (eventQueue == null) {
      eventQueue = new ArrayList<Event>();
      reqChannelQueue.put(ch, eventQueue);
    }
    eventQueue.add(event);
  }

  List<Channel> optChannels = selector.getOptionalChannels(event);

  for (Channel ch : optChannels) {
    List<Event> eventQueue = optChannelQueue.get(ch);
    if (eventQueue == null) {
      eventQueue = new ArrayList<Event>();
      optChannelQueue.put(ch, eventQueue);
    }

    eventQueue.add(event);
  }
}

// Process required channels
for (Channel reqChannel : reqChannelQueue.keySet()) {
  Transaction tx = reqChannel.getTransaction();
  Preconditions.checkNotNull(tx, "Transaction object must not be null");
  try {
    tx.begin();

    List<Event> batch = reqChannelQueue.get(reqChannel);

    for (Event event : batch) {
      reqChannel.put(event);
    }

    tx.commit();
  } catch (Throwable t) {
    tx.rollback();
    if (t instanceof Error) {
      LOG.error("Error while writing to required channel: " + reqChannel, t);
      throw (Error) t;
    } else if (t instanceof ChannelException) {
      throw (ChannelException) t;
    } else {
      throw new ChannelException("Unable to put batch on required " +
          "channel: " + reqChannel, t);
    }
  } finally {
    if (tx != null) {
      tx.close();
    }
  }
}

// Process optional channels
for (Channel optChannel : optChannelQueue.keySet()) {
  Transaction tx = optChannel.getTransaction();
  Preconditions.checkNotNull(tx, "Transaction object must not be null");
  try {
    tx.begin();

    List<Event> batch = optChannelQueue.get(optChannel);

    for (Event event : batch) {
      optChannel.put(event);
    }

    tx.commit();
  } catch (Throwable t) {
    tx.rollback();
    LOG.error("Unable to put batch on optional channel: " + optChannel, t);
    if (t instanceof Error) {
      throw (Error) t;
    }
  } finally {
    if (tx != null) {
      tx.close();
    }
  }
}

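The logic largely mirrors the single-event case: the events pass through the InterceptorChain, maps from channel to event list are created and updated for every event, and finally each channel is processed in its own transaction, in the same way as for a single event.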
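
The grouping step is the only part that differs from the single-event path, so here is a small sketch of it on its own. It is an illustration under assumptions: channels are names, events are strings, and BatchGroupingSketch with its routeByPrefix rule is hypothetical. It uses computeIfAbsent in place of the explicit null check, which gives the same result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchGroupingSketch {
  // Hypothetical selector: events starting with "err" go to c1 and c2, all others to c1 only.
  static List<String> routeByPrefix(String event) {
    if (event.startsWith("err")) {
      return Arrays.asList("c1", "c2");
    }
    return Arrays.asList("c1");
  }

  // Builds channel -> events, keeping first-seen channel order and per-channel event order.
  static Map<String, List<String>> groupByChannel(
      List<String> events, Function<String, List<String>> selector) {
    Map<String, List<String>> queues = new LinkedHashMap<>();
    for (String event : events) {
      for (String channel : selector.apply(event)) {
        queues.computeIfAbsent(channel, k -> new ArrayList<>()).add(event);
      }
    }
    return queues;
  }

  public static void main(String[] args) {
    List<String> events = Arrays.asList("info-1", "err-1", "info-2");
    System.out.println(groupByChannel(events, BatchGroupingSketch::routeByPrefix));
  }
}
```

Grouping first means each channel needs only one transaction per batch, however many events are routed to it.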

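From this walk through ChannelProcessor's logic we can conclude the following.

ChannelProcessor's first task is to configure the InterceptorChain. It parses the Context, obtains a builder from the builder factory for each configured type, passes the builder its configuration, builds the interceptor, and sets the resulting interceptors on the InterceptorChain. Configuration itself is a user-defined method, made possible by the Builder interface nested in Interceptor, which extends Configurable.

ChannelProcessor's second task is to put data into channels. It does this through the ChannelSelector it holds: the ChannelSelector interface is exposed to the user, and once the channels are obtained, the put is completed inside each channel's transaction. Because this work is started from a thread, the source has to expose its ChannelProcessor to callers.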