Flume Channel Selectors

Original post: https://blog.csdn.net/xiao_jun_0820/article/details/38116103

The previous posts collected logs from a single project; now let's consider collecting logs from multiple projects. I copied the flumedemo project, renamed the copy flumedemo2, and added a WriteLog2.java class whose JSON output differs slightly: the "reporter-api" in the old requestUrl becomes "image-api", so its output can be told apart from WriteLog's. The class looks like this:


 
 
package com.besttone.flume;

import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class WriteLog2 {

    protected static final Log logger = LogFactory.getLog(WriteLog2.class);

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            logger.info(new Date().getTime());
            logger.info("{\"requestTime\":"
                    + System.currentTimeMillis()
                    + ",\"requestParams\":{\"timestamp\":1405499314238,\"phone\":\"02038824941\",\"cardName\":\"测试商家名称\",\"provinceCode\":\"440000\",\"cityCode\":\"440106\"},\"requestUrl\":\"/image-api/reporter/reporter12/init.do\"}");
            Thread.sleep(2000);
        }
    }
}

The requirement is this: flumedemo's log4j output should go to HDFS, while flumedemo2's log4j output should go to the agent's own log.

We still use the log4j appender to hand log4j output to Flume's source, and the requirement now clearly calls for two sinks: one writing to HDFS and one to the logger. The topology should therefore look like this:

(The original post shows a topology diagram here: a single Avro source fanning out through two channels, channel1 to the HDFS sink and channel2 to the logger sink.)

To realize this topology we need channel selectors, so that logs from different projects travel through different channels to different sinks.

The official documentation offers two types of channel selectors:

Replicating Channel Selector (default)

Multiplexing Channel Selector

The difference between them: Replicating sends every event arriving at the source to all of its channels, whereas Multiplexing can choose which channels each event goes to. In our example, Replicating would deliver both demo's and demo2's logs to channel1 and channel2 alike, which clearly violates the requirement: demo's logs should go only to channel1, and demo2's only to channel2.
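
For contrast, the replicating setup (which is also what you get by default when no selector is configured) would look like this, and every event would then be copied into both channels:

tier1.sources.source1.channels=channel1 channel2
tier1.sources.source1.selector.type=replicating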

So we pick the Multiplexing Channel Selector. Here we run into a thorny problem: Multiplexing routes an event by the value of a designated key in its header, and demo and demo2 run on the same server. If they ran on different servers, we could attach a host interceptor to source1 (covered in the previous post) and route each event by the host value in its header; but on a single server the host cannot tell the two log streams apart, so we have to find a way to put a key into the header ourselves that identifies where each log came from.
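
For reference, that multi-server variant would look roughly like this. This is only a sketch: it assumes each application host runs its own first-tier agent (as in the previous post's setup), so the host interceptor's header actually reflects the application's machine, and demohost1/demohost2 are placeholder hostnames of mine:

tier1.sources.source1.interceptors=i1
tier1.sources.source1.interceptors.i1.type=host
tier1.sources.source1.interceptors.i1.useIP=false
tier1.sources.source1.selector.type=multiplexing
tier1.sources.source1.selector.header=host
tier1.sources.source1.selector.mapping.demohost1=channel1
tier1.sources.source1.selector.mapping.demohost2=channel2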

Suppose the header carried a key flume.client.log4j.logger.source. If we set its value to app1 for demo and app2 for demo2, we could then configure:

tier1.sources.source1.channels=channel1 channel2
tier1.sources.source1.selector.type=multiplexing
tier1.sources.source1.selector.header=flume.client.log4j.logger.source
tier1.sources.source1.selector.mapping.app1=channel1
tier1.sources.source1.selector.mapping.app2=channel2

and thereby route each project's logs to its own channel.


Following this thread, we hit a snag: the stock log4jappender exposes no such parameter. What to do? A look at the log4jappender source showed that adding a parameter is straightforward, so I copied the code into a new class, Log4jExtAppender.java, which adds one extra parameter named source. The code is as follows:


 
 
package com.besttone.flume;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.apache.avro.specific.SpecificRecord;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.FlumeException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.clients.log4jappender.Log4jAvroHeaders;
import org.apache.flume.event.EventBuilder;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.helpers.LogLog;
import org.apache.log4j.spi.LoggingEvent;

/**
 * Appends Log4j Events to an external Flume client which is described by the
 * Log4j configuration file. The appender takes two required parameters:
 * <p>
 * <strong>Hostname</strong> : This is the hostname of the first hop at which
 * Flume (through an AvroSource) is listening for events.
 * </p>
 * <p>
 * <strong>Port</strong> : This the port on the above host where the Flume
 * Source is listening for events.
 * </p>
 * A sample log4j properties file which appends to a source would look like:
 *
 * <pre>
 * <p>
 * log4j.appender.out2 = org.apache.flume.clients.log4jappender.Log4jAppender
 * log4j.appender.out2.Port = 25430
 * log4j.appender.out2.Hostname = foobarflumesource.com
 * log4j.logger.org.apache.flume.clients.log4jappender = DEBUG,out2</p>
 * </pre>
 * <p>
 * <i>Note: Change the last line to the package of the class(es), that will do
 * the appending. For example if classes from the package com.bar.foo are
 * appending, the last line would be:</i>
 * </p>
 *
 * <pre>
 * <p>log4j.logger.com.bar.foo = DEBUG,out2</p>
 * </pre>
 */
public class Log4jExtAppender extends AppenderSkeleton {

    private String hostname;
    private int port;
    // The extra parameter: identifies which application the log events come from.
    private String source;

    public String getSource() {
        return source;
    }

    public void setSource(String source) {
        this.source = source;
    }

    private boolean unsafeMode = false;
    private long timeout = RpcClientConfigurationConstants.DEFAULT_REQUEST_TIMEOUT_MILLIS;
    private boolean avroReflectionEnabled;
    private String avroSchemaUrl;

    RpcClient rpcClient = null;

    /**
     * If this constructor is used programmatically rather than from a log4j
     * conf you must set the <tt>port</tt> and <tt>hostname</tt> and then call
     * <tt>activateOptions()</tt> before calling <tt>append()</tt>.
     */
    public Log4jExtAppender() {
    }

    /**
     * Sets the hostname and port. Even if these are passed the
     * <tt>activateOptions()</tt> function must be called before calling
     * <tt>append()</tt>, else <tt>append()</tt> will throw an Exception.
     *
     * @param hostname The first hop where the client should connect to.
     * @param port The port to connect on the host.
     */
    public Log4jExtAppender(String hostname, int port, String source) {
        this.hostname = hostname;
        this.port = port;
        this.source = source;
    }

    /**
     * Append the LoggingEvent, to send to the first Flume hop.
     *
     * @param event The LoggingEvent to be appended to the flume.
     * @throws FlumeException if the appender was closed, or the hostname and
     *             port were not setup, there was a timeout, or there was a
     *             connection error.
     */
    @Override
    public synchronized void append(LoggingEvent event) throws FlumeException {
        // If rpcClient is null, it means either this appender object was never
        // setup by setting hostname and port and then calling activateOptions
        // or this appender object was closed by calling close(), so we throw an
        // exception to show the appender is no longer accessible.
        if (rpcClient == null) {
            String errorMsg = "Cannot Append to Appender! Appender either closed or"
                    + " not setup correctly!";
            LogLog.error(errorMsg);
            if (unsafeMode) {
                return;
            }
            throw new FlumeException(errorMsg);
        }
        if (!rpcClient.isActive()) {
            reconnect();
        }
        // Client created first time append is called.
        Map<String, String> hdrs = new HashMap<String, String>();
        hdrs.put(Log4jAvroHeaders.LOGGER_NAME.toString(), event.getLoggerName());
        hdrs.put(Log4jAvroHeaders.TIMESTAMP.toString(),
                String.valueOf(event.timeStamp));
        // Add the log source header, so the channel selector can route on it.
        if (this.source == null || this.source.equals("")) {
            this.source = "unknown";
        }
        hdrs.put("flume.client.log4j.logger.source", this.source);
        // To get the level back simply use
        // LoggerEvent.toLevel(hdrs.get(Integer.parseInt(
        // Log4jAvroHeaders.LOG_LEVEL.toString()))
        hdrs.put(Log4jAvroHeaders.LOG_LEVEL.toString(),
                String.valueOf(event.getLevel().toInt()));
        Event flumeEvent;
        Object message = event.getMessage();
        if (message instanceof GenericRecord) {
            GenericRecord record = (GenericRecord) message;
            populateAvroHeaders(hdrs, record.getSchema(), message);
            flumeEvent = EventBuilder.withBody(
                    serialize(record, record.getSchema()), hdrs);
        } else if (message instanceof SpecificRecord || avroReflectionEnabled) {
            Schema schema = ReflectData.get().getSchema(message.getClass());
            populateAvroHeaders(hdrs, schema, message);
            flumeEvent = EventBuilder
                    .withBody(serialize(message, schema), hdrs);
        } else {
            hdrs.put(Log4jAvroHeaders.MESSAGE_ENCODING.toString(), "UTF8");
            String msg = layout != null ? layout.format(event) : message
                    .toString();
            flumeEvent = EventBuilder.withBody(msg, Charset.forName("UTF8"),
                    hdrs);
        }
        try {
            rpcClient.append(flumeEvent);
        } catch (EventDeliveryException e) {
            String msg = "Flume append() failed.";
            LogLog.error(msg);
            if (unsafeMode) {
                return;
            }
            throw new FlumeException(msg + " Exception follows.", e);
        }
    }

    private Schema schema;
    private ByteArrayOutputStream out;
    private DatumWriter<Object> writer;
    private BinaryEncoder encoder;

    protected void populateAvroHeaders(Map<String, String> hdrs, Schema schema,
            Object message) {
        if (avroSchemaUrl != null) {
            hdrs.put(Log4jAvroHeaders.AVRO_SCHEMA_URL.toString(), avroSchemaUrl);
            return;
        }
        LogLog.warn("Cannot find ID for schema. Adding header for schema, "
                + "which may be inefficient. Consider setting up an Avro Schema Cache.");
        hdrs.put(Log4jAvroHeaders.AVRO_SCHEMA_LITERAL.toString(),
                schema.toString());
    }

    private byte[] serialize(Object datum, Schema datumSchema)
            throws FlumeException {
        if (schema == null || !datumSchema.equals(schema)) {
            schema = datumSchema;
            out = new ByteArrayOutputStream();
            writer = new ReflectDatumWriter<Object>(schema);
            encoder = EncoderFactory.get().binaryEncoder(out, null);
        }
        out.reset();
        try {
            writer.write(datum, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new FlumeException(e);
        }
    }

    // This function should be synchronized to make sure one thread
    // does not close an appender another thread is using, and hence risking
    // a null pointer exception.
    /**
     * Closes underlying client. If <tt>append()</tt> is called after this
     * function is called, it will throw an exception.
     *
     * @throws FlumeException if errors occur during close
     */
    @Override
    public synchronized void close() throws FlumeException {
        // Any append calls after this will result in an Exception.
        if (rpcClient != null) {
            try {
                rpcClient.close();
            } catch (FlumeException ex) {
                LogLog.error("Error while trying to close RpcClient.", ex);
                if (unsafeMode) {
                    return;
                }
                throw ex;
            } finally {
                rpcClient = null;
            }
        } else {
            String errorMsg = "Flume log4jappender already closed!";
            LogLog.error(errorMsg);
            if (unsafeMode) {
                return;
            }
            throw new FlumeException(errorMsg);
        }
    }

    @Override
    public boolean requiresLayout() {
        // This method is named quite incorrectly in the interface. It should
        // probably be called canUseLayout or something. According to the docs,
        // even if the appender can work without a layout, if it can work with
        // one, this method must return true.
        return true;
    }

    /**
     * Set the first flume hop hostname.
     *
     * @param hostname The first hop where the client should connect to.
     */
    public void setHostname(String hostname) {
        this.hostname = hostname;
    }

    /**
     * Set the port on the hostname to connect to.
     *
     * @param port The port to connect on the host.
     */
    public void setPort(int port) {
        this.port = port;
    }

    public void setUnsafeMode(boolean unsafeMode) {
        this.unsafeMode = unsafeMode;
    }

    public boolean getUnsafeMode() {
        return unsafeMode;
    }

    public void setTimeout(long timeout) {
        this.timeout = timeout;
    }

    public long getTimeout() {
        return this.timeout;
    }

    public void setAvroReflectionEnabled(boolean avroReflectionEnabled) {
        this.avroReflectionEnabled = avroReflectionEnabled;
    }

    public void setAvroSchemaUrl(String avroSchemaUrl) {
        this.avroSchemaUrl = avroSchemaUrl;
    }

    /**
     * Activate the options set using <tt>setPort()</tt> and
     * <tt>setHostname()</tt>
     *
     * @throws FlumeException if the <tt>hostname</tt> and <tt>port</tt>
     *             combination is invalid.
     */
    @Override
    public void activateOptions() throws FlumeException {
        Properties props = new Properties();
        props.setProperty(RpcClientConfigurationConstants.CONFIG_HOSTS, "h1");
        props.setProperty(RpcClientConfigurationConstants.CONFIG_HOSTS_PREFIX
                + "h1", hostname + ":" + port);
        props.setProperty(
                RpcClientConfigurationConstants.CONFIG_CONNECT_TIMEOUT,
                String.valueOf(timeout));
        props.setProperty(
                RpcClientConfigurationConstants.CONFIG_REQUEST_TIMEOUT,
                String.valueOf(timeout));
        try {
            rpcClient = RpcClientFactory.getInstance(props);
            if (layout != null) {
                layout.activateOptions();
            }
        } catch (FlumeException e) {
            String errormsg = "RPC client creation failed! " + e.getMessage();
            LogLog.error(errormsg);
            if (unsafeMode) {
                return;
            }
            throw e;
        }
    }

    /**
     * Make it easy to reconnect on failure
     *
     * @throws FlumeException
     */
    private void reconnect() throws FlumeException {
        close();
        activateOptions();
    }
}

I then packaged this class into a jar, Log4jExtAppender.jar, and dropped it into the lib directories of flumedemo and flumedemo2.
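
Incidentally, if you ever need to wire the appender up in code rather than through log4j.properties, the Javadoc above spells out the contract: set hostname, port and source, call activateOptions(), then log. A minimal sketch (the class name and the logged message are placeholders of mine):

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

import com.besttone.flume.Log4jExtAppender;

public class AppenderSmokeTest {
    public static void main(String[] args) {
        // hostname, port, source; activateOptions() is mandatory before append()
        Log4jExtAppender appender = new Log4jExtAppender("localhost", 44444, "app1");
        appender.setLayout(new PatternLayout("%m"));
        appender.activateOptions();

        Logger logger = Logger.getLogger(AppenderSmokeTest.class);
        logger.addAppender(appender);

        // append() sends over the RPC client synchronously, with the
        // flume.client.log4j.logger.source header set to "app1"
        logger.info("{\"requestTime\":" + System.currentTimeMillis() + "}");
        appender.close();
    }
}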

At this point, flumedemo's log4j.properties looks like this:

log4j.rootLogger=INFO

log4j.category.com.besttone=INFO,flume,console,LogFile

#log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume = com.besttone.flume.Log4jExtAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.Source = app1

log4j.appender.console= org.apache.log4j.ConsoleAppender
log4j.appender.console.Target= System.out
log4j.appender.console.layout= org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern= %d{yyyy-MM-dd HH:mm:ss} %5p %c{1}: %L - %m%n

log4j.appender.LogFile= org.apache.log4j.DailyRollingFileAppender
log4j.appender.LogFile.File= logs/app.log
log4j.appender.LogFile.Append= true
log4j.appender.LogFile.Threshold= DEBUG
log4j.appender.LogFile.layout= org.apache.log4j.PatternLayout
log4j.appender.LogFile.layout.ConversionPattern= %-d{yyyy-MM-dd HH:mm:ss} [%t:%r] - [%5p] %m%n


And flumedemo2's:

log4j.rootLogger=INFO

log4j.category.com.besttone=INFO,flume,console,LogFile

#log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume = com.besttone.flume.Log4jExtAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.Source = app2

log4j.appender.console= org.apache.log4j.ConsoleAppender
log4j.appender.console.Target= System.out
log4j.appender.console.layout= org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern= %d{yyyy-MM-dd HH:mm:ss} %5p %c{1}: %L - %m%n

log4j.appender.LogFile= org.apache.log4j.DailyRollingFileAppender
log4j.appender.LogFile.File= logs/app.log
log4j.appender.LogFile.Append= true
log4j.appender.LogFile.Threshold= DEBUG
log4j.appender.LogFile.layout= org.apache.log4j.PatternLayout
log4j.appender.LogFile.layout.ConversionPattern= %-d{yyyy-MM-dd HH:mm:ss} [%t:%r] - [%5p] %m%n


The only change from the earlier posts is log4j.appender.flume: it now points to my com.besttone.flume.Log4jExtAppender, which adds the Source parameter, instead of the stock org.apache.flume.clients.log4jappender.Log4jAppender.


flumedemo sets log4j.appender.flume.Source = app1, and flumedemo2 sets log4j.appender.flume.Source = app2.

Run flumedemo's WriteLog class and flumedemo2's WriteLog2 class, then inspect HDFS and the agent's log file: HDFS contains only app1's logs, and the agent log contains only app2's. The requirement is met.

The complete flume.conf is as follows:

tier1.sources=source1
tier1.channels=channel1 channel2
tier1.sinks=sink1 sink2
tier1.sources.source1.type=avro
tier1.sources.source1.bind=0.0.0.0
tier1.sources.source1.port=44444
tier1.sources.source1.channels=channel1 channel2
tier1.sources.source1.selector.type=multiplexing
tier1.sources.source1.selector.header=flume.client.log4j.logger.source
tier1.sources.source1.selector.mapping.app1=channel1
tier1.sources.source1.selector.mapping.app2=channel2
tier1.sources.source1.interceptors=i1 i2
tier1.sources.source1.interceptors.i1.type=regex_filter
tier1.sources.source1.interceptors.i1.regex=\\{.*\\}
tier1.sources.source1.interceptors.i2.type=timestamp
tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000
tier1.channels.channel1.transactionCapacity=1000
tier1.channels.channel1.keep-alive=30
tier1.channels.channel2.type=memory
tier1.channels.channel2.capacity=10000
tier1.channels.channel2.transactionCapacity=1000
tier1.channels.channel2.keep-alive=30
tier1.sinks.sink1.type=hdfs
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.hdfs.path=hdfs://master68:8020/flume/events/%y-%m-%d
tier1.sinks.sink1.hdfs.round=true
tier1.sinks.sink1.hdfs.roundValue=10
tier1.sinks.sink1.hdfs.roundUnit=minute
tier1.sinks.sink1.hdfs.fileType=DataStream
tier1.sinks.sink1.hdfs.writeFormat=Text
tier1.sinks.sink1.hdfs.rollInterval=0
tier1.sinks.sink1.hdfs.rollSize=10240
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.idleTimeout=60
tier1.sinks.sink2.type=logger
tier1.sinks.sink2.channel=channel2
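
One caveat the config above leaves open: with a multiplexing selector, an event whose header value matches no mapping entry goes to the default channels, and if none are configured it is effectively dropped. Since Log4jExtAppender stamps an unset Source as "unknown", it may be worth adding a catch-all; routing unknowns to channel2 here is an arbitrary choice of mine:

tier1.sources.source1.selector.default=channel2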


