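Background
My previous article, "Retrospective: Real-Time Data Collection with Flume + Kafka", was well received, so I'm following up with a hands-on piece, which is also my first outing with Java: writing a custom Flume interceptor for real-time data transformation. This is the most important part of the whole real-time pipeline; if it is handled poorly, it drags down the responsiveness of everything downstream.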
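Analysis
The part marked in red uses Flume to transform the data stream, which then flows through Kafka to Spark and is finally written to storage.
If you only use Spark for ETL, the approach below should be a pleasant surprise. If you need actual computation, this article is probably not for you and you may want to look elsewhere.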
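My pain points
1. Many business teams, each needing a model built in advance (ideally they could onboard quickly)
2. The pipeline is long and passes through a heavyweight compute engine, which costs a lot of latency
3. I only want to do ETL; I don't want to learn Spark
4. There are plenty of other tools to choose from, but I don't want to learn those either, so I went looking for a lazy way out
These questions had been on my mind for a while: how could I simplify my work? One day, while reading the official Flume documentation, I noticed that it supports custom interceptors and custom plugins. That looked like the shortcut I was after.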
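First, my current requirements:
1. Logs from different business teams are not uniform, so each team needs its own ETL (cleaning, transformation, extraction)
2. I use Druid for analysis. Although it can parse delimited formats such as CSV, ingesting JSON works best, hence Text -> Json
3. Each business team can define its own separator, so new formats can be supported quickly
4. Plugin-based, so different teams can get different functionality
By extending Flume with a custom interceptor, we can implement this ETL (cleaning, transformation, extraction).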
Plugin example and usage - hands-on
Text -> Json
Raw log
2019-02-11 19:03:30.123|INFO|1.0|10.10.10.10|push-service|trace_id:0001|msg:错误信息|token:fds1321sd
After conversion
{"times":"2019-02-11 19:03:30.123", "errLevel": "INFO", "version":"1.0" , "ip":"10.10.10.10", "service-name":"push-service", "trace_id": "trace_id:0001","msg": "msg:错误信息"}
Plugin usage
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=weibo.flume.textToJson.TextToJsonBuilder
a1.sources.r1.interceptors.i1.textToJson={"times":"#0", "errLevel": "#1", "version":"#2" , "ip":"#3", "service-name":"#4", "trace_id": "#5","msg": "#6"}
a1.sources.r1.interceptors.i1.separator=|
#0, #1, #2, #3, ... are index placeholders for the fields of the raw log line:
- #0: 2019-02-11 19:03:30.123
- #1: INFO
- #2: 1.0
- #3: 10.10.10.10
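Internally, the plugin maps the textToJson template onto the fields of each line, performing a lightweight conversion while the data is in flight, so the result can be written to storage directly.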
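The mapping described above can be sketched with plain JDK classes. This is a minimal illustration under stated assumptions, not the plugin's actual code: the class name `TemplateFillSketch` and the method `fill` are hypothetical, and placeholders are assumed to have the form `#<index>`. It fills the template in a single pass, so `#1` cannot accidentally match the prefix of `#10`, and a placeholder with no corresponding field is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFillSketch {
    // Matches placeholders such as #0, #1, #12
    private static final Pattern PLACEHOLDER = Pattern.compile("#(\\d+)");

    // Splits the line on a literal separator and fills the template in one pass.
    static String fill(String template, String line, String separator) {
        // Pattern.quote makes characters like "|" literal instead of regex syntax
        String[] fields = line.split(Pattern.quote(separator));
        Matcher m = PLACEHOLDER.matcher(template);
        // StringBuffer keeps this compatible with Java 8's appendReplacement
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            // Leave the placeholder as-is when the line has no such field
            String value = index < fields.length ? fields[index] : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "{\"times\":\"#0\", \"errLevel\":\"#1\", \"ip\":\"#3\"}";
        String line = "2019-02-11 19:03:30.123|INFO|1.0|10.10.10.10|push-service";
        System.out.println(fill(template, line, "|"));
    }
}
```

Note that field values are inserted verbatim here, just as in the string-replacement approach; the sketch does not escape characters that are special in JSON.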
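Developing the plugin - hands-on
Flume is written in Java, so the plugin is Java code as well.
A quick aside: once the plugin was finished, I realized it only took a few lines of code. What matters is understanding the overall development flow and how a plugin is written.
- Build the project: pom.xml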
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.apache.flume</groupId>
<artifactId>weibo.flume</artifactId>
<version>1.0</version>
<dependencies>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.9.0</version>
</dependency>
</dependencies>
<build>
<defaultGoal>compile</defaultGoal>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
- Directory structure
- Builder file: TextToJsonBuilder.java
package weibo.flume.textToJson;

import org.apache.flume.Context;
import org.apache.flume.interceptor.Interceptor;

/*
 * Example agent configuration:
 * a1.sources.r1.interceptors.i1.type=weibo.flume.textToJson.TextToJsonBuilder
 * a1.sources.r1.interceptors.i1.textToJson={"times":"#0", "errLevel": "#1", "version":"#2" , "ip":"#3", "service-name":"#4", "trace_id": "#5","msg": "#6"}
 * a1.sources.r1.interceptors.i1.separator=|
 */
public class TextToJsonBuilder implements Interceptor.Builder {
    private String textToJson = null;
    private String separator = null;

    @Override
    public void configure(Context context) {
        // Read the JSON template and the field separator from the interceptor's configuration
        textToJson = context.getString("textToJson");
        separator = context.getString("separator");
    }

    @Override
    public Interceptor build() {
        return new textToJsonInterceptor(textToJson, separator);
    }
}
- The interceptor itself: textToJsonInterceptor.java
package weibo.flume.textToJson;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class textToJsonInterceptor implements Interceptor {
    // JSON template containing the #0, #1, ... placeholders
    private final String textToJson;
    // Separator as a quoted regex, so that characters such as "|" are matched literally
    private final String separatorRegex;

    public textToJsonInterceptor(String textToJson, String separator) {
        this.textToJson = textToJson;
        // Fall back to "|" when no separator is configured
        String sep = (separator == null || separator.isEmpty()) ? "|" : separator;
        this.separatorRegex = Pattern.quote(sep);
    }

    @Override
    public void initialize() {
        // Nothing to initialize
    }

    // Core of the interceptor: split the log line and fill in the JSON template
    @Override
    public Event intercept(Event event) {
        if (textToJson == null) {
            // No template configured: pass the event through unchanged
            return event;
        }
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        // Example line:
        // 2019-02-11 19:03:30.123|INFO|1.0|10.10.10.10|push-service|trace_id:0001|msg:错误信息|token:fds1321sd
        String[] arr = body.split(separatorRegex);
        String jsonString = textToJson;
        // Replace from the highest index down so that "#1" does not match inside "#10".
        // Field values are inserted verbatim; they are not JSON-escaped.
        for (int i = arr.length - 1; i >= 0; i--) {
            jsonString = jsonString.replace("#" + i, arr[i]);
        }
        event.setBody(jsonString.trim().getBytes(StandardCharsets.UTF_8));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() {
        // Nothing to release
    }
}
- Test file: test.java, which simulates the string replacement on a single line of data
package com.zhb.flume;

import java.util.regex.Pattern;

public class test {
    public static void main(String[] args) {
        // Log line
        String str = "2019-02-11 19:03:30.123|INFO|1.0|10.10.10.10|push-service|trace_id:0001|msg:错误信息|token:fds1321sd";
        // String str = "2019-02-11 19:03:30.123|token:fds1321sd";
        // String str = "123|234";
        // split() takes a regex, so quote the separator to split on a literal "|"
        String[] arr = str.split(Pattern.quote("|"));
        // JSON template
        String jsonString = "{\"times\":\"#0\", \"errLevel\": \"#1\", \"version\":\"#2\" , \"ip\":\"#3\", \"service-name\":\"#4\", \"trace_id\": \"#5\",\"msg\": \"#6\"}";
        // Replace from the highest index down so that "#1" does not match inside "#10"
        for (int i = arr.length - 1; i >= 0; i--) {
            jsonString = jsonString.replace("#" + i, arr[i]);
            System.out.println(arr[i]);
            System.out.println(i);
        }
        System.out.println(jsonString);
        /*
        Abandoned variant. It needs a JSON library that provides JSONObject,
        plus imports for java.util.HashMap, java.util.Iterator and java.util.Map.

        JSONObject jsonArr = JSONObject.parseObject(jsonString);
        Map<String, String> mapJson = new HashMap<String,String>();
        // Walk the JSON template and match each entry to a log field by index
        Iterator iter = jsonArr.entrySet().iterator();
        while(iter.hasNext()) {
            Map.Entry entry = (Map.Entry) iter.next();
            String n = entry.getValue().toString();
            int index = Integer.parseInt(n);
            // Guard against an out-of-range index when filling the map
            if(index < arr.length) {
                System.out.println(index);
                mapJson.put(entry.getKey().toString(), arr[index]);
            }
        }
        String outputStr = JSONObject.toJSONString(mapJson);
        System.out.println(outputStr);
        */
    }
}
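Part of the code above is commented out. I dropped it because the JSON library would have to be bundled into the jar, and I was worried that would hurt performance, so I fell back on string replacement as a workaround (mostly because I'm new to Java and hadn't yet figured out how jar packaging works).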
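One thing plain string replacement does not handle is escaping: a field that contains a double quote or a backslash would produce invalid JSON. If the goal is to avoid bundling a JSON library, a small hand-written escaper is one possible workaround. The sketch below is my own assumption, not part of the original plugin; `JsonEscapeSketch` and `escape` are hypothetical names.

```java
class JsonEscapeSketch {
    // Escapes a raw field value so it can be placed between JSON double quotes.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters are written as a 4-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("msg:say \"hi\""));
    }
}
```

Calling `escape` on each field before substituting it into the template would keep the output parseable without adding a dependency to the jar.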
The middle part of the test file can be copied straight into textToJsonInterceptor.java.
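Before copying, it is worth checking how the line gets split. `String.split` interprets its argument as a regular expression, and `|` is the regex alternation operator, so `split("|")` breaks the line into single characters rather than fields. The sketch below (the class name `SplitSketch` and method `splitLiteral` are hypothetical) contrasts that with `Pattern.quote`, which makes the separator match literally.

```java
import java.util.regex.Pattern;

class SplitSketch {
    // Splits on the separator taken literally, whatever characters it contains.
    static String[] splitLiteral(String line, String separator) {
        return line.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String line = "2019-02-11 19:03:30.123|INFO|1.0";
        // "|" as a regex is an empty alternation, so this splits between characters
        System.out.println(line.split("|").length);
        // Quoted, the separator is literal and the line yields three fields
        System.out.println(splitLiteral(line, "|").length);
    }
}
```

The same applies to other separators that are regex metacharacters, such as `.` or `*`, which matters here because the separator comes from user configuration.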
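Remember: the plugin must be packaged as its own jar, preferably with Maven, so that the whole project is not built into a single jar.
The build output is shown below.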
zhangliang@zhangliang:~/IdeaProjects/flumeappplugin$ mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< org.apache.flume:weibo.flume >--------------------
[INFO] Building weibo.flume 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ weibo.flume ---
[INFO] Deleting /Users/zhangliang/IdeaProjects/flumeappplugin/target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ weibo.flume ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 0 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ weibo.flume ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 2 source files to /Users/zhangliang/IdeaProjects/flumeappplugin/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ weibo.flume ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /Users/zhangliang/IdeaProjects/flumeappplugin/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ weibo.flume ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ weibo.flume ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ weibo.flume ---
[INFO] Building jar: /Users/zhangliang/IdeaProjects/flumeappplugin/target/weibo.flume-1.0.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.204 s
[INFO] Finished at: 2020-05-04T18:58:29+08:00
[INFO] ------------------------------------------------------------------------
Results
Flume execution log
Converted data as received by Kafka
If you have questions, or would like the jar I built, you can reach me on QQ: 979314.