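When collecting data with Flume and converting it to JSON, you will often run into problems with special characters. JSON is very sensitive to the double quote ("), so handle JSON data with particular care. Not long ago, inserting data into Elasticsearch (ES) failed with exactly this kind of error: the JSON conversion failed.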
Git repository: https://github.com/xvshu/flume-files-source
Cause:
Common JSON forms:
{"key":"value"}
{"key":{}}
{"key":[]}
["one","two"]
[{}]
and similar forms. The characters { } [ ] " : are all structural parts of JSON syntax.
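If {"key":"value"} turns into {"key":"val"u"e"}, parsing fails: the unescaped quote after "val" ends the string early, and the parser then finds u where it expects , or }.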
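A minimal sketch of the failure, assuming the JSON is built by plain string concatenation (the class and the helpers naiveJson and charAfterValue are hypothetical, not taken from the repository). charAfterValue scans a one-field object the way a JSON parser would and reports the character right after the value string closes.

```java
public class NaiveJsonDemo {

    /** Builds {"key":"value"} by concatenation, without escaping: the error-prone approach. */
    static String naiveJson(String key, String value) {
        return "{\"" + key + "\":\"" + value + "\"}";
    }

    /**
     * Scans the value of a one-field object the way a JSON parser would and returns
     * the character right after the closing quote ('\0' if the string never closes).
     * Assumes the input has the shape {"key":"...".
     */
    static char charAfterValue(String json) {
        int i = json.indexOf("\":\"") + 3; // first character inside the value's quotes
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c == '\\') {
                i += 2; // skip an escape sequence such as a backslash-quote pair
            } else if (c == '"') {
                // an unescaped quote closes the string
                return i + 1 < json.length() ? json.charAt(i + 1) : '\0';
            } else {
                i++;
            }
        }
        return '\0';
    }

    public static void main(String[] args) {
        String ok = naiveJson("key", "value");
        String broken = naiveJson("key", "val\"u\"e");
        System.out.println(ok + " -> " + charAfterValue(ok));         // {"key":"value"} -> }
        System.out.println(broken + " -> " + charAfterValue(broken)); // {"key":"val"u"e"} -> u
    }
}
```

For the well-formed object the string is followed by }, as expected; for the broken one the parser is left looking at u, which is the parse error ES reported.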
Solution:
Handle every value field separately and use character replacement to turn " into '. After that the data parses without errors.
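A sketch of the per-value cleanup, assuming each record is a flat map of string fields whose keys are already safe (the class and method names are hypothetical). replaceQuotes is the quote-to-apostrophe substitution described above; escapeJson is an alternative, not the author's method, that escapes quotes, backslashes and control characters as RFC 8259 requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

public class JsonValueSanitizer {

    /** The approach described above: every double quote in a value becomes a single quote. */
    static String replaceQuotes(String value) {
        return value.replace('"', '\'');
    }

    /** Alternative: keep the text but escape what RFC 8259 says must be escaped inside a string. */
    static String escapeJson(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // other control characters use the six-character hex escape form
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a flat JSON object; keys are assumed safe, only values are cleaned. */
    static String toJson(Map<String, String> fields, UnaryOperator<String> cleanValue) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.add("\"" + e.getKey() + "\":\"" + cleanValue.apply(e.getValue()) + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("host", "web01");
        record.put("msg", "val\"u\"e");
        System.out.println(toJson(record, JsonValueSanitizer::replaceQuotes)); // {"host":"web01","msg":"val'u'e"}
        System.out.println(toJson(record, JsonValueSanitizer::escapeJson));    // {"host":"web01","msg":"val\"u\"e"}
    }
}
```

Replacing quotes is simple but changes the data (a value such as 12" becomes 12'), while escaping keeps the original text intact once a JSON parser reads it back.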
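Summary:
Careful attention can help us untangle a problem thread by thread, but it is tiring. Today plenty of tools can check rules like these for us, so here is a website that validates JSON syntax and helps track down malformed JSON: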
http://json.cn/
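Key code:
Step 1: After RandomAccessFile reads the data, the text is decoded as "8859_1" (ISO-8859-1), so it has to be converted back to the file's original encoding.
Step 2: Replace " with ' to fix the problem in JSON values.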
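if (line != null) {
    // Step 1: recover the raw bytes (readLine() decoded them as ISO-8859-1) and re-decode with the configured charset
    line = new String(line.getBytes(ExecTailSourceConfigurationConstants.CHARSET_RANDOMACCESSFILE), charset);
    // Step 2: turn every double quote into a single quote so values cannot break the JSON structure
    line = line.replace("\"", "'");
}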
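The two steps can be tried in isolation with the self-contained sketch below (the class and method names are hypothetical, and UTF-8 is assumed as the file's charset). RandomAccessFile.readLine() turns every byte into one char, which is exactly what ISO-8859-1 decoding does, so encoding the line back with ISO-8859-1 recovers the original bytes before they are decoded with the real charset.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;

public class TailLineDecoder {

    /** Mirrors the two steps: undo RandomAccessFile's byte-to-char mapping, then swap " for '. */
    static String fixLine(String rawLine, Charset fileCharset) {
        if (rawLine == null) {
            return null;
        }
        // readLine() turned each byte into one char, so ISO-8859-1 gives the exact bytes back
        byte[] original = rawLine.getBytes(StandardCharsets.ISO_8859_1);
        String line = new String(original, fileCharset);
        return line.replace("\"", "'");
    }

    /** Writes one line to a temp file and reads it back with RandomAccessFile.readLine(). */
    static String readFirstLineRaw(String content, Charset fileCharset) throws IOException {
        File tmp = File.createTempFile("tail-demo", ".log");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), Collections.singletonList(content), fileCharset);
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Chinese log text containing double quotes, written to disk as UTF-8
        String sample = "\u7528\u6237 \"\u5f20\u4e09\" \u767b\u5f55";
        String raw = readFirstLineRaw(sample, StandardCharsets.UTF_8);
        System.out.println(raw.equals(sample));                     // prints false: the raw line is garbled
        System.out.println(fixLine(raw, StandardCharsets.UTF_8));   // original text, quotes now single quotes
    }
}
```

Skipping step 1 would leave the garbled text in the event body, and no amount of quote replacement would fix that.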
Full source of the Flume source (ExecTailSource):
/*
* Author: 许恕
* Date: 2016-05-03
* Purpose: tail every file under a directory whose name matches a regular expression
* Email:xvshu1@163.com
* To detect all files in a folder
*/
package org.apache.flume.source;
import com.google.common.base.Preconditions;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.SystemClock;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.instrumentation.SourceCounter;
import org.apache.flume.source.utils.MsgBuildeJson;
import org.mortbay.util.ajax.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.*;
import java.nio.charset.Charset;
import java.util.*;
import java.util.concurrent.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Steps:
* 1. Configure one directory path
* 2. Find all files in it whose names match the RegExp
* 3. Tail each matching child file
* 4. Send events to the channel in batches
*
* demo:
* demo.sources.s1.type = org.apache.flume.source.ExecTailSource
* demo.sources.s1.filepath=/export/home/tomcat/logs/auth.el.net/
* demo.sources.s1.filenameRegExp=(.log{1})$
* demo.sources.s1.tailing=true
* demo.sources.s1.readinterval=300
* demo.sources.s1.startAtBeginning=false
* demo.sources.s1.restart=true
*/
public class ExecTailSource extends AbstractSource implements EventDrivenSource,
Configurable {
private static final Logger logger = LoggerFactory
.getLogger(ExecTailSource.class);
private SourceCounter sourceCounter;
private ExecutorService executor;
private List<ExecRunnable> listRuners;
private List<Future<?>> listFuture;
private long restartThrottle;
private boolean restart = true;
private boolean logStderr;
private Integer bufferCount;
private long batchTimeout;
private Charset charset;
private String filepath;
private String filenameRegExp;
private boolean tailing;
private Integer readinterval;
private boolean startAtBeginning;
private boolean contextIsJson;
private String fileWriteJson;
private Long flushTime;
private boolean contextIsFlumeLog;
private String domain;
private String msgTypeConfig;
@Override
public void start() {
logger.info("=start=> flume tail source start begin time:"+new Date().toString());
logger.info("ExecTail source starting with filepath:{}", filepath);
List<String> listFiles = getFileList(filepath);
Preconditions.checkState(listFiles != null && !listFiles.isEmpty(),
"No file under filepath matches filenameRegExp");
Properties prop = new Properties(); // property set
try (FileInputStream fis = new FileInputStream(fileWriteJson)) { // properties file stream, closed automatically
prop.load(fis);
} catch (Exception ex) {
logger.error("==> failed to load properties file " + fileWriteJson, ex);
}