Fixing the Flume error "must not generate more than one output value per record field"

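Today I was writing a Flume program, and after working through a series of configuration steps it threw the exception below, which crashed Flume at startup: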


must not generate more than one output value per record field

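I searched online, but found only posts where people had pasted passages from the official documentation, and I still did not really understand what the problem was.
Below is the official description of MorphlineInterceptor. Roughly, it says there is currently a restriction: the interceptor's morphline must not generate more than one output record for each input event.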

This interceptor filters the events through a morphline configuration file that defines a chain of transformation commands that pipe records from one command to another. For example the morphline can ignore certain events or alter or insert certain event headers via regular expression based pattern matching, or it can auto-detect and set a MIME type via Apache Tika on events that are intercepted. For example, this kind of packet sniffing can be used for content based dynamic routing in a Flume topology. MorphlineInterceptor can also help to implement dynamic routing to multiple Apache Solr collections (e.g. for multi-tenancy).

Currently, there is a restriction in that the morphline of an interceptor must not generate more than one output record for each input event. This interceptor is not intended for heavy duty ETL processing - if you need this consider moving ETL processing from the Flume Source to a Flume Sink, e.g. to a MorphlineSolrSink.

Here is my interceptor's morphline configuration:

morphlines : [
{
        id : 2map
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
        {
            readLine {
                charset : UTF-8
            }
        }
        {
                findReplace {
                        field : message
                        pattern : "\""
                        isRegex : true
                        replacement : ""
                        replaceFirst : false
                }
        }
        {
                split {
                        inputField : message
                        outputFields : [RB040002,RB060002,'','',RB060003]
                        separator : "@@separator@@"
                        isRegex : false
                        addEmptyStrings : true
                        trim : true
                }
        }
        {
                setValues {
                        dataset : "RWA_BASIC_Z002_1111"
                        htable : "bcpdata5_struct"
                        RZ002442 : "@{RB060003}"
                        namespace : "EXT003"
                        message : []
                        _attachment_body : []
                }
        }

        ]
}
]

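Since I did not fully understand the official explanation, I downloaded the source code and stepped through it. The morphline split command splits the Event it reads and stores the outputFields in a Map. In the toEvent() method at line 175 of MorphlineInterceptor.java, I saw that the entry produced by split was null:{null,null}, that is, one field holding two values. That is where things go wrong, because the following check comes right after it: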

if (entry.getValue().size() > 1) {
  throw new FlumeException(getClass().getName()
      + " must not generate more than one output value per record field");
}
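
To make the check concrete, here is a minimal sketch that reproduces its behavior outside Flume. It is not the Flume source: `RecordFieldCheckSketch`, `put` and `toHeaders` are hypothetical names, a plain `Map<String, List<Object>>` stands in for the morphline record, and `IllegalStateException` stands in for `FlumeException`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a morphline record: field name -> list of values.
class RecordFieldCheckSketch {

    // Appends a value to a field and keeps any earlier values (multimap-style put).
    static void put(Map<String, List<Object>> fields, String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Mirrors the size() > 1 check quoted above: a field may carry only one value.
    // IllegalStateException is used here in place of FlumeException (assumption).
    static Map<String, String> toHeaders(Map<String, List<Object>> fields) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> entry : fields.entrySet()) {
            if (entry.getValue().size() > 1) {
                throw new IllegalStateException(
                    "must not generate more than one output value per record field: "
                        + entry.getKey());
            }
            headers.put(entry.getKey(), String.valueOf(entry.getValue().get(0)));
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> fields = new LinkedHashMap<>();
        put(fields, "dataset", "demo");
        put(fields, "x", "first");
        put(fields, "x", "second"); // same field name again: two values under one key
        try {
            toHeaders(fields);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception says nothing about how many records the morphline emits. It fires as soon as any single field name has collected two or more values.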

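So this is where the exception is thrown. The next question was why the field ends up with such a value,
so I went to look at how morphlines are supposed to be used: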

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html

The split command is documented there as follows:

Property Name | Default | Description
outputFields | null | The names of the fields to add output values to, i.e. a list of strings. Example: [firstName, lastName, "", age]. An empty string in a list indicates omit this column in the output. One of outputField or outputFields must be present, but not both.

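So that was the cause: to skip a column you must write an empty string with double quotes (""), but I had used single quotes (''). Single quotes are not string delimiters in the morphline's HOCON syntax, so '' is presumably read as a field literally named '', and both skipped columns ended up as two values of that one field.
I changed the configuration above accordingly: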

outputFields : [RB040002,RB060002,"","",RB060003]
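
The difference between the two outputFields lists can be sketched in a few lines of Java. This is a simplified model of the documented split behavior, not the Kite implementation: `SplitOutputFieldsSketch` and its `split` method are hypothetical, the `|` separator and the sample message are made up, and the sketch assumes that the single-quoted '' reaches the command as a two-character field name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified, hypothetical model of the morphline split command.
class SplitOutputFieldsSketch {

    // Splits the message on a literal separator and appends each column to the
    // field named at the same position. An empty name means "omit this column".
    static Map<String, List<String>> split(String message, String separator,
                                           List<String> outputFields) {
        String[] columns = message.split(Pattern.quote(separator), -1);
        Map<String, List<String>> record = new LinkedHashMap<>();
        for (int i = 0; i < outputFields.size() && i < columns.length; i++) {
            String field = outputFields.get(i);
            if (field.isEmpty()) {
                continue; // "" in outputFields: skip this column
            }
            record.computeIfAbsent(field, k -> new ArrayList<>()).add(columns[i].trim());
        }
        return record;
    }

    public static void main(String[] args) {
        String message = "a|b|c|d|e"; // made-up sample line

        // Assumption: '' is not an empty string but a name made of two apostrophes.
        List<String> broken = Arrays.asList("RB040002", "RB060002", "''", "''", "RB060003");
        // Real empty strings, as in the corrected configuration.
        List<String> fixed = Arrays.asList("RB040002", "RB060002", "", "", "RB060003");

        // prints {RB040002=[a], RB060002=[b], ''=[c, d], RB060003=[e]}
        System.out.println(split(message, "|", broken));
        // prints {RB040002=[a], RB060002=[b], RB060003=[e]}
        System.out.println(split(message, "|", fixed));
    }
}
```

In the first result the field named '' holds two values, which is exactly the situation the interceptor rejects. In the second, the two columns are dropped and every remaining field has a single value.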

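OK, problem solved. In practice, the restriction means that a single input event must not end up with more than one value for the same output field.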
