Steps for Writing a Custom NiFi Processor

1.Environment requirements

1.1 NiFi is written in Java, so a JDK is required

1.2 Project dependencies needed in Maven

<dependencies>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-api</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-processor-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-mock</artifactId>
        <version>${nifi.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

1.2.1 nifi-api
1.2.2 nifi-utils
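1.2.3 nifi-processor-utils, which provides the abstract Processor classes and interfaces
1.2.4 nifi-mock and junit, for testing
1.2.5 ?

It also seems that a ?? plugin is needed: it provides the nar packaging type (similar to a war archive), which bundles the classes into a NiFi component. The packaging step needs the nifi-api dependency; the role of the other dependencies will become apparent later.

1.3 Developing in IDEA

(Some online guides build the project skeleton from the command line. When I tried that I ran into errors, so I work in IDEA instead, which is simpler and more convenient.)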

2.Developing

2.1 Create the processor service file

Under /src/main/resources/META-INF/services/, create a file named org.apache.nifi.processor.Processor. It acts like a configuration file and points to the fully qualified class name of the custom Processor, for example:

rocks.nifi.examples.processors.JsonProcessor

2.2 Create a custom processor

Define a simple Java class as described in the setup step, for example rocks.nifi.examples.processors.JsonProcessor.

2.2.1 Apache Nifi Processor Header

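import org.apache.nifi.annotation.behavior.SideEffectFree;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.processor.AbstractProcessor;

// No side effects: the processor does not depend on or change external context
@SideEffectFree
// Tags for the processor
@Tags({"JSON","SHA0W.PUB"})
// Description of the processor
@CapabilityDescription("Fetch value from json path.")
// Most processors simply extend AbstractProcessor; for more complicated tasks it may be
// necessary to go a level deeper and extend AbstractSessionFactoryProcessor.
public class JsonProcessor extends AbstractProcessor{
    // The fields and methods from the following sections go here
}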

2.2.2 Variable Declaration

Add properties and relationships to the processor. There is a large selection of validators in the nifi-processor-utils package; see the official developer guide.

// properties holds the configuration parameters defined for this processor
private List<PropertyDescriptor> properties;
// relationships holds the routes that data can take when it leaves this processor
private Set<Relationship> relationships;

public static final String MATCH_ATTR = "match";

public static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()
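        // Property name, shown as the label next to the input field
        .name("Json Path")
        // Whether the property is required
        .required(true)
        // Add a validator
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        // Build the descriptor once everything has been set
        .build();

public static final Relationship SUCCESS = new Relationship.Builder()
        .name("SUCCESS")
        .description("Success relationship")
        .build();

// A property with a fixed set of choices is defined as follows
// (REGULAR and VERBOSE are assumed to be AllowableValue constants declared like EXTENSIVE)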
public static final AllowableValue EXTENSIVE = new AllowableValue("Extensive", "Extensive",
    "Everything will be logged - use with caution!");

public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
  .name("Amount to Log")
  .description("How much the Processor should log")
  .allowableValues(REGULAR, VERBOSE, EXTENSIVE)
  .defaultValue(REGULAR.getValue())
  ...
  .build();

2.2.3 Apache Nifi Init

The init function is called at the start of Apache NiFi. Remember that this is a highly multi-threaded environment, so be careful what you do in this space. This is why both the list of properties and the set of relationships are stored as unmodifiable collections. I put the getters for the properties and relationships here as well; the two getters are mainly needed so that the UI can display the processor correctly.
init mainly loads the Relationship and PropertyDescriptor objects defined in the processor.

@Override
public void init(final ProcessorInitializationContext context){
    List<PropertyDescriptor> properties = new ArrayList<>();
    properties.add(JSON_PATH);
    // Publish a read-only view so that no thread can add to the list later
    this.properties = Collections.unmodifiableList(properties);
    Set<Relationship> relationships = new HashSet<>();
    relationships.add(SUCCESS);
    this.relationships = Collections.unmodifiableSet(relationships);
}

// The two getters are mainly needed so that the UI can display the processor correctly
@Override
public Set<Relationship> getRelationships(){
    return relationships;
}

@Override
public List<PropertyDescriptor> getSupportedPropertyDescriptors(){
    return properties;
}
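
The effect of wrapping the collections in init can be seen in isolation. The following is a minimal plain-Java sketch with no NiFi classes involved; the class name UnmodifiableDemo, its methods, and the string values are made up for illustration only. It builds a list the same way init does and then shows that a later add is rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableDemo {

    // Fill a local list, then publish only a read-only view of it,
    // mirroring what init() does with the property descriptors.
    static List<String> buildProperties() {
        List<String> properties = new ArrayList<>();
        properties.add("Json Path");
        return Collections.unmodifiableList(properties);
    }

    // Returns true if the list refuses to be modified.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("another");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> properties = buildProperties();
        System.out.println("size = " + properties.size());
        System.out.println("rejects add = " + rejectsAdd(properties));
    }
}
```

Because the getters return these read-only views, code that calls getRelationships or getSupportedPropertyDescriptors cannot change the processor's configuration by accident.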

2.2.4 The onTrigger method

The onTrigger method is called whenever a flow file is passed to the processor. For more details on the context and session variables please again refer to the official developer guide. The unit of work is the flowfile: this method decides what to do with each piece of the data flow as it arrives, and it is where the business logic is implemented:

@Override
public void onTrigger(ProcessContext processContext, ProcessSession processSession) throws ProcessException {

    final AtomicReference<String> value = new AtomicReference<>();

    // First get the flowfile to process from the session
    FlowFile flowFile = processSession.get();
    // get() returns null when no flowfile is queued
    if (flowFile == null) {
        return;
    }

    // read(FlowFile, InputStreamCallback) reads the content of the flowfile
    // write(FlowFile, OutputStreamCallback) writes data to the flowfile
    // write(FlowFile, StreamCallback) handles input and output together, so almost all of the work
    // sits inside the callback. After processing, the result has to be routed according to the
    // outcome, so this third form only suits components with simple logic and code.
    // For processors with more complex business logic, prefer reading the data first, then processing
    // it, then writing it back, using both InputStreamCallback and OutputStreamCallback, to reduce
    // the cost of reading and writing the flowfile.

    // Read the content of the flowfile
    processSession.read(flowFile, in -> {
        try {
            String json = IOUtils.toString(in);
            String result = JsonPath.read(json, "$.hello");
            value.set(result);
        } catch (Exception ex) {
            // Log the exception through the processor's logger instead of printing the stack trace
            getLogger().error("Failed to read json string.", ex);
        }
    });

    String results = value.get();
    if (results != null && !results.isEmpty()) {
        // Write the result to an attribute
        flowFile = processSession.putAttribute(flowFile, MATCH_ATTR, results);
        // Write the result back out to the flowfile content; doing this only when a value
        // was found avoids a NullPointerException when the read failed
        flowFile = processSession.write(flowFile, out -> out.write(results.getBytes()));
    }

    // Finally, every flowfile that is generated needs to be deleted or transferred
    processSession.transfer(flowFile, SUCCESS);
}

In general you pull the flow file out of the session, read from and write to it, and add attributes where needed. To work on flow files NiFi provides 3 callback interfaces.
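
The AtomicReference in onTrigger is there because a lambda or anonymous class can only capture local variables that are effectively final, so a holder object is needed to carry a value out of the callback. The following is a minimal plain-Java sketch of that pattern, assuming Java 9 or later for readAllBytes. ContentReader and the read method are hypothetical stand-ins that only imitate the shape of InputStreamCallback and session.read; they are not NiFi APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class CallbackHolderDemo {

    // Hypothetical stand-in with the same shape as an input stream callback.
    interface ContentReader {
        void process(InputStream in) throws IOException;
    }

    // Hypothetical stand-in for a session read: opens the content and runs the callback on it.
    static void read(byte[] content, ContentReader reader) throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            reader.process(in);
        }
    }

    // Reads the content inside the callback and hands the text back through the holder.
    static String readAsString(byte[] content) {
        final AtomicReference<String> value = new AtomicReference<>();
        try {
            read(content, in -> value.set(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The holder is still null here if the callback never stored anything.
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(readAsString("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The same holder can then be read in a later write callback, which is how the value travels from the read step to the write step in the processor.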

2.2.5 InputStreamCallback

For reading the contents of the flow file through an input stream.

session.read(flowfile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        try{
            //Using Apache Commons to read the input stream out to a string.
            String json = IOUtils.toString(in);
            //Use JsonPath to attempt to read the json and set a value to the pass on.
            String result = JsonPath.read(json, "$.hello");
            value.set(result);
        }catch(Exception ex){
            // In the case of an exception it would normally be best practice to pass the original flow file to an error relationship.
            ex.printStackTrace();
            getLogger().error("Failed to read json string.");
        }
    }
});  

2.2.6 OutputStreamCallback

For writing to a flowfile; this will overwrite, not concatenate. We simply write out the value we received in the InputStreamCallback.

flowfile = session.write(flowfile, new OutputStreamCallback() {
    @Override
    public void process(OutputStream out) throws IOException {
        out.write(value.get().getBytes());
    }
});

2.2.7 StreamCallback

This is for both reading and writing to the same flow file. With both the OutputStreamCallback and the StreamCallback, remember to assign the result back to a flow file. This callback is not used in the example code, although it could have been. The choice was deliberate, to show a way of moving data out of callbacks and back in.

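flowfile = session.write(flowfile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        // Read the incoming content and write the extracted value in the same callback
        String json = IOUtils.toString(in);
        String result = JsonPath.read(json, "$.hello");
        out.write(result.getBytes());
    }
});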
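
To make the read-and-write-in-one-pass idea concrete, here is a minimal plain-Java sketch. Transformer and the write method are hypothetical stand-ins that only imitate the shape of a callback receiving both streams; they are not NiFi APIs, and the upper-casing logic is just a placeholder for real processing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamTransformDemo {

    // Hypothetical stand-in with the same shape as a callback that gets both streams.
    interface Transformer {
        void process(InputStream in, OutputStream out) throws IOException;
    }

    // Hypothetical stand-in for a session write: feeds the old content in and
    // returns the new content, leaving the original array untouched.
    static byte[] write(byte[] content, Transformer transformer) {
        try (InputStream in = new ByteArrayInputStream(content);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            transformer.process(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Upper-cases ASCII letters byte by byte; all other bytes pass through unchanged.
    static byte[] upperCaseAscii(byte[] content) {
        return write(content, (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b >= 'a' && b <= 'z' ? b - 32 : b);
            }
        });
    }

    public static void main(String[] args) {
        byte[] result = upperCaseAscii("hello 1".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(result, StandardCharsets.US_ASCII));
    }
}
```

The stand-in returns a new array rather than changing its input, which mirrors why the result of a session write has to be assigned back to the flow file variable.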

2.3 Test

Test the processor inside the project first, to check that it behaves as designed.

3.Deployment

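3.1 Packaging

Open a command line in the project directory and run the mvn clean install command

3.2 Upload

In the build output, find nifi-demo-nar-1.0.nar in the line [INFO] Installing D:\ideaSpace\nifi-1.3.0\self-define\first-processors\nifi-demo-nar\target\nifi-demo-nar-1.0.nar to D:\SoftWares\apache-maven-3.2.3\repo\first\nifi-demo-nar\1.0\nifi-demo-nar-1.0.nar
Upload the file with the .nar extension to the lib directory of the NiFi server

3.3 Restart NiFi and open the UI

The processor can now be used in the UI!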

References:
https://blog.csdn.net/mianshui1105/article/details/75313480
https://blog.csdn.net/larygry/article/details/89092573
https://blog.csdn.net/yitengtongweishi/article/details/88807934
https://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/