Deep Dive into a Custom Apache NiFi Processor


Just a couple of years ago, software projects didn’t exceed a bunch of files! You could store a project on a floppy disk and install it here and there. Nowadays, hardware and software compete with each other to run at a faster pace, yet they collaborate to help engineers build ever faster and more complex systems. The age of the huge monolithic project has gone, and we observe the rise of distributed systems, microservices, big data, and the cloud.

As systems become more complicated, problems grow with them. Day by day, technologies, projects, and libraries emerge to solve a common problem and make the lives of software engineers easier. (Because their lives are really joyful: just sit behind a desk and stare at a monitor for hours! What could be easier?)

While architecting software, you need to split it into a number of components and choose one or more tools for each part (if you are lucky enough to find one; otherwise you have to write something new). You need to benchmark different tools and choose the one that best fits your project. Besides selecting the best fit, you need to decide on the communication protocol and how to integrate the parts afterward.

These problems also apply to data projects. You use Kafka as a message broker, Cassandra as a NoSQL database, Redshift as a warehouse, Elasticsearch as a search engine, and… then you need a tool to manage the flows among these technologies and integrate them together. There are some projects that address such problems, and Apache NiFi is one of them.

Wikipedia: Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems.

It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives include:

  • Web-based user interface
  • Highly configurable
  • Data Provenance
  • Designed for extension
  • Secure

It ships with around 300 processors for integrating with the ecosystem out of the box! So in most cases, you just need to pick one of them and configure it correctly.

But sometimes you come to a specific situation where you need to create a custom processor for your project. For example, the data is encrypted or serialized and you need to apply some specific algorithms to decrypt or deserialize it.

In this article, we are going to see how to create a custom processor and write some Java code to extract data received from Kafka and produce JSON output.

You can find the source code here: https://github.com/m-semnani/nifi-customprocessor

I assume you are familiar with Java, Maven, Apache Kafka, and NiFi, and have all of them installed.

Let’s start by creating a project. NiFi has a Maven archetype that helps you easily create a project to start with:

  • For generating the archetype, first we need to run:
$> mvn archetype:generate
  • When prompted for a number or a filter, just type “nifi”
  • From the options proposed, choose “org.apache.nifi:nifi-processor-bundle-archetype”
  • Choose your desired NiFi-compatible version.
  • Enter the desired Maven properties, such as artifactId and groupId.
  • Oh, look! Now you have a Maven project with a sample custom processor named “MyProcessor” inside.

Let’s go through the MyProcessor class. The first part of this class is devoted to the properties that you want to expose in the GUI.

Here, for example, we define two properties, REGEX and PASS. The builder offers a number of methods for defining the characteristics of each property.
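The original post shows the property definitions as an embedded image; a minimal sketch of what they likely look like, using the standard PropertyDescriptor builder from nifi-api (the names, descriptions, and validators here are assumptions):

```java
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Regex used later to pick fields out of the payload
public static final PropertyDescriptor REGEX = new PropertyDescriptor.Builder()
        .name("Regex")
        .description("Regular expression used to extract fields from the payload")
        .required(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();

// Secret consumed by the custom decryption step; marked sensitive so
// NiFi masks it in the UI and in the flow definition
public static final PropertyDescriptor PASS = new PropertyDescriptor.Builder()
        .name("Pass")
        .description("Secret used by the custom decryption step")
        .required(true)
        .sensitive(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();
```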

The second part defines the relationships to other processors:

Here we define two relationships, one for the success route and one for failures.
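This part is also an image in the original; such relationship definitions typically look like the following sketch (descriptions are assumptions):

```java
import org.apache.nifi.processor.Relationship;

public static final Relationship REL_SUCCESS = new Relationship.Builder()
        .name("success")
        .description("FlowFiles that were processed successfully")
        .build();

public static final Relationship REL_FAILURE = new Relationship.Builder()
        .name("failure")
        .description("FlowFiles that could not be processed")
        .build();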

Then you add your properties and relationships to the processor in the init method.
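A sketch of that init method, following the shape the NiFi processor archetype generates (the field names descriptors and relationships are the archetype's conventions):

```java
@Override
protected void init(final ProcessorInitializationContext context) {
    // Register the properties defined above
    final List<PropertyDescriptor> descriptors = new ArrayList<>();
    descriptors.add(REGEX);
    descriptors.add(PASS);
    this.descriptors = Collections.unmodifiableList(descriptors);

    // Register the success and failure relationships
    final Set<Relationship> relationships = new HashSet<>();
    relationships.add(REL_SUCCESS);
    relationships.add(REL_FAILURE);
    this.relationships = Collections.unmodifiableSet(relationships);
}
```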

Next comes the onScheduled method, which runs once when you start the processor.

In this method, we should define and initialize variables that are expensive to create and must be initialized only once. For example, we initialize an ObjectMapper from the Jackson library as a utility for working with JSON, and a Java Pattern to compile the regex; both are used later in the code.
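A sketch of that one-time initialization, assuming the nifi-api and Jackson dependencies (the field names are assumptions):

```java
import org.apache.nifi.annotation.lifecycle.OnScheduled;

private ObjectMapper objectMapper; // Jackson, reused across FlowFiles
private Pattern pattern;           // compiled once per processor start

@OnScheduled
public void onScheduled(final ProcessContext context) {
    // Expensive objects are built once here, not per FlowFile
    objectMapper = new ObjectMapper();
    pattern = Pattern.compile(context.getProperty(REGEX).getValue());
}
```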

And the heart of this class is the onTrigger method. It is called for each input FlowFile, performs the computation on it, and then writes the output FlowFile to the output stream.

According to the NiFi documentation, a ProcessContext provides a bridge between a Processor and the NiFi framework. A ProcessSession encompasses all the behaviors a processor can perform to obtain, clone, read, modify, and remove FlowFiles as an atomic unit. A process session is always tied to a single processor at any one time and ensures that no FlowFile can ever be accessed by more than one processor at a given time. The session also ensures that all FlowFiles are always accounted for. The creator of a ProcessSession is always responsible for managing the session.

The session.write() method executes the given StreamCallback against the content of the given FlowFile; the callback receives an InputStream and an OutputStream.
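The body of onTrigger is shown as an image in the original; a sketch of the flow it describes (get a FlowFile, rewrite its content inside session.write(), then transfer to success or failure), assuming the nifi-api and Jackson dependencies and the article's readAllBytes and extractFields helpers:

```java
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session)
        throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return; // nothing queued for this processor right now
    }
    try {
        // Rewrite the FlowFile content: read the serialized payload,
        // extract fields, and emit them as JSON.
        flowFile = session.write(flowFile, new StreamCallback() {
            @Override
            public void process(final InputStream in, final OutputStream out)
                    throws IOException {
                final byte[] bytes = readAllBytes(in);                   // drain the payload
                final Map<String, Object> fields = extractFields(bytes); // deserialize + decrypt
                objectMapper.writeValue(out, fields);                    // write the output JSON
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
    } catch (final ProcessException e) {
        getLogger().error("Processing failed", e);
        session.transfer(flowFile, REL_FAILURE);
    }
}
```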

In this method, we get the input stream and convert it into a byte array. If you are using Java 9 or higher, you can use the InputStream method .readAllBytes(). On Java 8, you have to put in a little more effort and write the loop yourself.
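A self-contained sketch of that Java 8 loop (the class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUtil {

    // Java 8 equivalent of InputStream.readAllBytes() (added in Java 9):
    // drain the stream into a growable buffer in 4 KiB chunks.
    public static byte[] readAllBytes(final InputStream in) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        final byte[] data = "hello flowfile".getBytes("UTF-8");
        final byte[] copy = readAllBytes(new ByteArrayInputStream(data));
        System.out.println(new String(copy, "UTF-8")); // prints "hello flowfile"
    }
}
```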

Then, in the extractFields method, we deserialize the input, do our magic and secret computation, and return a map of key/value pairs used to create the output JSON. Confluent already provides Avro, JSON, and Protobuf serialization and deserialization out of the box. So in this method, we first deserialize the byte array (e.g., Confluent Avro), then apply our specific decryption, and return the result.
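The real extractFields depends on the project's Avro schema and decryption, so here is only a hypothetical stand-in that shows the shape of the result: it pulls "key=value" pairs out of an already-decoded payload with a compiled regex (class, method, and pattern are all illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractor {

    // Hypothetical stand-in: the article's real method would deserialize
    // the byte array (e.g. Confluent Avro) and apply the project's
    // decryption before building the map.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    public static Map<String, String> extractFields(final String payload) {
        final Map<String, String> fields = new LinkedHashMap<>();
        final Matcher matcher = PAIR.matcher(payload);
        while (matcher.find()) {
            fields.put(matcher.group(1), matcher.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extractFields("id=42 name=alice")); // prints {id=42, name=alice}
    }
}
```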

Finally, we use the ObjectMapper to convert our result into JSON and write it to the output stream.

At the end of the onTrigger method, if we ran into any problem, we route the FlowFile to the failure relationship; if everything went well, we send it to the success relationship.

Everything is ready, so let’s deploy our source code into production. To do so:

  • Go to the project’s root directory and build the project by running:
$> mvn clean install
  • Copy the generated nar file into the NiFi lib directory:
$> cp nifi-customprocessor-processors/target/nifi-customprocessor-processors-1.0-SNAPSHOT.nar NIFI_HOME/lib
  • And finally, start NiFi:
$> sh NIFI_HOME/bin/nifi.sh start
  • Go to the NiFi UI, use Add Processor, and search for yours ;)

Voila! Thanks to the NiFi extension policy and design, we have just created our custom NiFi processor and put it into production.

https://github.com/apache/nifi/tree/main/nifi-nar-bundles is the best place to look at the source code of the out-of-the-box processors and draw inspiration from them.

The source code for this article is available for download at https://github.com/m-semnani/nifi-customprocessor.

Translated from: https://itnext.io/deep-dive-into-a-custom-apache-nifi-processor-c2191e4f89a0
