1. Requirements
1.1 NiFi is written in Java, so a JDK is required
1.2 Project dependencies required in Maven
<dependencies>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-api</artifactId>
<version>${nifi.version}</version>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-utils</artifactId>
<version>${nifi.version}</version>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-processor-utils</artifactId>
<version>${nifi.version}</version>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-mock</artifactId>
<version>${nifi.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.12</version>
<scope>test</scope>
</dependency>
</dependencies>
1.2.1 nifi-api
1.2.2 nifi-utils
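1.2.3 nifi-processor-utils, which provides the abstract Processor base classes
1.2.4 nifi-mock and junit, for testing
1.2.5 ?
It seems a ?? plugin is also needed. It provides the NAR packaging format (similar to a WAR) that bundles the classes into a NiFi component. The packaging step needs the nifi-api dependency; the role of the other components becomes clear later on.
1.3 Developing in IDEA
(Some guides online build the project skeleton from the command line. When I tried that I ran into some errors, so I work in IDEA instead, which is simple and convenient.)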
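2. Developing
2.1 Create the processor services file
In the /src/main/resources/META-INF/services/ directory, create a file named org.apache.nifi.processor.Processor. It acts like a configuration file that points to the custom Processor class, for example: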
rocks.nifi.examples.processors.JsonProcessor
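The file under META-INF/services uses the standard Java provider-configuration format read by java.util.ServiceLoader: one fully qualified class name per line, with # starting a comment. As a rough illustration only, the sketch below shows how such a file's contents reduce to a list of class names. It is not NiFi's own loading code, and the class name ProviderFileSketch and its method are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: mimics how a provider-configuration
// file (the format read by java.util.ServiceLoader) is interpreted.
class ProviderFileSketch {

    // Returns the class names listed in the file content, ignoring comments and blank lines.
    static List<String> parseProviderFile(String content) {
        List<String> classNames = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop everything after '#'
            }
            line = line.trim();
            if (!line.isEmpty()) {
                classNames.add(line);
            }
        }
        return classNames;
    }

    public static void main(String[] args) {
        String content = "# custom processors shipped in this NAR\n"
                + "rocks.nifi.examples.processors.JsonProcessor\n";
        System.out.println(parseProviderFile(content));
    }
}
```

To register more than one processor, each class name goes on its own line of the same file.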
2.2 Create a custom processor
Define a simple Java class as described in the setup step, for example rocks.nifi.examples.processors.JsonProcessor:
2.2.1 Apache Nifi Processor Header
// The processor has no side effects outside of NiFi, so it can safely be re-run on the same input
@SideEffectFree
// Tags for the processor
@Tags({"JSON","SHA0W.PUB"})
// Description of the processor
@CapabilityDescription("Fetch value from json path.")
// Most processors simply extend AbstractProcessor; more complicated tasks may need to go a level deeper and extend AbstractSessionFactoryProcessor.
public class JsonProcessor extends AbstractProcessor{
}
2.2.2 Variable Declaration
Add the properties and relationships for the processor. The nifi-processor-utils package provides a large selection of validators; see the official developer guide.
// properties holds the configuration parameters (property descriptors) this processor supports
private List<PropertyDescriptor> properties;
// relationships holds the relationships this processor can route data to
private Set<Relationship> relationships;
public static final String MATCH_ATTR = "match";
public static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()
// Property name, shown next to the input field
.name("Json Path")
// Whether the property is required
.required(true)
// Add a validator
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
// Build the descriptor once everything has been set
.build();
public static final Relationship SUCCESS = new Relationship.Builder()
.name("SUCCESS")
.description("Success relationship")
.build();
// A property with a fixed set of choices is defined as follows
// (REGULAR and VERBOSE are AllowableValue constants declared the same way as EXTENSIVE)
public static final AllowableValue EXTENSIVE = new AllowableValue("Extensive", "Extensive",
"Everything will be logged - use with caution!");
public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
.name("Amount to Log")
.description("How much the Processor should log")
.allowableValues(REGULAR, VERBOSE, EXTENSIVE)
.defaultValue(REGULAR.getValue())
...
.build();
2.2.3 Apache Nifi Init
The init function is called when NiFi starts up and initializes the processor. Remember that this is a highly multi-threaded environment, so be careful what you do here. This is why both the list of properties and the set of relationships are stored as unmodifiable collections. The getters for the properties and relationships are placed here as well; the UI relies on these two getters to display the processor correctly.
init is mainly used to load the Relationship and PropertyDescriptor objects defined in the processor.
@Override
public void init(final ProcessorInitializationContext context){
List<PropertyDescriptor> properties = new ArrayList<>();
properties.add(JSON_PATH);
// Wrap in an unmodifiable list so that no thread can add to it later
this.properties = Collections.unmodifiableList(properties);
Set<Relationship> relationships = new HashSet<>();
relationships.add(SUCCESS);
this.relationships = Collections.unmodifiableSet(relationships);
}
// The UI relies on these two getters to display the processor correctly
@Override
public Set<Relationship> getRelationships(){
return relationships;
}
@Override
public List<PropertyDescriptor> getSupportedPropertyDescriptors(){
return properties;
}
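The build-then-freeze pattern used in init can be tried out with plain JDK collections. The sketch below is a minimal illustration under that assumption; the class UnmodifiableSketch and its method names are invented for this example and are not part of NiFi. It fills a local list, publishes only a read-only view, and shows that later attempts to add to the view are rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: shows the "fill a local list, publish an unmodifiable view" pattern.
class UnmodifiableSketch {

    // Builds the list in a local variable, then returns only a read-only view of it.
    static List<String> buildPropertyNames() {
        List<String> names = new ArrayList<>();
        names.add("Json Path");
        return Collections.unmodifiableList(names);
    }

    // Returns true if the given list refuses to be modified.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("Another Property");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = buildPropertyNames();
        System.out.println(names + " rejects add: " + rejectsAdd(names));
    }
}
```

Because the mutable list never leaves the method, no other thread can hold a reference that would let it change the published view.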
2.2.4 The onTrigger method
The onTrigger method is called whenever a flow file is passed to the processor. For more details on the context and session variables, please refer to the official developer guide. The unit of work is the FlowFile: this method holds the business logic and decides what to do with each FlowFile that arrives:
@Override
public void onTrigger(ProcessContext processContext, ProcessSession processSession) throws ProcessException {
final AtomicReference<String> value = new AtomicReference<>();
// First, get the FlowFile to process from the session
FlowFile flowFile = processSession.get();
// get() returns null when no FlowFile is queued, so there is nothing to do
if (flowFile == null) {
return;
}
// read(FlowFile, InputStreamCallback) reads the FlowFile content
// write(FlowFile, OutputStreamCallback) writes the FlowFile content
// write(FlowFile, StreamCallback) reads and writes in one pass, so essentially all the work happens inside the callback. Because the result usually has to be routed once processing is done, this third form only suits components whose logic and code are fairly simple.
// For processors with more complex logic, prefer reading the data first, then processing it, then writing it back; this uses both InputStreamCallback and OutputStreamCallback and keeps FlowFile reads and writes to a minimum
// Read the FlowFile content
processSession.read(flowFile, in -> {
try{
String json = IOUtils.toString(in, StandardCharsets.UTF_8);
String result = JsonPath.read(json, "$.hello");
value.set(result);
}catch(Exception ex){
getLogger().error("Failed to read json string.", ex);
}
});
// Write the result to an attribute
String results = value.get();
if(results != null && !results.isEmpty()){
flowFile = processSession.putAttribute(flowFile, MATCH_ATTR, results);
// Write the result back out to the flow file; skipped when nothing matched, which avoids a NullPointerException
flowFile = processSession.write(flowFile, out -> out.write(results.getBytes(StandardCharsets.UTF_8)));
}
// Finally, every flow file that is generated needs to be removed or transferred.
processSession.transfer(flowFile, SUCCESS);
}
In general, you pull the flow file out of the session, read from and write to it, and add attributes where needed. To work on flow files, NiFi provides three callback interfaces.
2.2.5 InputStreamCallback
For reading the contents of the flow file through an input stream.
session.read(flowfile, new InputStreamCallback() {
@Override
public void process(InputStream in) throws IOException {
try{
//Using Apache Commons to read the input stream out to a string.
String json = IOUtils.toString(in);
//Use JsonPath to attempt to read the json and set a value to the pass on.
String result = JsonPath.read(json, "$.hello");
value.set(result);
}catch(Exception ex){
// Best practice would normally be to route the original flow file to an error relationship when an exception occurs.
ex.printStackTrace();
getLogger().error("Failed to read json string.");
}
}
});
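The value holder in this snippet is needed because a lambda or anonymous class can only capture effectively final local variables, so the callback cannot assign to a plain local String. The plain-JDK sketch below illustrates that technique in isolation. ReadCallback and read(...) are hypothetical stand-ins with a similar shape to the session API, not NiFi types, and the class name CallbackResultSketch is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

class CallbackResultSketch {

    // Hypothetical stand-in for a read callback: it receives the content stream.
    interface ReadCallback {
        void process(InputStream in) throws IOException;
    }

    // Hypothetical stand-in for a session's read(...): opens the content and hands it to the callback.
    static void read(byte[] content, ReadCallback callback) throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            callback.process(in);
        }
    }

    // Reads the whole content inside the callback and carries it out through an AtomicReference.
    static String readAll(byte[] content) {
        final AtomicReference<String> value = new AtomicReference<>();
        try {
            read(content, in -> {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Assigning to a plain local String here would not compile;
                // mutating the holder object is allowed.
                value.set(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(readAll("{\"hello\":\"world\"}".getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the callback never sets the holder, for example because parsing failed, value.get() returns null, so callers should check for null before using the result.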
2.2.6 OutputStreamCallback
For writing to a flow file; this overwrites the content rather than appending to it. We simply write out the value we received in the InputStreamCallback.
flowfile = session.write(flowfile, new OutputStreamCallback() {
@Override
public void process(OutputStream out) throws IOException {
out.write(value.get().getBytes());
}
});
2.2.7 StreamCallback
This is for both reading and writing to the same flow file. With both the OutputStreamCallback and the StreamCallback, remember to assign the result back to the flow file variable. This callback is not used in the processor code above, although it could have been; the choice was deliberate, to show a way of moving data out of callbacks and back in.
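flowfile = session.write(flowfile, new StreamCallback() {
@Override
public void process(InputStream in, OutputStream out) throws IOException {
// Read the incoming content and write the extracted value within a single callback.
String json = IOUtils.toString(in);
String result = JsonPath.read(json, "$.hello");
out.write(result.getBytes());
}
});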
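The read-and-write shape can also be sketched without NiFi on the classpath. The example below is only an illustration: Transform and rewrite(...) are hypothetical stand-ins for a callback that receives both an input and an output stream, and the upper-casing step is an arbitrary placeholder for real processing. It also shows why the result has to be assigned back: the rewritten content is returned as a new value and the original is left untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamTransformSketch {

    // Hypothetical stand-in for a callback that reads and writes in one pass.
    interface Transform {
        void process(InputStream in, OutputStream out) throws IOException;
    }

    // Hypothetical stand-in for a write(...) call: returns the NEW content, leaving the input untouched.
    static byte[] rewrite(byte[] content, Transform transform) {
        try (InputStream in = new ByteArrayInputStream(content);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            transform.process(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Upper-cases ASCII letters byte by byte while streaming from in to out.
    static byte[] upperCase(byte[] content) {
        return rewrite(content, (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b >= 'a' && b <= 'z' ? b - ('a' - 'A') : b);
            }
        });
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] rewritten = upperCase(original); // the result must be kept, like flowfile = session.write(...)
        System.out.println(new String(rewritten, StandardCharsets.UTF_8));
    }
}
```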
2.3 Test
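The processor should first be tested inside the project to check that it behaves as designed.
3. Deployment
3.1 Packaging
Open a command line in the project directory and run the mvn clean install command.
3.2 Upload
Locate the file nifi-demo-nar-1.0.nar named in the build log line [INFO] Installing D:\ideaSpace\nifi-1.3.0\self-define\first-processors\nifi-demo-nar\target\nifi-demo-nar-1.0.nar to D:\SoftWares\apache-maven-3.2.3\repo\first\nifi-demo-nar\1.0\nifi-demo-nar-1.0.nar
Upload the file with the .nar extension to the lib directory of the NiFi server.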
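If the NiFi lib directory is reachable as a local or mounted path, the copy step can be scripted. The sketch below assumes exactly that; the class NarDeploySketch and both command-line arguments are hypothetical, and a remote server would need scp or a similar tool instead. It only copies the file and does not restart NiFi.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class NarDeploySketch {

    // Copies the built .nar into the given lib directory, replacing an older copy if present.
    static Path deploy(Path narFile, Path libDir) {
        if (!narFile.getFileName().toString().endsWith(".nar")) {
            throw new IllegalArgumentException("Not a .nar file: " + narFile);
        }
        if (!Files.isDirectory(libDir)) {
            throw new IllegalArgumentException("Not a directory: " + libDir);
        }
        try {
            return Files.copy(narFile, libDir.resolve(narFile.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage (hypothetical arguments): java NarDeploySketch <path-to-nar> <nifi-lib-dir>
        if (args.length != 2) {
            System.err.println("usage: NarDeploySketch <path-to-nar> <nifi-lib-dir>");
            return;
        }
        System.out.println("Copied to " + deploy(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```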
3.3 Restart NiFi and open the UI
The processor can now be used from the UI!
References:
https://blog.csdn.net/mianshui1105/article/details/75313480
https://blog.csdn.net/larygry/article/details/89092573
https://blog.csdn.net/yitengtongweishi/article/details/88807934
https://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/