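A First Look at Developing an Elasticsearch Filter Plugin

This article uses the Elasticsearch 7.10 source code as its example (JDK 14 or later is required).

Plugin categories

Elasticsearch groups plugins into several categories, each serving a different purpose. The categories are listed in the org.elasticsearch.plugins.Plugin class: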

/**
 * <ul>
 * <li>{@link ActionPlugin}
 * <li>{@link AnalysisPlugin}
 * <li>{@link ClusterPlugin}
 * <li>{@link DiscoveryPlugin}
 * <li>{@link IngestPlugin}
 * <li>{@link MapperPlugin}
 * <li>{@link NetworkPlugin}
 * <li>{@link RepositoryPlugin}
 * <li>{@link ScriptPlugin}
 * <li>{@link SearchPlugin}
 * <li>{@link ReloadablePlugin}
 * </ul>
 */
public abstract class Plugin implements Closeable {
    // omitted...
}

Overview of plugin types [official docs]: https://www.elastic.co/guide/en/elasticsearch/plugins/master/intro.html

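The following descriptions are quoted from online sources:

  • ActionPlugin: REST API request plugin. Developers can add their own REST commands or add extra processing to REST requests. If the built-in REST commands such as _all, _cat, and _cat/health do not meet your needs, you can develop the commands you need.
  • AnalysisPlugin: analysis plugin. Extends index analysis to cover gaps in the built-in analysis features, for example the well-known IK tokenizer plugin.
  • IngestPlugin: pre-processing plugin. Processes data before it is indexed, for example the geoip processor plugin, which adds geographic information based on an IP address.
  • MapperPlugin: mapping plugin. Adds data types to ES.
  • NetworkPlugin: network transport plugin.
  • ScriptPlugin: script plugin. Mainly extends the scripting features of ES, for example custom scoring functions or support for other script languages.
  • SearchPlugin: search plugin. Extends the query features of ES.

Writing a filter plugin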

If we need to process text, the plugin should be defined as an AnalysisPlugin. Elasticsearch ships with many built-in analysis components; take a look at this class:

org.elasticsearch.analysis.common.CommonAnalysisPlugin

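This class registers many commonly used analyzers, tokenizers, char filters, and token filters, so a custom plugin can follow the way it is written.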

Next we write a plugin that encrypts characters. After consulting part of the ICU plugin source (icu_normalizer), the steps are as follows:

Add an EncPlugin class
package com.xx.plugin.es.enc;

import com.xx.plugin.es.enc.character.EncCharFilterFactory;
import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.MapperPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.Map;
import java.util.TreeMap;

public class EncPlugin extends Plugin implements AnalysisPlugin, MapperPlugin {

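    /**
     * Registers the custom char filter under the name "enc_filter".
     *
     * @return a map from char filter name to the provider that creates its factory
     */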
    @Override
    public Map<String, AnalysisModule.AnalysisProvider<CharFilterFactory>> getCharFilters() {
        Map<String, AnalysisModule.AnalysisProvider<CharFilterFactory>> filters = new TreeMap<>();
        filters.put("enc_filter", EncCharFilterFactory::new);
        return filters;
    }
}

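By implementing AnalysisPlugin and overriding its methods, this class can hand CharFilters, TokenFilters, Tokenizers, Analyzers and so on to Elasticsearch. As for how these concepts relate: an analyzer is made up of zero or more char filters, exactly one tokenizer, and zero or more token filters, applied in that order.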
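
To make the ordering of these components concrete, here is a minimal plain-Java sketch of an analysis chain. It is an illustration only: AnalysisChainSketch and its analyze method are hypothetical names, and none of the types are Elasticsearch or Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Plain-Java stand-in for an analysis chain (hypothetical, not an Elasticsearch API).
class AnalysisChainSketch {

    // Applies the three stages in the order an analyzer applies them.
    static List<String> analyze(String text,
                                UnaryOperator<String> charFilter,
                                Function<String, List<String>> tokenizer,
                                UnaryOperator<String> tokenFilter) {
        // 1. The char filter sees the whole character stream before tokenization.
        String filtered = charFilter.apply(text);
        // 2. The tokenizer splits the filtered stream into tokens.
        List<String> tokens = tokenizer.apply(filtered);
        // 3. The token filter is applied to each token.
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(tokenFilter.apply(token));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> result = analyze(
                "Hello-World Foo",
                s -> s.replace('-', ' '),                   // char filter: '-' becomes a space
                s -> Arrays.asList(s.trim().split("\\s+")), // tokenizer: split on whitespace
                s -> s.toLowerCase(Locale.ROOT));           // token filter: lowercase
        System.out.println(result); // prints [hello, world, foo]
    }
}
```

The char filter in this article's plugin sits at stage 1, which is why it works on a Reader rather than on individual tokens.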

EncCharFilterFactory
package com.xx.plugin.es.enc.character;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractCharFilterFactory;
import org.elasticsearch.index.analysis.NormalizingCharFilterFactory;

import java.io.Reader;

/**
 * Modeled on:
 * org.elasticsearch.analysis.common.PatternReplaceCharFilterFactory
 * org.elasticsearch.analysis.common.MappingCharFilterFactory
 */
public class EncCharFilterFactory extends AbstractCharFilterFactory implements NormalizingCharFilterFactory {

    private EncNormalizer normalizer = null;


    public EncCharFilterFactory(IndexSettings indexSettings, String name) {
        super(indexSettings, name);
        normalizer = new EncNormalizerImpl();
    }

    public EncCharFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name);
        normalizer = new EncNormalizerImpl();
    }

    @Override
    public Reader create(Reader reader) {
        return new EncCharFilter(reader, normalizer);
    }
}

EncNormalizerImpl

An empty class for now; the logic will be implemented later.

package com.xx.plugin.es.enc.character;

public class EncNormalizerImpl extends EncNormalizer {
}

EncCharFilter
package com.xx.plugin.es.enc.character;

import org.apache.lucene.analysis.charfilter.BaseCharFilter;
import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Objects;
import java.util.stream.Stream;

/**
 * Modeled on:
 *
 * @see PatternReplaceCharFilter
 */
public class EncCharFilter extends BaseCharFilter {

    private Reader transformedInput;

    public EncCharFilter(Reader in, EncNormalizer encNormalizer) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {

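        // Actual logic: the reference source is a bit convoluted and I have not fully worked it out yet,
        // so it is left for later. For now this only prints some debug output.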

        System.out.println("");
        System.out.println("--------------------------------------------------");
        // Buffer all input on the first call.
        System.out.printf("transformedInput == null:%s%n", transformedInput == null);
        if (transformedInput == null) {
            fill();
        }

        String str = new String(cbuf);
        System.out.printf("cbuf>>>:%s length:%s off>>>:%s len>>>:%s", str, str.length(), off, len);
        int read = transformedInput.read(cbuf, off, len);
        System.out.println(" read>>>" + read);
        return read;
    }

    private String fill() throws IOException {
        StringBuilder buffered = new StringBuilder();
        char[] temp = new char[1024];
        for (int cnt = input.read(temp); cnt > 0; cnt = input.read(temp)) {
            buffered.append(temp, 0, cnt);
        }
        String newStr = this.processPattern(buffered).toString();
        // Debug aid: print the call stack when the whole input is exactly "110".
        if (Objects.equals(newStr, "110")) {
            Stream.of(Thread.currentThread().getStackTrace()).forEach(System.out::println);
        }
        transformedInput = new StringReader(newStr);
        return newStr;
    }

    @Override
    public int read() throws IOException {
        if (transformedInput == null) {
            fill();
        }
        return transformedInput.read();
    }

    @Override
    protected int correct(int currentOff) {
        return Math.max(0, super.correct(currentOff));
    }

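    /**
     * Placeholder for the encryption step: currently returns the input unchanged.
     * An implementation that changes the text length would also need to record offset corrections.
     */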
    CharSequence processPattern(CharSequence input) {
        return input;
    }
}
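
The buffer-then-transform pattern used by EncCharFilter can be tried out without Lucene on the classpath. The sketch below is a stdlib-only stand-in, not the plugin's implementation: BufferingTransformReader, transform and readAll are hypothetical names, it extends java.io.FilterReader instead of BaseCharFilter, and ROT13 is used purely as a placeholder for the encryption that the article leaves unimplemented.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stdlib-only stand-in for EncCharFilter (hypothetical; tracks no offsets).
class BufferingTransformReader extends FilterReader {

    private Reader transformed; // stays null until the first read

    BufferingTransformReader(Reader in) {
        super(in);
    }

    // Placeholder "encryption": ROT13 on ASCII letters, everything else unchanged.
    // It preserves length, so no offset correction would be needed.
    static String transform(CharSequence input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + (c - 'a' + 13) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + (c - 'A' + 13) % 26));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reads the whole wrapped input once and prepares the transformed text.
    private void fill() throws IOException {
        StringBuilder buffered = new StringBuilder();
        char[] temp = new char[1024];
        for (int cnt = in.read(temp); cnt > 0; cnt = in.read(temp)) {
            buffered.append(temp, 0, cnt);
        }
        transformed = new StringReader(transform(buffered));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (transformed == null) {
            fill();
        }
        return transformed.read(cbuf, off, len);
    }

    @Override
    public int read() throws IOException {
        if (transformed == null) {
            fill();
        }
        return transformed.read();
    }

    // Convenience helper for the demo: runs a string through the reader.
    static String readAll(String text) {
        try (Reader r = new BufferingTransformReader(new StringReader(text))) {
            StringBuilder out = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("Hello 110")); // prints Uryyb 110
    }
}
```

A length-preserving transform was chosen on purpose: once the output length differs from the input length, a real char filter has to map output offsets back to input offsets, which is the part of BaseCharFilter this sketch leaves out.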

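Configuration files

Add two configuration files under the resources directory

  • plugin-descriptor.properties. maven-assembly-plugin is used here to filter the file, and the variables in it are defined in the pom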
classname=${elasticsearch.plugin.classname}
name=${elasticsearch.plugin.name}
description=${project.description}
version=${project.version}
elasticsearch.version=${elasticsearch.version}
java.version=${maven.compiler.target}
  • plugin-security.policy. This file declares the permissions the plugin requests. The following is only an example; grant what your plugin actually needs
grant {
  permission java.security.AllPermission;
};
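
Since the descriptor relies on Maven filtering, it can be worth checking the filtered output for placeholders that were never resolved. The following is a small sketch under that assumption; DescriptorCheck and problems are hypothetical names, and the key list is simply the six keys used in the descriptor above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical sanity check for a filtered plugin-descriptor.properties.
class DescriptorCheck {

    // The six keys that appear in the descriptor shown in this article.
    static final List<String> KEYS = Arrays.asList(
            "classname", "name", "description", "version",
            "elasticsearch.version", "java.version");

    // Returns the keys that are missing, empty, or still contain a ${...} placeholder.
    static List<String> problems(String descriptorText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(descriptorText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> bad = new ArrayList<>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty() || value.contains("${")) {
                bad.add(key);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Example input: every value resolved except java.version.
        String filtered = String.join("\n",
                "classname=com.xx.plugin.es.enc.EncPlugin",
                "name=enc_plugin",
                "description=this is a test",
                "version=1.0-SNAPSHOT",
                "elasticsearch.version=7.10.1",
                "java.version=${maven.compiler.target}");
        System.out.println(problems(filtered)); // prints [java.version]
    }
}
```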

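Add a packaging descriptor under the assembly directory (src/main/assembly/plugin.xml, as referenced from the pom) to package the plugin as a zip file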

<?xml version="1.0"?>
<assembly>
    <dependencySets>
        <dependencySet>
            <outputDirectory>enc-filter</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <useTransitiveFiltering>true</useTransitiveFiltering>
        </dependencySet>
    </dependencySets>

    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
    <files>
        <file>
            <fileMode>0755</fileMode>
            <filtered>true</filtered>
            <outputDirectory>enc-filter/</outputDirectory>
            <source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
        </file>
        <file>
            <fileMode>0755</fileMode>
            <filtered>true</filtered>
            <outputDirectory>enc-filter/</outputDirectory>
            <source>${project.basedir}/src/main/resources/plugin-security.policy</source>
        </file>
    </files>
    <formats>
        <format>zip</format>
    </formats>

    <id>plugin-develop</id>
    <includeBaseDirectory>false</includeBaseDirectory>
</assembly>

The directory structure is as follows
[Figure: project directory structure]

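Packaging

Note: plugin-descriptor.properties and plugin-security.policy must not be packaged into the jar (the <resources> excludes in the pom below keep them out); the assembly descriptor above adds them to the zip next to the jar instead.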

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>enc-plugin</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>14</maven.compiler.source>
        <maven.compiler.target>14</maven.compiler.target>
        <elasticsearch.version>7.10.1</elasticsearch.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <elasticsearch.plugin.classname>com.xx.plugin.es.enc.EncPlugin</elasticsearch.plugin.classname>
        <elasticsearch.plugin.name>enc_plugin</elasticsearch.plugin.name>
        <project.description>this is a test</project.description>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assembly/plugin.xml</descriptor>
                    </descriptors>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <groupId>org.apache.maven.plugins</groupId>
                <version>3.8.1</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                    <source>${maven.compiler.source}</source>
                    <target>${maven.compiler.target}</target>
                </configuration>
            </plugin>
        </plugins>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <excludes>
                    <exclude>*.properties</exclude>
                    <exclude>*.policy</exclude>
                </excludes>
                <filtering>false</filtering>
            </resource>
        </resources>
    </build>

</project>
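
Installation

Running mvn clean install produces the plugin zip under target/releases/. In the demo project the file is elasticsearch-encode-plugin-1.0-SNAPSHOT.zip; the name follows the artifactId and version, so the pom shown above would yield enc-plugin-1.0-SNAPSHOT.zip.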

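Copy the file into the plugins directory of Elasticsearch, unzip it there, and delete the original zip file. The result looks like this:

[Screenshot: plugins directory after unpacking]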
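
The unzip-and-delete step can also be scripted. The sketch below is one possible way to do it with java.util.zip; PluginUnzip, unzipAndDelete and demo are hypothetical names, and the demo works on a throwaway temp directory rather than a real Elasticsearch installation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper mirroring the manual steps: unzip into a directory, then delete the zip.
class PluginUnzip {

    static void unzipAndDelete(Path zipFile, Path targetDir) {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            for (ZipEntry entry = zin.getNextEntry(); entry != null; entry = zin.getNextEntry()) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the zip stream itself stays open.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            Files.delete(zipFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny zip in a temp directory, unpacks it, and reports whether the result looks right.
    static boolean demo() {
        try {
            Path plugins = Files.createTempDirectory("plugins");
            Path zip = plugins.resolve("demo.zip");
            try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(zip))) {
                zout.putNextEntry(new ZipEntry("enc-filter/plugin-descriptor.properties"));
                zout.write("name=enc_plugin\n".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
            unzipAndDelete(zip, plugins);
            return Files.exists(plugins.resolve("enc-filter/plugin-descriptor.properties"))
                    && Files.notExists(zip);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```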

Verification

Start Elasticsearch

./elasticsearch-7.10.1/bin/elasticsearch

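[Screenshot: Elasticsearch startup log]
If the startup log lists the plugin as loaded, as in the screenshot above, the custom filter has been picked up.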

TODO:
1. Create an index
2. Write data
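
As a starting point for the first TODO item, the sketch below builds (but does not send) a create-index request whose custom analyzer refers to the char filter by its registered name, enc_filter. Everything else is an assumption made for illustration: the index name enc_test, the analyzer name enc_analyzer, the standard tokenizer, the localhost:9200 address, and the class and method names. Whether the analyzer behaves as intended still has to be checked against a running node.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: prepares a create-index request that uses the "enc_filter" char filter.
class CreateIndexSketch {

    // Index settings with one custom analyzer (names other than enc_filter are assumptions).
    static String settingsJson() {
        return "{\"settings\":{\"analysis\":{\"analyzer\":{\"enc_analyzer\":{"
                + "\"type\":\"custom\","
                + "\"tokenizer\":\"standard\","
                + "\"char_filter\":[\"enc_filter\"]"
                + "}}}}}";
    }

    // Builds PUT {baseUrl}/{index} with the settings as a JSON body. Nothing is sent here.
    static HttpRequest createIndexRequest(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsJson()))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createIndexRequest("http://localhost:9200", "enc_test");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsJson());
    }
}
```

Sending the request would take a java.net.http.HttpClient and a node with the plugin installed; documents indexed into a field mapped to enc_analyzer would then pass through the char filter's read methods.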

The demo plugin from this article has been uploaded to [gitee] https://gitee.com/aqu415/elasticsearch-encode-plugin

Reference:
https://cloud.tencent.com/developer/article/2213726
