[分组聚合]基于Lucene8进行多值字段分组聚合(多属性字段)

最新推荐文章于 2024-05-03 21:05:18 发布

alexgaoyihang

最新推荐文章于 2024-05-03 21:05:18 发布

阅读量226

点赞数 3

文章标签： lucene 分组聚合多属性多值字段聚合逗号分割

本文链接：https://blog.csdn.net/alexgaoyihang/article/details/137538781

版权

介绍

在使用搜索引擎的过程中，经常会出现针对属性进行分组聚合的场景，单属性分组聚合很简单，如何对多属性字段进行分组聚合是本文的重点。

背景

在真实世界中，任意一个物品的属性可能会有多个，形如学生有多个兴趣爱好，如何根据兴趣爱好进行分组？如果高效的解决类似的需求尤为重要。

解决方案

1、仍然按照单字段单属性分组聚合的方式，对分组聚合的结果进行循环处理，从而获得期望的结果。

2、引入 lucene-facet 对多属性字段进行分组统计。本文主要介绍此方法。

实现代码

详见如下测试代码

其中针对 Category 字段分组查询之后得到的结果为：gao : 2 ; alex,gao : 1 ; alex,pap : 1

其中针对 MCategory 字段分组查询之后得到的结果为：gao : 3 ; alex : 2 ; pap : 1

@RunWith(SpringRunner.class)
@SpringBootTest(classes = {com.pap.lucene.LuceneAutoConfiguration.class})
@TestPropertySource("classpath:application.properties")
@FixMethodOrder(MethodSorters.JVM)
public class LuceneAutoConfigurationTest {

    @Autowired
    private SearcherManager searcherManager;
    @Autowired
    private IndexWriter indexWriter;
    @Autowired
    private DirectoryTaxonomyWriter taxoWriter;

    @Test
    public void init() throws Exception {
        FacetsConfig facetsConfigInit = new FacetsConfig();
        facetsConfigInit.setMultiValued("MCategory", true);

        indexDocument(indexWriter, taxoWriter, facetsConfigInit, "pap.net.cn 1", "pap.net.cn of content 1", "alex,gao", "Tag A");
        indexDocument(indexWriter, taxoWriter, facetsConfigInit, "pap.net.cn 2", "pap.net.cn of content 2", "alex,pap", "Tag B");
        indexDocument(indexWriter, taxoWriter, facetsConfigInit, "pap.net.cn 3", "pap.net.cn of content 3", "gao", "Tag A");
        indexDocument(indexWriter, taxoWriter, facetsConfigInit, "pap.net.cn 4", "pap.net.cn of content 4", "gao", "Tag B");

        indexWriter.commit();
        taxoWriter.commit();

    }

    @Test
    public void search() throws Exception {
        Query query = new TermQuery(new Term("content", "content"));

        searcherManager.maybeRefresh();
        IndexSearcher indexSearcher = searcherManager.acquire();

        FacetsCollector facetsCollector = new FacetsCollector();
        FacetsCollector.search(indexSearcher, query, 10, facetsCollector);

        FacetsConfig facetsConfigSearch = new FacetsConfig();
        Facets facets = new FastTaxonomyFacetCounts(new DirectoryTaxonomyReader(taxoWriter), facetsConfigSearch, facetsCollector);

        FacetResult category = facets.getTopChildren(10, "Category");
        if(category != null) {
            for(LabelAndValue labelValues : category.labelValues) {
                System.out.println(labelValues.label + " : " + labelValues.value);
            }
        }
        System.out.println("------");
        FacetResult MCategory = facets.getTopChildren(10, "MCategory");
        if(MCategory != null) {
            for(LabelAndValue labelValues : MCategory.labelValues) {
                System.out.println(labelValues.label + " : " + labelValues.value);
            }
        }
    }

    private static void indexDocument(IndexWriter indexWriter, DirectoryTaxonomyWriter taxoWriter, FacetsConfig facetsConfig, String title, String content, String category, String tag) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("title", title, Field.Store.YES));
        doc.add(new TextField("content", content, Field.Store.YES));
        doc.add(new FacetField("Category", category));
        for(String tmp :category.split(",")) {
            doc.add(new FacetField("MCategory", tmp));
        }
        doc.add(new FacetField("Tag", tag));
        indexWriter.addDocument(facetsConfig.build(taxoWriter, doc));
    }
}

总结

此代码在Lucene8.6.2版本下通过测试，从而高效的获取到单字段多属性的分组聚合结果。

参考

https://pap-docs.pap.net.cn/
https://gitee.com/alexgaoyh/pap-lucene-spring-boot-starter

alexgaoyihang

关注

3
点赞
踩
10

收藏

觉得还不错? 一键收藏
0
评论
[分组聚合]基于Lucene8进行多值字段分组聚合(多属性字段)

在使用搜索引擎的过程中，经常会出现针对属性进行分组聚合的场景，单属性分组聚合很简单，如何对多属性字段进行分组聚合是本文的重点。此代码在Lucene8.6.2版本下通过测试，从而高效的获取到单字段多属性的分组聚合结果。
复制链接

扫一扫