Solr DB Import Parsing

jay38301205419850523

Tags: Solr, regular expressions, Cache

Article link: https://blog.csdn.net/jay38301205419850523/article/details/83851884

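Solr's DB import ships with quite a few plugins, such as RegexTransformer, and the feature set is fairly rich.
For example, it can parse a field with a regular expression, or split a delimited field into a collection of values.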


<entity name="foo" transformer="RegexTransformer"
        query="select full_name , emailids from foo">
   <!-- ... -->
   <field column="full_name"/>
   <field column="firstName" regex="Mr(\w*)\b.*" sourceColName="full_name"/>
   <field column="lastName" regex="Mr.*?\b(\w*)" sourceColName="full_name"/>

   <!-- another way of doing the same -->
   <field column="fullName" regex="Mr(\w*)\b(.*)" groupNames="firstName,lastName" sourceColName="full_name"/>
   <field column="mailId" splitBy="," sourceColName="emailids"/>
</entity>
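
To make the two attributes a bit more concrete, here is a rough sketch in plain Java of what regex/groupNames and splitBy amount to for a single column value. It is not Solr's code: RegexFieldSketch, extractGroups and splitBy are names made up for illustration, and it only assumes that capture groups are mapped to the group names in order and that the separator is treated as a regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: mimics what the regex/groupNames
// and splitBy attributes do to one column value.
class RegexFieldSketch {

    // Applies the regex and maps capture group i to the i-th name.
    // Returns an empty map when the regex does not match.
    static Map<String, String> extractGroups(String regex, String groupNames, String value) {
        Map<String, String> out = new HashMap<String, String>();
        Matcher m = Pattern.compile(regex).matcher(value);
        if (m.find()) {
            String[] names = groupNames.split(",");
            for (int i = 0; i < names.length && i < m.groupCount(); i++) {
                String group = m.group(i + 1);
                if (group != null) {
                    out.put(names[i].trim(), group.trim());
                }
            }
        }
        return out;
    }

    // Splits a delimited value into a list, similar in spirit to splitBy=",".
    static List<String> splitBy(String separator, String value) {
        return new ArrayList<String>(Arrays.asList(value.split(separator)));
    }
}
```

Calling extractGroups with a two-group regex and "firstName,lastName" yields one map entry per group, while splitBy turns a comma-separated emailids value into a multi-valued list.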

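Even so, what it offers out of the box is limited. However, it lets you write your own Transformer. Looking at its code, a custom one can implement fairly complex record processing: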


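import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class RegexTransformer extends Transformer
{
    // Cache of compiled patterns, keyed by the regex string
    private final Map<String, Pattern> PATTERN_CACHE;

    public RegexTransformer()
    {
        PATTERN_CACHE = new HashMap<String, Pattern>();
    }

    // Called once per row; receives the row and returns the transformed row
    public Map<String, Object> transformRow(Map<String, Object> row, Context context)
    {
        // per-row processing goes here (body not shown)
        return row;
    }
}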
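
As an illustration of what a custom transformer might look like, here is a minimal sketch that derives a new column from an existing one. It is deliberately written without Solr imports so that it compiles on its own; to plug it into the import, the class would presumably extend Transformer and take the extra Context parameter, as in the skeleton above. The class name EmailDomainTransformer and the column names email and emailDomain are assumptions made up for this example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical custom transformer: one row in, the same row out,
// with a derived "emailDomain" column added when possible.
class EmailDomainTransformer {

    public Map<String, Object> transformRow(Map<String, Object> row) {
        Object email = row.get("email"); // assumed source column
        if (email instanceof String) {
            String s = (String) email;
            int at = s.indexOf('@');
            // Only derive a domain when there is text after the '@'
            if (at >= 0 && at < s.length() - 1) {
                row.put("emailDomain", s.substring(at + 1).toLowerCase(Locale.ROOT));
            }
        }
        return row;
    }
}
```

The point is only the shape: the method can add, change, or drop columns freely, but it still hands back the one row it was given.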

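Although it can be extended with a subclass, it has a bottleneck: each record it processes is still a single record after parsing, so it cannot support a 1:n mode, and some complex cases are hard to parse. Our project, at least, ran into this. The workaround I have come up with so far is to add redundant DB columns. Does anyone know another way to handle it?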
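
To make the 1:n case concrete, here is a rough sketch in plain Java of the kind of expansion that is wanted: one input row with a delimited column becoming several output rows. It uses no Solr classes, and RowExpansionSketch and expand are names made up for illustration. It only shows the desired behaviour, not something the transformer above does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of 1:n expansion, independent of Solr.
class RowExpansionSketch {

    // Produces one copy of the row per delimited value in the given column.
    // Rows without a String value in that column are passed through as-is.
    static List<Map<String, Object>> expand(Map<String, Object> row, String column, String separator) {
        List<Map<String, Object>> out = new ArrayList<Map<String, Object>>();
        Object value = row.get(column);
        if (!(value instanceof String)) {
            out.add(row);
            return out;
        }
        for (String part : ((String) value).split(separator)) {
            Map<String, Object> copy = new HashMap<String, Object>(row);
            copy.put(column, part.trim());
            out.add(copy);
        }
        return out;
    }
}
```

With the emailids column from the earlier configuration, a row holding two addresses would come out as two rows that share the same full_name. Storing the redundant data in the DB gets the same effect by doing this expansion before the import reads the rows.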