IK tokenizer PHP, Lucene IK tokenizer with synonyms

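This post shows how to use IKAnalyzer 5.x in Lucene for Chinese word segmentation and combine it with SynonymFilter for synonym expansion. The analyzer sets the filter parameters and loads a synonym file, so that input text is first segmented by IK and then expanded with synonyms. With the sample file below, 女鞋 and 女靴 are treated as synonyms of each other, as are 靴子, 长靴 and 短靴.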

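import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
import org.apache.lucene.analysis.util.ClasspathResourceLoader;

public class IKSynonymsAnalyzer5x extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // useSmart = true
        IKTokenizer5x tokenizer5x = new IKTokenizer5x(true);

        // Arguments for SynonymFilterFactory
        Map<String, String> paramsMap = new HashMap<>();
        paramsMap.put("luceneMatchVersion", "LUCENE_11");
        paramsMap.put("synonyms", "luceneIndexCreate/synonyms.txt");
        paramsMap.put("expand", "true");
        SynonymFilterFactory factory = new SynonymFilterFactory(paramsMap);

        // Load the synonym file from the classpath
        ClasspathResourceLoader loader = new ClasspathResourceLoader();
        try {
            factory.inform(loader);
        } catch (IOException e) {
            // The filter cannot be created without the synonym map, so fail fast
            throw new IllegalStateException("Failed to load synonyms file", e);
        }

        // Chain: IK tokenizer -> synonym filter
        return new TokenStreamComponents(tokenizer5x, factory.create(tokenizer5x));
    }
}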

import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.AttributeFactory;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

// final: Lucene asserts that a TokenStream class or its incrementToken() is final
public final class IKTokenizer5x extends Tokenizer {

    private final IKSegmenter _IKImplement;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
    // End offset of the last lexeme returned, used by end()
    private int endPosition;

    public IKTokenizer5x() {
        this._IKImplement = new IKSegmenter(this.input, true);
    }

    public IKTokenizer5x(boolean useSmart) {
        this._IKImplement = new IKSegmenter(this.input, useSmart);
    }

    public IKTokenizer5x(AttributeFactory factory) {
        super(factory);
        this._IKImplement = new IKSegmenter(this.input, true);
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        Lexeme nextLexeme = _IKImplement.next();
        if (nextLexeme == null) {
            return false; // no more lexemes
        }
        // Copy the lexeme text, offsets and type into the token attributes
        termAtt.append(nextLexeme.getLexemeText());
        termAtt.setLength(nextLexeme.getLength());
        offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
        endPosition = nextLexeme.getEndPosition();
        typeAtt.setType(nextLexeme.getLexemeTypeString());
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Point the segmenter at the reader supplied for this stream
        _IKImplement.reset(this.input);
    }

    @Override
    public void end() throws IOException {
        super.end(); // required by the TokenStream contract
        int finalOffset = correctOffset(endPosition);
        offsetAtt.setOffset(finalOffset, finalOffset);
    }
}

Contents of luceneIndexCreate/synonyms.txt (one comma-separated synonym group per line):

女鞋,女靴

靴子,长靴,短靴
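
To make the effect of expand=true on this file concrete, here is a minimal plain-Java sketch of the idea: each comma-separated line is one group, and every term in a group maps to all terms of that group. It is not Lucene's parser and does not use the Lucene API. The class name SynonymExpansionSketch and its methods parse and expand are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics how comma-separated synonym groups behave with expand=true.
// This is an assumption-level sketch, not the code SynonymFilterFactory runs.
public class SynonymExpansionSketch {

    // Builds a table "term -> all terms of its group" from lines such as "女鞋,女靴".
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            List<String> group = new ArrayList<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim();
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            for (String term : group) {
                // expand=true: every term of the group maps to the whole group
                table.computeIfAbsent(term, k -> new ArrayList<>()).addAll(group);
            }
        }
        return table;
    }

    // Returns the terms for one token: its group if it has one, else the token itself.
    public static List<String> expand(Map<String, List<String>> table, String token) {
        List<String> group = table.get(token);
        return group != null ? group : Collections.singletonList(token);
    }

    public static void main(String[] args) {
        Map<String, List<String>> table =
                parse(Arrays.asList("女鞋,女靴", "靴子,长靴,短靴"));
        System.out.println(expand(table, "女靴")); // [女鞋, 女靴]
        System.out.println(expand(table, "长靴")); // [靴子, 长靴, 短靴]
        System.out.println(expand(table, "凉鞋")); // [凉鞋]
    }
}
```

The two lines of the file form independent groups, so in this sketch 女鞋 expands to 女靴 but not to 靴子. Whether a given term is matched in practice also depends on IK emitting it as a single token.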
