2.4. Searching with the Query Object
2.4.1.2. Creating the Weight Object Tree
BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher). The BooleanWeight constructor is implemented as follows:
public BooleanWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    weights = new ArrayList<Weight>(clauses.size());
    // This is also a recursive process: walk the new Query object tree down to the leaf nodes
    for (int i = 0; i < clauses.size(); i++) {
        weights.add(clauses.get(i).getQuery().createWeight(searcher));
    }
}
For a TermQuery leaf node, TermQuery.createWeight(Searcher) returns a new TermWeight(searcher) object. The TermWeight constructor is as follows:
public TermWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    // idf is computed here
    idfExp = similarity.idfExplain(term, searcher);
    idf = idfExp.getIdf();
}
// The idf computation follows the formula in the documentation exactly: idf(t) = 1 + log(numDocs / (docFreq + 1))
public IDFExplanation idfExplain(final Term term, final Searcher searcher) {
    final int df = searcher.docFreq(term);
    final int max = searcher.maxDoc();
    final float idf = idf(df, max);
    return new IDFExplanation() {
        public float getIdf() {
            return idf;
        }
    };
}
public float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
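To make the formula concrete, here is a minimal standalone sketch that copies the idf arithmetic above into a hypothetical class, IdfSketch (not part of Lucene), and evaluates it for a few document frequencies. The index size of 10 documents is an assumption chosen only for illustration.

```java
// Hypothetical standalone class; only the idf formula is taken from the text above.
class IdfSketch {
    // Same arithmetic as the idf(int, int) method quoted above.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 10; // assumed index size, for illustration only
        for (int df : new int[] {0, 1, 4, 9}) {
            System.out.println("docFreq=" + df + " idf=" + idf(df, numDocs));
        }
    }
}
```

The rarer the term, the larger its idf. With docFreq = 9 and numDocs = 10 the logarithm is ln(1) = 0, so idf is exactly 1.0, the smallest value in this run.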
ConstantScoreQuery.createWeight(Searcher), by contrast, only creates a ConstantScoreQuery.ConstantWeight(searcher) object and does not compute idf.
The resulting Weight object tree is as follows:
weight  BooleanQuery$BooleanWeight (id=169)
    // ConstantScore(contents:apple*)
    // contents:boy

    // ConstantScore(contents:cat*)
    // contents:dog

    // contents:eat
    // contents:cat^0.33333325
    // contents:foods
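The recursion that builds this tree can be sketched with a toy model. The types below (ToyQuery, ToyWeight, ToyTermQuery, ToyBooleanQuery) are hypothetical and mimic only the shape of createWeight(): a composite query asks each clause for its weight, so the Weight tree mirrors the Query tree. The sample query is made up and smaller than the one above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types that mimic only the recursive shape of createWeight().
class WeightTreeSketch {
    interface ToyQuery { ToyWeight createWeight(); }
    interface ToyWeight { String describe(); }

    // Leaf query: produces a leaf weight directly.
    static class ToyTermQuery implements ToyQuery {
        final String term;
        ToyTermQuery(String term) { this.term = term; }
        public ToyWeight createWeight() {
            return () -> "TermWeight(" + term + ")";
        }
    }

    // Composite query: its weight holds one child weight per clause.
    static class ToyBooleanQuery implements ToyQuery {
        final List<ToyQuery> clauses = new ArrayList<>();
        ToyBooleanQuery add(ToyQuery q) { clauses.add(q); return this; }
        public ToyWeight createWeight() {
            List<ToyWeight> weights = new ArrayList<>(clauses.size());
            for (ToyQuery clause : clauses) {
                weights.add(clause.createWeight()); // recurse down to the leaves
            }
            return () -> {
                StringBuilder sb = new StringBuilder("BooleanWeight[");
                for (int i = 0; i < weights.size(); i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(weights.get(i).describe());
                }
                return sb.append("]").toString();
            };
        }
    }

    // Made-up two-level query used only for this illustration.
    static ToyQuery sampleQuery() {
        return new ToyBooleanQuery()
            .add(new ToyBooleanQuery().add(new ToyTermQuery("boy")).add(new ToyTermQuery("dog")))
            .add(new ToyTermQuery("foods"));
    }

    public static void main(String[] args) {
        System.out.println(sampleQuery().createWeight().describe());
    }
}
```

Running main prints the nested structure, with one BooleanWeight per boolean query and one TermWeight per term.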
2.4.1.3. Computing the Term Weight Score
(1) First, compute sumOfSquaredWeights
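According to the formula:

$$\mathrm{sumOfSquaredWeights} = q.\mathrm{getBoost}()^2 \cdot \sum_{t \in q} \bigl(\mathrm{idf}(t) \cdot t.\mathrm{getBoost}()\bigr)^2$$

The code is as follows: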
float sum = weight.sumOfSquaredWeights();
// As can be seen, this is also a recursive process
public float sumOfSquaredWeights() throws IOException {
    float sum = 0.0f;
    for (int i = 0; i < weights.size(); i++) {
        float s = weights.get(i).sumOfSquaredWeights();
        if (!clauses.get(i).isProhibited())
            sum += s;
    }
    sum *= getBoost() * getBoost(); // multiply by the squared query boost
    return sum;
}
For the leaf node TermWeight, TermQuery$TermWeight.sumOfSquaredWeights() is implemented as follows:
public float sumOfSquaredWeights() {
    // Compute part of the score, idf * t.getBoost(); it is used again later.
    queryWeight = idf * getBoost();
    // Compute (idf * t.getBoost())^2
    return queryWeight * queryWeight;
}
For the leaf node ConstantWeight, ConstantScoreQuery$ConstantWeight.sumOfSquaredWeights() is as follows:
public float sumOfSquaredWeights() {
    // Only the user-specified boost counts toward the score; nothing else is included.
    queryWeight = getBoost();
    return queryWeight * queryWeight;
}
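The recursion above can be traced with a small toy model. The classes Term and Bool below are hypothetical stand-ins for TermWeight and BooleanWeight, and the idf and boost values are made up so that the arithmetic is easy to follow by hand. As in the code above, a prohibited clause is still visited but does not add to the sum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the recursion; idf and boost values are made up.
class SumOfSquaredWeightsSketch {
    interface Node { float sumOfSquaredWeights(); }

    // Leaf: (idf * boost)^2, as in TermWeight above.
    static class Term implements Node {
        final float idf, boost;
        Term(float idf, float boost) { this.idf = idf; this.boost = boost; }
        public float sumOfSquaredWeights() {
            float queryWeight = idf * boost;
            return queryWeight * queryWeight;
        }
    }

    // Inner node: sums non-prohibited children, then applies its own boost squared.
    static class Bool implements Node {
        final float boost;
        final List<Node> children = new ArrayList<>();
        final List<Boolean> prohibited = new ArrayList<>();
        Bool(float boost) { this.boost = boost; }
        Bool add(Node child, boolean isProhibited) {
            children.add(child);
            prohibited.add(isProhibited);
            return this;
        }
        public float sumOfSquaredWeights() {
            float sum = 0.0f;
            for (int i = 0; i < children.size(); i++) {
                float s = children.get(i).sumOfSquaredWeights(); // always recurse
                if (!prohibited.get(i)) sum += s;
            }
            return sum * boost * boost;
        }
    }

    static float example() {
        Node root = new Bool(2.0f)
            .add(new Term(2.0f, 1.0f), false)   // contributes (2*1)^2 = 4
            .add(new Term(1.0f, 3.0f), false)   // contributes (1*3)^2 = 9
            .add(new Term(5.0f, 1.0f), true);   // prohibited: not added
        return root.sumOfSquaredWeights();      // (4 + 9) * 2^2
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

For these values the result is (4 + 9) * 4 = 52.0; the prohibited term with idf 5 leaves the total unchanged.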
(2) Compute queryNorm
The formula is:

$$\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}$$

The code is:
public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
}
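As a quick numeric check, the sketch below copies the queryNorm arithmetic into a hypothetical class, QueryNormSketch, and applies it to a flat list of made-up query weights (no nested boolean boosts are modeled). It illustrates one consequence of the formula: once every weight is multiplied by queryNorm, the squared weights sum to approximately 1.

```java
// Hypothetical standalone class; the weights below are made-up illustration values.
class QueryNormSketch {
    // Same arithmetic as the queryNorm method quoted above.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Sum of squares of the weights after each one is multiplied by queryNorm.
    static float normalizedSumOfSquares(float[] weights) {
        float sum = 0.0f;
        for (float w : weights) sum += w * w;
        float norm = queryNorm(sum);
        float normalized = 0.0f;
        for (float w : weights) normalized += (w * norm) * (w * norm);
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(queryNorm(25.0f)); // 1 / sqrt(25)
        System.out.println(normalizedSumOfSquares(new float[] {3.0f, 4.0f}));
    }
}
```

With weights 3 and 4 the sum of squares is 25, so queryNorm is 1/5, and the normalized weights 0.6 and 0.8 have squares that add up to 1 within floating-point rounding.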
(3) Fold queryNorm into the score
The code is:
weight.normalize(norm);
// Again a recursive process
public void normalize(float norm) {
    norm *= getBoost();
    for (Weight w : weights) {
        w.normalize(norm);
    }
}
For the leaf node TermWeight, TermQuery$TermWeight.normalize(float) is as follows:
public void normalize(float queryNorm) {
    this.queryNorm = queryNorm;
    // queryWeight was idf * t.getBoost(); it is now queryNorm * idf * t.getBoost().
    queryWeight *= queryNorm;
    // The score so far covers queryNorm * idf * t.getBoost() * idf = queryNorm * idf^2 * t.getBoost().
    value = queryWeight * idf;
}
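The three steps can be followed end to end for the simplest case, a query that consists of a single term. The class TermWeightSketch below is a hypothetical toy that mirrors only the arithmetic of TermWeight shown above; the idf and boost values passed in are assumptions for illustration.

```java
// Hypothetical toy class mirroring the three steps for a single term; not Lucene code.
class TermWeightSketch {
    final float idf;
    final float boost;
    float queryWeight;
    float value;

    TermWeightSketch(float idf, float boost) { this.idf = idf; this.boost = boost; }

    float sumOfSquaredWeights() {
        queryWeight = idf * boost;           // idf * t.getBoost()
        return queryWeight * queryWeight;
    }

    void normalize(float queryNorm) {
        queryWeight *= queryNorm;            // queryNorm * idf * t.getBoost()
        value = queryWeight * idf;           // queryNorm * idf^2 * t.getBoost()
    }

    // Runs the three steps for a query consisting of this single term.
    static float valueForSingleTerm(float idf, float boost) {
        TermWeightSketch w = new TermWeightSketch(idf, boost);
        float sum = w.sumOfSquaredWeights();
        float norm = (float) (1.0 / Math.sqrt(sum));
        w.normalize(norm);
        return w.value;
    }

    public static void main(String[] args) {
        System.out.println(valueForSingleTerm(2.0f, 1.0f));
    }
}
```

In this single-term case queryNorm equals 1 / (idf * boost), so the normalized queryWeight becomes 1 and value reduces to idf, whatever positive boost is used. With several terms the boosts no longer cancel and they change the relative weights.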
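As we know, Lucene's overall scoring formula is shown below. What has been computed so far is the part marked in red in the original figure, namely queryNorm(q) · idf(t)^2 · t.getBoost():

$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \bigl(\mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d)\bigr)$$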