Analyzing Solr scores with debugQuery

Query: q=pro_name:Evod AND pro_brand:53

With debugQuery=true, the explain section of the response looks like this:


"7": "\n4.6345463 = sum of:\n  2.4654682 = weight(pro_name:evod in 6) [ClassicSimilarity], result of:\n    2.4654682 = score(doc=6,freq=1.0), product of:\n      0.80325437 = queryWeight, product of:\n        4.910959 = idf(docFreq=187, maxDocs=9390)\n        0.16356365 = queryNorm\n      3.0693493 = fieldWeight in 6, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        4.910959 = idf(docFreq=187, maxDocs=9390)\n        0.625 = fieldNorm(doc=6)\n  2.1690784 = weight(pro_brand:`\b\u0000\u0000\u00005 in 6) [ClassicSimilarity], result of:\n    2.1690784 = score(doc=6,freq=1.0), product of:\n      0.5956361 = queryWeight, product of:\n        3.6416166 = idf(docFreq=668, maxDocs=9390)\n        0.16356365 = queryNorm\n      3.6416166 = fieldWeight in 6, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        3.6416166 = idf(docFreq=668, maxDocs=9390)\n        1.0 = fieldNorm(doc=6)\n",
(The explanations for documents 69 and 873–880 are identical apart from the internal doc id, so they are omitted here.)

Let's take just one entry and analyze it:

4.6345463 = sum of:
  2.4654682 = weight(pro_name:evod in 6) [ClassicSimilarity], result of:
    2.4654682 = score(doc=6,freq=1.0), product of:
      0.80325437 = queryWeight, product of:
        4.910959 = idf(docFreq=187, maxDocs=9390)
        0.16356365 = queryNorm
      3.0693493 = fieldWeight in 6, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        4.910959 = idf(docFreq=187, maxDocs=9390)
        0.625 = fieldNorm(doc=6)
  2.1690784 = weight(pro_brand:`\b\u0000\u0000\u00005 in 6) [ClassicSimilarity], result of:
    2.1690784 = score(doc=6,freq=1.0), product of:
      0.5956361 = queryWeight, product of:
        3.6416166 = idf(docFreq=668, maxDocs=9390)
        0.16356365 = queryNorm
      3.6416166 = fieldWeight in 6, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        3.6416166 = idf(docFreq=668, maxDocs=9390)
        1.0 = fieldNorm(doc=6)

At a glance you can see that the default similarity class is ClassicSimilarity. If anything below is unclear, read the source of

  org.apache.lucene.search.similarities.ClassicSimilarity

Computing the total score

In the test above, each document has two fields, pro_name and pro_brand, and the final score is the sum of the query's score in each field: 4.6345463 = 2.4654682 + 2.1690784.

 

Computing each field's score

Start with the score for pro_name:Evod.

A field's score = queryWeight (query weight) * fieldWeight (field weight):

2.4654682 = 0.80325437 * 3.0693493
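As a quick sanity check, the per-field score can be reproduced with plain arithmetic, using the values from the explain output above (the class and helper names here are mine, not Lucene's):

```java
// Hypothetical sanity check: a field's score is queryWeight * fieldWeight.
public class FieldScoreCheck {
    static float fieldScore(float queryWeight, float fieldWeight) {
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // pro_name:evod values from the explain output
        System.out.println(fieldScore(0.80325437f, 3.0693493f)); // ~2.4654682
        // pro_brand values from the explain output
        System.out.println(fieldScore(0.5956361f, 3.6416166f));  // ~2.1690784
    }
}
```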

 

Computing queryWeight

This value is computed in TFIDFSimilarity:

public void normalize(float queryNorm, float boost) {
    this.boost = boost;
    this.queryNorm = queryNorm;
    this.queryWeight = queryNorm * boost * this.idf.getValue();
    this.value = this.queryWeight * this.idf.getValue();
}

From this it is clear that

queryWeight = queryNorm * boost * idf, and in Lucene boost defaults to 1:

0.80325437 = 4.910959 * 1 * 0.16356365
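The same product can be checked standalone (again, a hypothetical helper name, not Lucene API):

```java
// Hypothetical sanity check of queryWeight = queryNorm * boost * idf.
public class QueryWeightCheck {
    static float queryWeight(float queryNorm, float boost, float idf) {
        return queryNorm * boost * idf;
    }

    public static void main(String[] args) {
        // pro_name:evod: queryNorm=0.16356365, default boost=1, idf=4.910959
        System.out.println(queryWeight(0.16356365f, 1.0f, 4.910959f)); // ~0.80325437
    }
}
```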


Computing idf

idf (inverse document frequency) measures how rare the term is across the index; ClassicSimilarity computes it as:

public float idf(long docFreq, long numDocs) {
    return (float)(Math.log((double)numDocs / (double)(docFreq + 1L)) + 1.0D);
}

Math.log(number) is the natural logarithm, ln(number).

 

docFreq is the number of documents that contain the term (in our test, docFreq=187); numDocs is the total number of documents in the index (in our test, numDocs=9390). So idf = ln(9390 / (187 + 1)) + 1 ≈ 4.910959.
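Plugging the numbers from our explain output into the same formula reproduces both idf values (the class name is mine; the method body mirrors ClassicSimilarity#idf):

```java
// Standalone check of the ClassicSimilarity idf formula:
// idf = log(numDocs / (docFreq + 1)) + 1
public class IdfCheck {
    static float idf(long docFreq, long numDocs) {
        return (float)(Math.log((double)numDocs / (double)(docFreq + 1L)) + 1.0D);
    }

    public static void main(String[] args) {
        System.out.println(idf(187, 9390)); // pro_name:evod -> ~4.910959
        System.out.println(idf(668, 9390)); // pro_brand     -> ~3.6416166
    }
}
```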
 

Computing queryNorm

It is computed in ClassicSimilarity:

public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0D / Math.sqrt((double)sumOfSquaredWeights));
}

 

Math.sqrt(number) is the square root, √number.

 

Here, sumOfSquaredWeights is computed in the sumOfSquaredWeights method of org.apache.lucene.search.TermQuery.TermWeight:

public float sumOfSquaredWeights() {
    queryWeight = idf * getBoost();    // compute query weight
    return queryWeight * queryWeight;  // square it
}

By default each term therefore contributes sumOfSquaredWeights = idf * idf, because Lucene's default boost = 1.0.

For our two query terms: sumOfSquaredWeights = 4.910959 * 4.910959 + 3.6416166 * 3.6416166 ≈ 37.3789,
and then queryNorm = 1.0D / Math.sqrt(37.3789) ≈ 0.16356365.
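This arithmetic can be verified end to end with the two idf values from the explain output (the class name is hypothetical; the queryNorm body mirrors ClassicSimilarity#queryNorm):

```java
// Standalone check: queryNorm = 1 / sqrt(sum of squared term weights),
// where each term weight is idf * boost and boost defaults to 1.
public class QueryNormCheck {
    static float queryNorm(float sumOfSquaredWeights) {
        return (float)(1.0D / Math.sqrt((double)sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        float sum = 4.910959f * 4.910959f + 3.6416166f * 3.6416166f;
        System.out.println(sum);            // ~37.3789
        System.out.println(queryNorm(sum)); // ~0.16356365
    }
}
```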

 

Computing fieldWeight

The explainField method in org/apache/lucene/search/similarities/TFIDFSimilarity.java shows how it is assembled:

private Explanation explainField(int doc, Explanation freq, TFIDFSimilarity.IDFStats stats, NumericDocValues norms) {
    Explanation tfExplanation = Explanation.match(this.tf(freq.getValue()),
            "tf(freq=" + freq.getValue() + "), with freq of:", new Explanation[]{freq});
    Explanation fieldNormExpl = Explanation.match(norms != null ? this.decodeNormValue(norms.get(doc)) : 1.0F,
            "fieldNorm(doc=" + doc + ")", new Explanation[0]);
    return Explanation.match(tfExplanation.getValue() * stats.idf.getValue() * fieldNormExpl.getValue(),
            "fieldWeight in " + doc + ", product of:", new Explanation[]{tfExplanation, stats.idf, fieldNormExpl});
}

 

fieldWeight = tf * idf * fieldNorm

tf and idf are computed as described above. fieldNorm is fixed at index time and simply read back from the index here, which is why this method does not compute it. With ClassicSimilarity it is effectively lengthNorm: the longer the field, the smaller the norm. The computation lives in org/apache/lucene/search/similarities/ClassicSimilarity.java:

public float lengthNorm(FieldInvertState state) {
    int numTerms;
    if (this.discountOverlaps) {
        numTerms = state.getLength() - state.getNumOverlap();
    } else {
        numTerms = state.getLength();
    }
    return state.getBoost() * (float)(1.0D / Math.sqrt((double)numTerms));
}
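Putting the pieces together, the fieldWeight values from the explain output fall out of tf * idf * fieldNorm, with fieldNorm taken as the stored values (0.625 and 1.0) rather than recomputed (the class name is mine, for illustration only):

```java
// Standalone check of fieldWeight = tf * idf * fieldNorm, with
// fieldNorm taken from the index as shown in the explain output.
public class FieldWeightCheck {
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // pro_name: tf=1.0, idf=4.910959, fieldNorm=0.625
        System.out.println(fieldWeight(1.0f, 4.910959f, 0.625f)); // ~3.0693493
        // pro_brand: tf=1.0, idf=3.6416166, fieldNorm=1.0
        System.out.println(fieldWeight(1.0f, 3.6416166f, 1.0f));  // ~3.6416166
    }
}
```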

 
