Lucene: Boosting documents and fields

ylzhjlinux

Article link: https://blog.csdn.net/ylzhjlinux/article/details/84611725

Not all documents and fields are created equal, or at least you can make sure that's the case by using boosting. Boosting may be done during indexing, as we describe here, or during searching. Search-time boosting is more dynamic, because every search can independently choose whether to boost and with which factors, but it may also be somewhat more CPU-intensive. Because it's so dynamic, search-time boosting also lets you expose the choice to the user, for example as a checkbox that asks “Boost recently modified documents?”.
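The checkbox idea can be modeled with a toy scorer. Here the boost is a per-search argument rather than something stored in the index. This is a minimal, self-contained sketch. `SearchTimeBoost`, `Hit`, `score`, `rank` and the boost factor are hypothetical names, not Lucene API. In Lucene 3.x itself, a search-time boost is normally expressed by calling `Query.setBoost(float)` on a query clause.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of search-time boosting (hypothetical names, not Lucene API).
// Each hit carries a base relevance score; the caller decides per search
// whether recently modified documents get their score multiplied.
final class SearchTimeBoost {
    record Hit(String id, float baseScore, boolean recentlyModified) {}

    // The boost is applied while ranking, so the index itself never changes.
    static float score(Hit h, boolean boostRecent, float factor) {
        return (boostRecent && h.recentlyModified()) ? h.baseScore() * factor : h.baseScore();
    }

    // Return a copy of the hits sorted by descending (possibly boosted) score.
    static List<Hit> rank(List<Hit> hits, boolean boostRecent, float factor) {
        List<Hit> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparingDouble((Hit h) -> score(h, boostRecent, factor)).reversed());
        return ranked;
    }
}
```

Consider two hits: `old` (score 1.0, not recent) and `new` (score 0.8, recent). `rank(hits, false, 1.5f)` puts `old` first. `rank(hits, true, 1.5f)` puts `new` first, because 0.8 × 1.5 = 1.2. Flipping the flag changes the ranking without reindexing, and the cost is extra work on every search. The `record` syntax needs Java 16 or later.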

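```java
import org.apache.lucene.document.Field;

// Lucene 3.x: index-time boost on one field; "subject" is the String to index.
// A boost above 1.0 makes matches in this field count for more when scoring.
Field subjectField = new Field("subject", subject,
    Field.Store.YES, Field.Index.ANALYZED);
subjectField.setBoost(1.2F);
```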

Norms

During indexing, all sources of index-time boosts are combined into a single floating-point number for each indexed field in the document. The document may have its own boost, each field may have a boost, and Lucene computes an automatic boost based on the number of tokens in the field (shorter fields get a higher boost). These boosts are combined (the default Similarity multiplies them) and then compactly encoded (quantized) into a single byte, which is stored per field per document. During searching, the norms for every field being searched are loaded into memory, decoded back into floating-point numbers, and used when computing the relevance score.
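The following self-contained sketch shows the combination and the quantization step. It makes two assumptions. First, the norm is computed with the Lucene 2.x/3.x DefaultSimilarity length norm, 1/sqrt(numTokens), multiplied by the document and field boosts. Second, the one-byte encoding is a re-implementation of Lucene's `SmallFloat.floatToByte315` / `byte315ToFloat`. `NormSketch` and its methods are hypothetical names, so check the details against the Lucene version you use.

```java
// Sketch: index-time boosts collapsing into one norm byte per field per document.
// Assumes DefaultSimilarity-style combination and a re-implementation of
// Lucene's SmallFloat 3/15 byte encoding (hypothetical class, not Lucene code).
final class NormSketch {
    // Multiply the three index-time sources into one float.
    static float norm(float docBoost, float fieldBoost, int numTokens) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTokens)); // shorter field -> larger value
        return docBoost * fieldBoost * lengthNorm;
    }

    // Quantize to one byte, keeping the float's top 2 explicit mantissa bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, underflow -> smallest
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow -> largest value (0xFF)
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the stored byte back to a float at search time.
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Under this encoding, only 1.0, 1.25, 1.5 and 1.75 are representable between 1.0 and 2.0, and values are truncated toward zero. So the 1.2F field boost from the snippet above decodes back to exactly 1.0 when the other factors multiply to 1.0. Small index-time boosts can therefore vanish entirely in the stored norm.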

One problem often encountered with norms is their high memory usage at search time. This is because the full array of norms, which requires one byte per document for each separate field searched, is loaded into RAM. For a large index with many fields per document, this can quickly add up to a lot of RAM. Fortunately, you can easily turn norms off, either by using one of the NO_NORMS indexing options in Field.Index or by calling Field.setOmitNorms(true) before indexing the document containing that field. Doing so may affect scoring, because no index-time boost information will be used during searching. However, the effect may well be trivial, especially when the fields tend to be roughly the same length and you're not doing any boosting of your own.
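The memory cost is simple arithmetic: one byte per document for every searched field that still has norms. The sketch below assumes the norms array is sized by the document count. `NormsMemory.normsBytes` is a hypothetical helper, and the counts in the example are chosen only for illustration.

```java
// Estimate of norms RAM at search time: one byte per document for each
// searched field that keeps norms; fields with omitted norms cost nothing.
// Hypothetical helper, not Lucene API.
final class NormsMemory {
    static long normsBytes(long numDocs, boolean... omitNormsPerField) {
        long total = 0;
        for (boolean omitted : omitNormsPerField) {
            if (!omitted) {
                total += numDocs; // dense array: 1 byte per document
            }
        }
        return total;
    }
}
```

Take 10,000,000 documents with 20 searched fields that all keep norms: they need 200,000,000 bytes (about 190.7 MiB) of norms in RAM. Omitting norms on 15 of those fields brings it down to 50,000,000 bytes.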

Beware: if you decide partway through indexing to turn norms off, you must rebuild the entire index. If even a single document has that field indexed with norms enabled, segment merging will “spread” the norms so that every document consumes one byte for that field, even documents that disabled norms. This happens because Lucene doesn't use sparse storage for norms.
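The spreading can be modeled with a toy merge. A field keeps norms in the merged segment unless every input segment omitted them, and norms are stored densely, one byte per document. This is a hypothetical sketch of the behavior the paragraph describes. `NormsMerge`, `Segment` and `merge` are illustrative names, not Lucene classes.

```java
// Toy model (not Lucene code) of norms "spreading" during a segment merge.
final class NormsMerge {
    static final class Segment {
        final int numDocs;
        final boolean omitNorms; // for one particular field
        Segment(int numDocs, boolean omitNorms) {
            this.numDocs = numDocs;
            this.omitNorms = omitNorms;
        }
    }

    // Norms stay omitted only if every segment omitted them.
    static Segment merge(Segment... segments) {
        int docs = 0;
        boolean omit = true;
        for (Segment s : segments) {
            docs += s.numDocs;
            omit &= s.omitNorms; // one segment with norms turns them back on
        }
        return new Segment(docs, omit);
    }

    // No sparse storage: if norms exist, every document pays one byte.
    static long normsBytes(Segment s) {
        return s.omitNorms ? 0L : s.numDocs;
    }
}
```

Suppose a 999,999-document segment indexed without norms is merged with a single document that kept them. The result pays 1,000,000 bytes for that field. This is why the option cannot simply be changed for new documents partway through.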
