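[Note]

This page collects requirements I came across on mailing lists and while turning search into a general-purpose platform. To stay clear of non-compete and similar obligations, the representative requirements below were deliberately gathered from public sources on the web. Together they serve as a reference for Solr-based search "strategy" and for query designs in search applications. Performance is a separate matter: for the advanced usages in particular, load-test against large data volumes so that you know what to expect.

Most of the methods given here rely on Solr's interfaces and configuration. Deep customization is not covered in detail, so you will not find answers about it here; if you are interested, contact me privately.

The collection is meant as a starting point for discussion. The individual points have not been argued systematically or exhaustively, and the content comes mostly from the web. The overall direction and the main points are sound; if you find a detail that is wrong, please point it out. Thanks!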


Contents

1 Scoring issues in 3.4.0
2 Configuration
3 Problems and requirements
4 Payload issues
5 Custom sort (score + custom value)
6 BoostQParserPlugin
7 how can I limit by score before sorting in a solr query
8 Score filter
9 Boost score for early matches
10 Solr: How can I get all documents ordered by score with a list of keywords?
11 Solr changes document's score when its random field value altered
12 Relevance Customization
13 Modify SOLR scoring
14 Change order before returning data
15 limiting the total number of documents matched

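Scoring issues in 3.4.0

(7)
The scoring factors can be tuned, but new factors cannot be added and the scoring formula cannot be extended directly through Solr configuration. You can, however, extend Lucene's code or parameters, for example by rewriting a query such as SpanQuery and plugging it into Solr, although that takes somewhat more work. In addition, the community provides ranking patches such as BM25 and PageRank, which can be used directly once you have some understanding of Lucene.

(16)
For ranking, there is no ready-made support yet for de-duplication or for time-based dynamics. De-duplication here refers to the case where the top few results have exactly the same value in one field, or in several fields, so that the leading results appear clustered around some related field. For some applications that is not ideal.

Time-based dynamics are not supported directly either and can only be achieved indirectly, by sorting on time.

This is arguably not something Lucene/Solr itself should care about; it stems from the particular needs of the application.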


Configuration

Global configuration: schema.xml

Similarity

A (global) <similarity> declaration can be used to specify a custom Similarity implementation that you want Solr to use when dealing with your index. A Similarity can be specified either by referring directly to the name of a class with a no-arg constructor...

<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>

...or by referencing a SimilarityFactory implementation, which may take optional init params...

<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>

Beginning with Solr 4.0, Similarity factories such as SchemaSimilarityFactory can also support specifying specific Similarity implementations on individual field types...

<types>
  <fieldType name="text_dfr" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  <fieldType name="text_ib" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.IBSimilarityFactory">
      <str name="distribution">SPL</str>
      <str name="lambda">DF</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
</types>
<similarity class="solr.SchemaSimilarityFactory"/>

If no (global) <similarity> is configured in the schema.xml file, an implicit instance of DefaultSimilarityFactory is used.


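Problems and requirements

Three orderings are of interest:

By DefaultComputerValue
By CustomScore, then by DefaultComputerValue
By CustomScore*fa + DefaultComputerValue*fb

With fa = 0.8 and fb = 0.2:

Doc    CustomScore   DefaultComputerValue   CustomScore*fa + DefaultComputerValue*fb
Doc1   10            100                    10*0.8 + 100*0.2 = 28
Doc2   1             99                     1*0.8 + 99*0.2 = 20.6
Doc3   3             98                     3*0.8 + 98*0.2 = 22
Doc4   20            50                     20*0.8 + 50*0.2 = 26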
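The weighted combination can be sketched in a few lines of Java. This is only an illustration of the arithmetic: the class and method names are made up for this page, and in practice the blending would happen inside Solr (for example through a function query) rather than in client code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: blends a custom score with the default computed value. */
final class BlendedScore {
    /** Returns customScore * fa + defaultValue * fb. */
    static double blend(double customScore, double defaultValue, double fa, double fb) {
        return customScore * fa + defaultValue * fb;
    }

    /** Returns the document names ordered by blended score, highest first. */
    static List<String> rank(String[] names, double[] custom, double[] dflt,
                             double fa, double fb) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(
                (Integer i) -> blend(custom[i], dflt[i], fa, fb)).reversed());
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(names[i]);
        }
        return ranked;
    }
}
```

Because the two inputs live on different scales, the weights fa and fb only mean something once both values are brought to a comparable range; that normalization is left out here.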

Solr 3.4.0 scoring code analysis

abstract class SimilarityFactory

Member:
public abstract Similarity getSimilarity();

Payload issues

http://wiki.apache.org/lucene-java/Payloads

Scoring payloads involves overriding the Similarity.scorePayload() method. For example, if one has implemented storing a Float payload, it could be used for scoring in the following way:

  public float scorePayload(byte[] payload, int offset, int length) {
    assert length == 4;
    // Reassemble the four payload bytes, least significant byte first.
    int accum = ((payload[0 + offset] & 0xff)) |
                ((payload[1 + offset] & 0xff) << 8) |
                ((payload[2 + offset] & 0xff) << 16) |
                ((payload[3 + offset] & 0xff) << 24);
    return Float.intBitsToFloat(accum);
  }

Don't forget to activate your Similarity implementation using IndexSearcher.setSimilarity(). Also, note that even then not all queries will actually make use of your method. For example, you will need to use BoostingTermQuery instead of TermQuery. QueryParser currently (Lucene 2.3.2) always uses TermQuery and you will need to extend QueryParser and override getFieldQuery().

Note that this is just one possible way of scoring a payload. Payloads are application specific. For example payload TokenFilters, see the payload package in the contrib/Analysis module.
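The decoding above only works if the index-time side wrote the float in the same byte order. The sketch below is a hypothetical stand-alone helper (not part of Lucene) showing an encoder that matches that layout, together with the same decoding logic, so the round trip can be checked without an index.

```java
/** Hypothetical helper mirroring the byte layout read by scorePayload() above. */
final class FloatPayload {
    /** Encodes a float as 4 bytes, least significant byte first. */
    static byte[] encode(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) bits,
            (byte) (bits >>> 8),
            (byte) (bits >>> 16),
            (byte) (bits >>> 24)
        };
    }

    /** Decodes 4 bytes starting at offset, using the same layout. */
    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }
}
```

If the payloads were written by some other encoder, check its byte order first; a mismatch silently produces meaningless scores instead of an error.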

Custom sort (score + custom value)

http://grokbase.com/t/lucene/solr-user/08b25j6ked/custom-sort-score-custom-value

Hi,

I want to implement a custom sort in Solr based on a combination of relevance (Solr already gives me that => score) and a custom value I've calculated previously for each document. I see two options:

1. Use a function query (I'm using a DisMaxRequestHandler).
2. Create a component that sets a SortSpec with a sort that has a custom ComparatorSource (similar to QueryElevationComponent).

The first option has this problem: while the relevance value changes for every query, my custom value is constant for each doc. It implies queries with documents that have high relevance are less affected by my custom value. On the other hand, queries with low relevance are affected a lot by my custom value. Can it be proportional with a function query? (i.e. docs with low relevance are less affected by my custom value).

The second option has this problem: the Solr score isn't normalized. I need it normalized in order to apply my custom value in the sortValue function in ScoreDocComparator.

What do you think? What's the best option in that case? Another option?

Thank you in advance,

George

BoostQParserPlugin

http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/search/BoostQParserPlugin.html

org.apache.solr.search
Class BoostQParserPlugin

http://stackoverflow.com/questions/3035831/solr-lucene-scorer

Scorers are parts of Lucene Queries via the 'weight' query method.

In short, the framework calls Query.weight(..).scorer(..). Have a look at

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html

To use your own Query class in Solr, you'll need to implement your own Solr QueryParserPlugin that uses your own QParser that generates your previously implemented Lucene Query. You can then use it in Solr as specified here:

http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

This part of the implementation should stay simple as this is just some glue code.

Enjoy hacking Solr!

answered Jun 14 '10 at 10:33

You can override the logic the Solr scorer uses. Solr uses the DefaultSimilarity class for scoring.

1) Make a class extending DefaultSimilarity.
2) Override the functions tf(), idf() etc. according to your need.

  // Lucene 3.x package and signatures; Lucene 4.x moved the class to
  // org.apache.lucene.search.similarities.
  import org.apache.lucene.search.DefaultSimilarity;

  public class CustomSimilarity extends DefaultSimilarity {
    public CustomSimilarity() {
      super();
    }

    // Term-frequency factor; a constant ignores how often the term occurs.
    public float tf(int freq) {
      // your code
      return (float) 1.0;
    }

    // Inverse-document-frequency factor; a constant treats rare and common terms alike.
    public float idf(int docFreq, int numDocs) {
      // your code
      return (float) 1.0;
    }
  }

3) After creating the class, compile it and make a jar.
4) Put the jar in the lib folder of the corresponding index or core.
5) Change the schema.xml of the corresponding index: <similarity class="….CustomSimilarity"/>

You can check out the various factors affecting score here

For your requirement you can create buckets if your score is in a specific range. Also read about field boosting, document boosting etc. That might be helpful in your case.

http://stackoverflow.com/questions/11748487/how-can-i-filter-solr-results-by-custom-score

How can I filter SOLR results by custom score

I'm using Solr function queries to generate my own custom score. I achieve this using something along these lines:

q=_val_:"my_custom_function()"

This populates the score field as expected, but it also includes documents that score 0. I need a way to filter the results so that scores of zero and below are not included.

I realize that I'm using score in a non-standard way and that normally the score that Lucene/Solr produces is not absolute. However, producing my own score works really well for my needs.

I've tried using {!frange l=0} but this causes the score for all documents to be "1.0".

I suspect pseudo-fields could be used, but since Solr 4 is still alpha, I'm looking for a way to do it using Solr 3.1.


how can I limit by score before sorting in a solr query

I am searching "product documents". In other words, my Solr documents are product records. I want to get, say, the top 50 matching products for a query. Then I want to be able to sort the top 50 scoring documents by name or price. I'm not seeing much on how to do this, since sorting by score, then by name or price won't really help, since scores are floats.

I wouldn't mind if I could do something like map the scores to ranges (like a score of 8.0-8.99 would go in the 8 bucket score), then sort by range, then by names, but since there is basically no normalization to scoring, this would still make things a bit harder.

Tl;dr How do I exclude low scoring documents from the Solr result set before sorting?

asked Dec 7 '10 at 22:21

3 Answers

You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side). Your query would be something along the lines of:

q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc

Set the l argument in the q-frange-parameter to the lower bound you want to filter score on, and replace the qq parameter with your user query.

answered Dec 8 '10 at 10:23, Karl Johansson

thanks, since I can get a reasonable frange from the first time the results are displayed sorted by score alone, this works great! – Zak Dec 9 '10 at 18:40
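When the frange request is assembled in code, the local-params syntax has to be URL-encoded, which is easy to get wrong by hand. A minimal sketch follows, assuming the three parameters from the answer (q, qq, sort); the class name is hypothetical and only the JDK is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that assembles the request parameters of the frange approach. */
final class FrangeQuery {
    /** Builds "q=...&qq=...&sort=..." with each value URL-encoded. */
    static String build(double minScore, String userQuery, String sort) {
        // The lower bound goes into l=...; the user query is referenced via $qq.
        String q = "{!frange l=" + minScore + "}query($qq)";
        return "q=" + enc(q) + "&qq=" + enc(userQuery) + "&sort=" + enc(sort);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The lower bound is query-specific, as the comment above points out: it was taken from a first run sorted by score alone, not chosen as a global constant.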

I don't think you can simply exclude low scoring documents from the Solr result set before sorting, because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search and you cannot set some threshold for all searches.

If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language, but I don't think you can do it with just SOLR.

Anyway, I would recommend you don't go down this route of re-sorting the results from SOLR, as it will simply confuse the user. People expect search results to be like Google (and most other search engines), where results come back in some form of TF-IDF ranking.

Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.

I'd suggest you use SOLR to its strengths and use facets. Provide a price range facet on the left (like Ebay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.

[EDIT] this question might also be useful: Digg-like search result ranking with Lucene/Solr?

As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS for example.

The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc., and with only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.
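The client-side variant mentioned in these answers (fetch the best N by score, then re-sort only those) is small enough to sketch. The Product class and its fields are stand-ins invented for this example; a real client would map them from the Solr response.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopThenSort {
    /** Minimal stand-in for one returned product document (hypothetical fields). */
    static final class Product {
        final String name;
        final double price;
        final double score;

        Product(String name, double price, double score) {
            this.name = name;
            this.price = price;
            this.score = score;
        }
    }

    /** Keeps the n best-scoring hits, then orders only those by ascending price. */
    static List<Product> topByScoreThenByPrice(List<Product> hits, int n) {
        List<Product> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((Product p) -> p.score).reversed());
        List<Product> top = new ArrayList<>(byScore.subList(0, Math.min(n, byScore.size())));
        top.sort(Comparator.comparingDouble((Product p) -> p.price));
        return top;
    }
}
```

If Solr already returns the rows sorted by score with rows=N, the first sort is redundant and only the second one is needed.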

Score filter

http://lucene.472066.n3.nabble.com/score-filter-td493438.html

Hello,

Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did not work.

Many thanks,

What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on b/c there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *] because score is not an actual Field.

That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.

-Grant

Re: score filter

In my case, for example searching a book. Some of the returned documents have high relevance (score > 3), but some documents with a low score (< 0.01) are useless.

Without a "score filter", I have to go through each document to find out the number of documents I'm interested in (score > nnn). This causes some problems for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful documents which have score > nnn.

Thx,

Kevin

Re: score filter

At what point do you draw the line? 0.01 is too low, but what about 0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.

Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject to the vagaries of scoring.

What kind of relevance testing have you done so far to come up with those values? See also

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
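Grant's alternative, cutting at the first large gap between neighbouring scores instead of at a fixed threshold, could look roughly like the sketch below. It is a client-side illustration with a hypothetical class name; the delta still has to come from relevance testing, and the caveats raised in this thread apply to it just as much.

```java
/** Hypothetical client-side cutoff based on the gap between neighbouring scores. */
final class ScoreGapCutoff {
    /**
     * Given scores sorted in descending order, returns how many leading
     * results to keep: everything before the first drop larger than delta.
     */
    static int keepCount(double[] scoresDesc, double delta) {
        for (int i = 1; i < scoresDesc.length; i++) {
            if (scoresDesc[i - 1] - scoresDesc[i] > delta) {
                return i;
            }
        }
        return scoresDesc.length; // no large gap: keep everything
    }
}
```

Unlike a fixed score threshold, this keeps every result when a query produces uniformly low scores, which is the failure case described above for very common search terms.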

Re: score filter

Just did some research. It seems that it's doable with additional code added to Solr but not out of the box. Thank you, Grant.

Re: score filter

Don't bother doing this. It doesn't work.

This seems like a good idea, something that would be useful for almost every Lucene installation, but it isn't in Lucene because it does not work in the real world.

A few problems:

* Some users want every match and don't care how many pages of results they look at.

* Some users are very bad at creating queries that match their information needs. Others are merely bad, not very bad. The good matches for their query are on top, but the good matches for their information need are on the third page.

* Misspellings can put the right match (partial match) at the bottom. I did this yesterday at my library site, typing "Katherine Kerr" instead of the correct "Katharine Kerr". Their search engine showed no matches (grrr), so I had to search again with "Kerr".

* Most users do not know how to repair their queries, like I did with "Katherine Kerr", changing it to "Kerr". Even if they do, you shouldn't make them. Just show the weakly relevant results.

* Documents have errors, just like queries. I find bad data on our site about once a month, and we have professional editors. We still haven't fixed our entry for "Betty Page" to read "Bettie Page".

* People may use non-title words in the query, like searching for "batman" when they want "The Dark Knight".

So, don't do this. If you are forced to do it, make sure that you measure your search quality before and after it is implemented, because it will get worse. Then you can stop doing it.

wunder

Re: score filter

+1. Of course it is doable, but that doesn't mean you should, which is what I was trying to say before (but was typing on my iPod so it wasn't fast) and which Walter has now said. It is entirely conceivable to me that someone could search for a very common word such that the scores of all relevant (and thus, "good") documents are below your predefined threshold.

At any rate, proceed at your own peril. To implement it, look into the SearchComponent functionality.

Re: score filter

Hello Grant,

I need to frame a query that is a combination of two query parts and I use a 'function' query to prepare the same. Something like:

q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))

where $uq and $cq are two queries.

Now, I want a search result returned only if I get a hit on $uq. So, I specify the default value of the $uq query as 0.0 in order for the final score to be zero in cases where $uq doesn't record a hit. Even though the scoring works as expected (i.e., documents that don't match $uq have a score of zero), all the documents are returned as search results. Is there a way to filter search results that have a score of zero?

Thanks for your help,

Debdoot

Re: score filter

a) you could wrap your query in {!frange} .. but that will make everything that does have a value > 0.0 get the same final score

b) you could use an fq={!frange} that refers back to your original $q

c) you could just use an fq that refers directly to your $uq since that's what you say you actually want to filter on in the first place..

uq=…
cq=…
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
fq={!v=$uq}

Boost score for early matches

Solr – How to boost score for early matches?

How can I boost the score for documents in which my query matches a particular field earlier? For example, searching for "superman" should give "superman returns" a higher score than "there is my superman". Is this possible?

Uh, store the first few words explicitly in another field, and boost matches on this field. – aitchnyu Aug 22 at 9:45

The problem there is that the size of the query can vary from say 3 characters to say 100 characters, and so determining how many words/chars to index separately can be difficult. – techfoobar Aug 22 at 9:49

Secondly, suppose I index the first 25 characters, and one record has "my superman blah.." and another record has "superman returns blah.." – both will match the query "superman" and both will be boosted when I boost this secondary field. – techfoobar Aug 22 at 9:50

2 Answers

Thank you for the answer. But I solved it today by using the approach I've outlined in my answer. – techfoobar Aug 22 at 18:33

But this is not going to work if the words do not occur at the very start. May want to check out payloads as well, where you can add index-time suggestions as laid down in the second option. – Jayendra Aug 22 at 18:35

Will check that out as well. However, the current solution can be made to work to a large extent by fine-tuning the ps parameter to make it more lenient. I currently use 2 (dist between 2 terms in the pf) and it seems to be working quite well for my medium-sized data set (1000s of records, greatly varying in content). Will check out your point and let you know if it helped. – techfoobar Aug 22 at 18:38

Solved it myself after reading a LOT about this online. What specifically helped me was a reply on nabble which goes like (I used dismax, so explaining that here):

1. Create a separate field named say 'nameString' which stores the value as "_START_ <actual data>"
2. Change the search query to "_START_ <actual query>"
3. Add the new field nameString as one of the fields to look in in the query fields param (qf)
4. While searching use the parameter pf (phrase field) as the new field nameString with a phrase slop of 1 or 2 (lower values would mean stricter searching)

Your final query params will be something like:

q=_START_ <actual query>
defType=dismax
qf=name nameString
pf=nameString
ps=2
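To make the marker approach concrete, here is a minimal sketch of both sides: the value written to the extra field at index time and the dismax parameters sent at query time. It assumes the marker is simply prepended to the field value and to the user's query; the class name is hypothetical, and the field names name and nameString are the ones used in the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the start-marker approach described above. */
final class EarlyMatchParams {
    static final String MARKER = "_START_";

    /** Value to store in the extra field (nameString) at index time. */
    static String indexedValue(String name) {
        return MARKER + " " + name;
    }

    /** dismax parameters for a user query; phraseSlop becomes ps. */
    static Map<String, String> params(String userQuery, int phraseSlop) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", MARKER + " " + userQuery);
        p.put("defType", "dismax");
        p.put("qf", "name nameString");
        p.put("pf", "nameString");
        p.put("ps", Integer.toString(phraseSlop));
        return p;
    }
}
```

The phrase boost on nameString rewards documents where the query terms sit within ps positions of the marker, which is why "superman returns" benefits and "there is my superman" does not. The analyzer of nameString must keep the marker as a token for this to work.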


Solr: How can I get all documents ordered by score with a list of keywords?

I have a Solr 3.1 database containing Emails with two fields:

datetime
text

For the query I have two parameters:

date of today
keyword array ("important thing", "important too", "not so important, but more than average")

Is it possible to create a query to

1. get ALL documents of this day AND
2. sort them by relevancy by ordering them so that the email which contains most of my keywords (important things) scores best?

The part with the date is not very complicated:

fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]

I know that you can boost the keywords this way:

q=text:"first keyword"^5 OR text:"second one"^2 OR text:"minus scoring"^0.5 OR text:"*"

But how do I only use the keywords to sort this list and get ALL entries instead of doing a real query and getting only a few entries back?

Thanks for help!

2 Answers

You need to specify your terms in the main query and then change your date query to be a filter query on these results by adding the following.

fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]

So you should have something like this:

q=<your terms>&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]

Edit: A little more about filter queries (as suggested by rfeak).

From Solr Wiki – Filter Query Guidance – "Now, what is a filter query? It is simply a part of a query that is factored out for special treatment. This is achieved in Solr by specifying it using the fq (filter query) parameter instead of the q (main query) parameter. The same result could be achieved leaving that query part in the main query. The difference will be in query efficiency. That's because the result of a filter query is cached and then used to filter a primary query result using set intersection."

These should be sorted by relevancy score already, that is just the default behavior of Solr. You can see the score by adding that field.

fl=*,score

If you use the Full Interface for Make A Query on the Admin Interface on your Solr installation at http:…admin/form.jsp you will see where you can specify the filter query, fields, and other options. You can check out the Solr Wiki for more details on the options and how they are used.

I hope that this helps you.

+1 The filter query is an excellent suggestion. You may consider adding a bit about the advantage of using the filter query there. – rfeak May 27 '11 at 14:55

Thank you! The filter query is working as expected. But unfortunately I still don't know how to handle the keywords because they filter the emails instead of only sorting them. – Daniel May 27 '11 at 16:06

Sorting by relevance is default behavior on Solr/Lucene. If your results are unsatisfactory, try to put the keywords in quotes.

// Edit: Following the answer from Paige Cook, use something like this

q="important thing"&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]

// 2nd update. Thinking about this answer: quotes are not a good idea, because in this case you will only receive "important thing" mails, but no "important too".

The point is which keywords you are using. Searching for the unquoted words important thing results in the highest scores for "important thing" mails. But Lucene does not know how to score "important too" or "not so important, but more than average" in relation to your keywords. Another idea would be searching only for "important". But the field values "important thing" and "important too" give nearly the same score values, because 50% of the searched keywords (in this case: "important") are part of the field value. So probably you have to change your keywords. It could work after changing "important too" into "also an important mail", to get the best ratio of search word "important" and field value in order to score the shortest mail description with the highest value.

Thanks for your answer! You point exactly to my problem because the keywords filter the documents instead of only sorting them all and influencing the relevancy score. I do not know how to handle this. – Daniel May 27 '11 at 16:13


Solr changes document's score when its random field value altered

http://stackoverflow.com/questions/6254587/solr-changes-documents-score-when-its-random-field-value-altered

I need to navigate forth and back in a Solr results set ordered by score, viewing documents one by one. To visualise that, first a list of document titles is presented to the user, then he or she can click one of the titles to see more details and then needs to have an opportunity to move to the next document in the original list without going back and clicking another title.

During viewing, documents get changed: their dynamic field is modified (or created if it does not exist yet) to mark that the document has already been viewed (used in other searches).

The problem I face is that when the document is altered and re-indexed to keep those changes, sometimes (and not always, which is very disturbing) its place in the results set for the same query changes (in other words, its score changes, as that doesn't happen when browsing results sorted by one of the documents' fields). So, "Previous" / "Next" navigation doesn't work properly.

I'm not using any custom weighting or boosters on fields for score calculation. Also, the dynamic field changed during browsing doesn't participate in the query used to get the record set browsed.

So, the questions are: can the modification of a document's field not included in the query change its relevance score? And if it can, then how can I control that?

UPDATE

I did some tests and can add the following:

1. The document changes its place in the result set even if no field is amended – just requesting the document and re-indexing it without any changes to its fields makes it take another place next time the same query over the same index is executed.

2. That happens even if the result set is sorted explicitly ("first_name DESC"), so score (which depends on the update date) is not involved. The document stays the same, the field the result set is sorted by is the same, yet its position changes.

Still have no idea how to avoid that.

2 Answers

In Solr, if your field is "indexed", it will have an effect on the relevancy ranking ("stored" fields show up in search results but are not necessarily searchable). If the fields in question aren't marked as indexed then you are good to go. Note that "indexed" and "stored" are not necessarily the same, hence your confusion about results lists changing even though not all fields are shown (a field can be "indexed" and not "stored" as well).

In this case I think you want your "viewed" field to be "stored" but not "indexed". If you really want to control the query, you can use copyField to copy the relevant results into a single searchable field. You can also boost terms or documents so that certain fields are "less important" to the search query.

If you want to see how the relevancy rankings are calculated, you can add "debugQuery=on" to the end of your Solr query (see the Relevancy FAQ for more info).

However, all that being said, I would recommend you cache your search result query (at least for the first page of your results), since you will always have results changing (documents added, removed by other users, etc). Your best bet is to design a UI that anticipates this, or at least batches a user's query.

Thanks, for some reason I was sure changes to fields not participating in the query don't affect the calculated score. In my case it is necessary to have this field indexed as there is another query where I need to filter documents, searching only viewed or only not viewed before. Caching is also not suitable as users are supposed to navigate through the whole result set, not only through the page (well, caching is still possible and to be honest bearable in terms of resources but just not elegant). I'll try to boost the field being searched and tell if that works. – Yuriy Jun 7 '11 at 7:45

Just noticed that it also happens when the results are sorted by a field other than score. How is that possible? I thought if ordering is specified and score is not in the clause explicitly (say, ordering is like "first_name DESC"), it doesn't influence the ordering. However, it seems it does. How can I get rid of that? – Yuriy Jun 8 '11 at 14:11

Okay, looks like boosting works, but has no effect. If I boost the field I am searching in, all the matches are boosted equally and still the recently re-indexed documents get some delta in their relevance which makes a difference. There should be a way to exclude the date of last update from the ordering completely but I can't find it yet… – Yuriy Jun 8 '11 at 14:50

I've found the solution which doesn't eliminate the problem completely but makes it much less likely to happen.

So the problem happens when the documents are sorted by some field and there is a number of them with the same value in this field (e.g. result set is sorted by first name, and there are 100 entries for "John").

This is when the indexed time gets involved – apparently Solr uses it to sort the documents when their main sorting fields are identical. To make this case much less probable, you need to add more sorting fields, e.g. "first_name desc" should become "first_name desc, last_name desc, register_date asc".

Also, adding the document's unique id as the last sorting field should remove the problem completely (the set of sorting fields will never be identical for any two documents in the index).
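The effect of a unique final sort key is easy to reproduce outside Solr. The sketch below uses a hypothetical Person class and a plain Java comparator equivalent to "first_name desc, last_name desc, id asc": because no two entries compare as equal, the output order no longer depends on the order in which they arrive (in Solr, on where a re-indexed document ends up in the index).

```java
import java.util.Comparator;

final class StableSortOrder {
    /** Minimal stand-in for an indexed document (hypothetical fields). */
    static final class Person {
        final String id;
        final String firstName;
        final String lastName;

        Person(String id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** first_name desc, last_name desc, then the unique id asc as final tie-breaker. */
    static Comparator<Person> comparator() {
        Comparator<Person> byFirstDesc =
                Comparator.comparing((Person p) -> p.firstName).reversed();
        Comparator<Person> byLastDesc =
                Comparator.comparing((Person p) -> p.lastName).reversed();
        Comparator<Person> byIdAsc = Comparator.comparing((Person p) -> p.id);
        return byFirstDesc.thenComparing(byLastDesc).thenComparing(byIdAsc);
    }
}
```

In Solr terms the same idea is a sort parameter that ends with the uniqueKey field, assuming that field is indexed and single-valued so it can be sorted on.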


Relevance Customization

http://lucene.472066.n3.nabble.com/Relevance-Customization-td501310.html

Hi all.

I want to know if it's possible to customize the Solr relevance, something like this:

1 – I create a static score for each document and index it.

2 – I change the relevance to Score(Solr) + Score(Static), where the Solr score is equal to 30% of the total score, mixing the two scores into only one.

This is different from sorting by my static score and then by Solr score, because I don't want to kill the Solr score, just give it a little less importance.

Is there a way to do this?

Thanks

Re: Relevance Customization

It can be done with something like q=yourQuery _val_:yourStaticScoreField

http://wiki.apache.org/solr/FunctionQuery#fieldvalue

But this adds the Solr score to the static score. I am not sure how to get 30% of the Solr score. Maybe something like?

q=yourQuery^0.3 _val_:yourStaticScoreField^0.7

Modify SOLR scoring

Hi everybody,

I'm using SOLR with a schema (for example) like this:

parution date, date, indexed, not stored
fulltext, stemmed, indexed, not stored

I know it's possible to order by a field or more, but I want to order by score and modify the "score" formula. I want to keep the SOLR score but add a new parameter in the formula to boost the score of the most recent documents.

What is the best way to do this?

Thanks.

Excuse my English.

RE: modify SOLR scoring

I believe you can use a function query to do this:

http://wiki.apache.org/solr/FunctionQuery

If you embed the following in your query, you should get a boost for more recent date values:

_val_:"ord(dateField)"

Where "dateField" is the field name of the date you want to use.
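As I understand ord(), it yields the 1-based position of a document's field value among the distinct indexed values in index order, so a later date gets a larger number and hence a larger contribution. The sketch below imitates that outside Solr, under the assumption that the values are ISO-style date strings whose sort order is chronological; the class name is hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-alone imitation of what ord(dateField) is described as returning. */
final class OrdSketch {
    /** 1-based position of value among the distinct values in sorted order. */
    static int ord(List<String> indexedValues, String value) {
        // headSet(value, true) holds every distinct value up to and including this one.
        return new TreeSet<>(indexedValues).headSet(value, true).size();
    }
}
```

One consequence worth keeping in mind: the ordinal grows with the number of distinct dates in the index, so the size of the boost relative to the relevance score changes as the index grows and usually needs a weight (for example a ^boost on the _val_ clause) to keep it in proportion.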

Re: modify SOLR scoring

http://lucene.472066.n3.nabble.com/modify-SOLR-scoring-td497348.html

I am interested in a very similar topic to yours. I want to modify the field named "score" and the document boost but not reindex all the fields, since it would take too much power.

Please let me know if you find a solution to this.

Kindly

Change order before returning data

http://stackoverflow.com/questions/4965172/change-order-before-returning-data

Is there any way to change the order of results in SOLR? E.g. when I query SOLR I will get the 1000 records with the highest score, then on those 1000 records I will use my own function to change the order again and just get 10 of those records. I can get 1000 records and process them in PHP or Java, but then I have to transfer 1000 records from the SOLR server to the web server and I don't want that; I just want to get 10 records after changing the order and use paging. Does SOLR support this kind of custom function?

Answers

If your function can be applied when the records are initially indexed, you can do it there and add the result as a value on the record. Then sort the result set by the precalculated value. If not, I haven't worked with it directly, but this thread seems to have the answer you're looking for

Hi. My case is very special, I already have a pre-indexed score in the database. Let me give one example: I have a shopping site; when I search for TV LCD 32 inch, I get many results from different brands like LG, Toshiba... and many results for LG may appear consecutively. I want to separate them, e.g. I don't want 3 results for LG to sit next to each other. Currently I get the 1000 best records (based on score) and change the order again using PHP; now I want to move this job to SOLR (I don't want to transfer too much data between SOLR and the web server, I just need 10 records to display) – user612433 Feb 11 '11 at 3:45
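The re-ordering described in this comment (no more than two results of the same brand in a row) can be expressed as a small greedy pass over the score-ordered list. The sketch below is one possible reading of the requirement, not the poster's PHP code; Item, maxRun and limit are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class BrandSpread {
    /** Minimal stand-in for a result row (hypothetical fields). */
    static final class Item {
        final String id;
        final String brand;

        Item(String id, String brand) {
            this.id = id;
            this.brand = brand;
        }
    }

    /**
     * Walks the score-ordered list and repeatedly takes the best remaining item
     * that does not create a run longer than maxRun of one brand. If every
     * remaining item would, the best remaining item is taken anyway.
     */
    static List<Item> spread(List<Item> byScore, int maxRun, int limit) {
        List<Item> remaining = new ArrayList<>(byScore);
        List<Item> out = new ArrayList<>();
        while (!remaining.isEmpty() && out.size() < limit) {
            int pick = 0; // fall back to the best remaining item
            for (int i = 0; i < remaining.size(); i++) {
                if (!wouldExceedRun(out, remaining.get(i).brand, maxRun)) {
                    pick = i;
                    break;
                }
            }
            out.add(remaining.remove(pick));
        }
        return out;
    }

    /** True if the last maxRun items of out all carry the given brand. */
    private static boolean wouldExceedRun(List<Item> out, String brand, int maxRun) {
        if (out.size() < maxRun) {
            return false;
        }
        for (int i = out.size() - maxRun; i < out.size(); i++) {
            if (!out.get(i).brand.equals(brand)) {
                return false;
            }
        }
        return true;
    }
}
```

Running this inside Solr instead of the web tier would mean wrapping the same logic in a custom component, which is the kind of deep customization this page does not cover.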

Yes, you can create a column with the info you want to be taken into account in the score.

For example, for a "popularity" column, your query would be:

your query && _val_:"popularity"^0.7

0.7 being the boost factor into the final score. You can also filter the result set to get fewer results:

your query&fq=popularity:[10 TO *]

limiting the total number of documents matched

http://search-lucene.com/m/4AHNF17wIJW1/

Re: limiting the total number of documents matched

Yonik Seeley 2010-07-17, 00:55

On Wed, Jul 14, 2010 at 5:46 PM, Paul <[EMAIL PROTECTED]> wrote:

I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do a search with "score:[5 to *]"? It didn't work in my first attempt.

frange could possibly help (range query on an arbitrary function).

http://www.lucidimagination.com/blog/tag/frange/

So perhaps something like

q={!frange l=0.85}query($qq)
qq=<normal relevancy query>

where 0.85 is the lower bound you want for scores and qq is the normal relevancy query

-Yonik

http://www.lucidimagination.com

On Wed, Jul 14, 2010 at 5:34 PM, Paul <[EMAIL PROTECTED]> wrote:

I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom RequestHandler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin

So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.

-Kallin Nagelberg

Subject: limiting the total number of documents matched

I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this?

I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though.
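For Kallin's two-query suggestion above, the second request needs a filter that restricts the search to the IDs returned by the first one. A minimal sketch of building that filter follows; the class name is hypothetical, and it assumes the IDs are plain tokens that need no query-syntax escaping.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper for the second pass of the two-query approach. */
final class TwoPassQuery {
    /** Builds a clause of the form idField:(a OR b OR c) from the first-pass IDs. */
    static String idFilter(String idField, List<String> ids) {
        StringJoiner joiner = new StringJoiner(" OR ", idField + ":(", ")");
        for (String id : ids) {
            joiner.add(id);
        }
        return joiner.toString();
    }
}
```

The first request would ask for fl=id with rows=1000 sorted by score; the second would use the clause above together with sort=title asc. With 1000 IDs the clause is close to Solr's default maxBooleanClauses limit of 1024, so this is worth load-testing before relying on it.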