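[Note]
This page is my personal summary of requirements I ran into on the mailing list and while turning a search system into a general-purpose platform. To stay clear of obligations such as non-compete restrictions, the representative requirements here were deliberately collected from the public internet. Together they serve as a reference for Solr-based search "strategies" and for query designs in search applications. For performance, and especially for advanced usage on large data volumes, be sure to load-test so that you know what to expect.
Most of the methods given here rely on Solr's interfaces and configuration. Deep customization is not described in detail, so you will not find answers about it here; if you are interested, feel free to contact me privately.
This summary is meant only as a starting point. The individual points have not been argued systematically or comprehensively, and the content comes mostly from the web. The overall direction and the main points should be sound. If you find a mistake in the details, please point it out. Thanks!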
Contents
Custom sort (score + custom value)
How can I limit by score before sorting in a Solr query
Solr: How can I get all documents ordered by score with a list of keywords?
Solr changes document's score when its random field value altered
Change order before returning data
Limiting the total number of documents matched
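Scoring issues (3.4.0)
(7)
Score factors can be tuned, but adding new score factors or extending the scoring formula cannot be done directly through Solr configuration. You can, however, extend Lucene code or parameters (for example SpanQuery), write a new Query, and plug it into Solr, although that takes somewhat more work. In addition, the community provides ranking patches such as BM25 and PageRank; once you have some understanding of Lucene, you can use them directly.
(16)
For ranking, there is no ready-made support yet for de-duplication or for time-based dynamics. De-duplication here means that the top few results may share exactly the same value in one field, or in several fields, so that the top results appear "clustered" around some related field. For some applications that is not the best outcome.
Time-based dynamics are not directly supported either and can only be achieved indirectly, by sorting on time.
This is probably not something Lucene or Solr should be concerned with; it stems from the particular needs of the application.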
Configuration
Global configuration in schema.xml
Similarity
A (global) <similarity> declaration can be used to specify a custom Similarity implementation that you want Solr to use when dealing with your index. A Similarity can be specified either by referring directly to the name of a class with a no-arg constructor…
<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>
…or by referencing a SimilarityFactory implementation, which may take optional init params…
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
Beginning with Solr 4.0, Similarity factories such as SchemaSimilarityFactory can also support specifying specific Similarity implementations on individual field types…
<types>
  <fieldType name="text_dfr" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  <fieldType name="text_ib" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.IBSimilarityFactory">
      <str name="distribution">SPL</str>
      <str name="lambda">DF</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  …
</types>
<similarity class="solr.SchemaSimilarityFactory"/>
If no (global) <similarity/> is configured in the schema.xml file, an implicit instance of DefaultSimilarityFactory is used.
Problems and requirements
Doc    CustomScore  DefaultComputerValue  CustomScore*fa + DefaultComputerValue*fb
Doc1   10           100                   10*0.8 + 100*0.2 = 28
Doc2   1            99                    1*0.8 + 99*0.2 = 20.6
Doc3   3            98                    3*0.8 + 98*0.2 = 22
Doc4   20           50                    20*0.8 + 50*0.2 = 26
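The weighted combination in the table can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not Solr code: the class name BlendedScore, the Doc holder, and the weights fa = 0.8 and fb = 0.2 are taken from the table or made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BlendedScore {
    /** One table row: a document id plus its two input values (hypothetical holder). */
    static final class Doc {
        final String id;
        final double customScore;
        final double defaultValue;

        Doc(String id, double customScore, double defaultValue) {
            this.id = id;
            this.customScore = customScore;
            this.defaultValue = defaultValue;
        }
    }

    /** CustomScore * fa + DefaultComputerValue * fb. */
    static double blend(double customScore, double defaultValue, double fa, double fb) {
        return customScore * fa + defaultValue * fb;
    }

    /** Returns the document ids ordered by blended score, highest first. */
    static List<String> rank(List<Doc> docs, double fa, double fb) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> blend(d.customScore, d.defaultValue, fa, fb)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("Doc1", 10, 100));
        docs.add(new Doc("Doc2", 1, 99));
        docs.add(new Doc("Doc3", 3, 98));
        docs.add(new Doc("Doc4", 20, 50));
        for (Doc d : docs) {
            System.out.println(d.id + " -> " + blend(d.customScore, d.defaultValue, 0.8, 0.2));
        }
        System.out.println(rank(docs, 0.8, 0.2));
    }
}
```

Because the two inputs live on different scales, the weights decide which one dominates; changing fa and fb in the sketch shows how quickly the ordering flips.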
Solr 3.4.0
Scoring code analysis
abstract class SimilarityFactory
Abstract method: public abstract Similarity getSimilarity();
Payloads
http://wiki.apache.org/lucene-java/Payloads
Scoring payloads involves overriding the Similarity.scorePayload() method. For example, if one has implemented storing a Float payload, it could be used for scoring in the following way:
  public float scorePayload(byte[] payload, int offset, int length) {
    assert length == 4;
    int accum = ((payload[0 + offset] & 0xff)) |
                ((payload[1 + offset] & 0xff) << 8) |
                ((payload[2 + offset] & 0xff) << 16) |
                ((payload[3 + offset] & 0xff) << 24);
    return Float.intBitsToFloat(accum);
  }
Don't forget to activate your Similarity implementation using IndexSearcher.setSimilarity(). Also, note that even then not all queries will actually make use of your method. For example, you will need to use BoostingTermQuery instead of TermQuery. QueryParser currently (Lucene 2.3.2) always uses TermQuery and you will need to extend QueryParser and override getFieldQuery().
Note that this is just one possible way of scoring a payload. Payloads are application specific. For example payload TokenFilters, see the payload package in the contrib/Analysis module.
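The decoding in scorePayload() only works if the indexing side wrote the float with the same byte layout. A minimal round-trip sketch in plain Java follows; the FloatPayload class and its encode() helper are hypothetical names and not part of Lucene, and the least-significant-byte-first layout is simply what the snippet above reads.

```java
public class FloatPayload {
    /** Encodes a float as 4 bytes, least significant byte first. */
    static byte[] encode(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) bits,
            (byte) (bits >>> 8),
            (byte) (bits >>> 16),
            (byte) (bits >>> 24)
        };
    }

    /** Same bit arithmetic as the scorePayload() example: rebuilds the float from 4 bytes. */
    static float decode(byte[] payload, int offset, int length) {
        if (length != 4) {
            throw new IllegalArgumentException("expected a 4-byte payload");
        }
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        byte[] payload = encode(2.5f);
        System.out.println(decode(payload, 0, payload.length));
    }
}
```

If the analysis chain that writes the payload uses a different byte order, the decoder has to be changed to match it; otherwise the scores will be garbage rather than an error.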
Custom sort (score + custom value)
http://grokbase.com/t/lucene/solr-user/08b25j6ked/custom-sort-score-custom-value
Hi,
I want to implement a custom sort in Solr based on a combination of relevance (which Solr already gives me as the score) and a custom value I've calculated previously for each document. I see two options:
1. Use a function query (I'm using a DisMaxRequestHandler).
2. Create a component that sets a SortSpec with a sort that has a custom ComparatorSource (similar to QueryElevationComponent).
The first option has the problem: while the relevance value changes for every query, my custom value is constant for each doc. It implies that queries with documents that have high relevance are less affected by my custom value. On the other hand, queries with low relevance are affected a lot by my custom value. Can it be made proportional with a function query? (i.e. docs with low relevance are less affected by my custom value).
The second option has the problem: the Solr score isn't normalized. I need it normalized in order to apply my custom value in the sortValue function in ScoreDocComparator.
What do you think? What's the best option in that case? Another option?
Thank you in advance,
George
BoostQParserPlugin
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/search/BoostQParserPlugin.html
Class org.apache.solr.search.BoostQParserPlugin
http://stackoverflow.com/questions/3035831/solr-lucene-scorer
Scorers are parts of Lucene Queries via the 'weight' query method.
In short, the framework calls Query.weight(..).scorer(..). Have a look at
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html
To use your own Query class in Solr, you'll need to implement your own Solr QueryParserPlugin that uses your own QParser that generates your previously implemented Lucene Query. You can then use it in Solr as specified here:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
This part of the implementation should stay simple, as it is just some glue code.
Enjoy hacking Solr!
answered Jun 14 '10 at 10:33
You can override the logic the Solr scorer uses. Solr uses the DefaultSimilarity class for scoring.
1) Make a class extending DefaultSimilarity.
2) Override the functions tf(), idf() etc. according to your need.
public class CustomSimilarity extends DefaultSimilarity {
  public CustomSimilarity() {
    super();
  }
  // Term-frequency factor: replace with your own formula.
  public float tf(int freq) {
    // your code
    return (float) 1.0;
  }
  // Inverse-document-frequency factor: replace with your own formula.
  public float idf(int docFreq, int numDocs) {
    // your code
    return (float) 1.0;
  }
}
3) After creating the class, compile it and make a jar.
4) Put the jar in the lib folder of the corresponding index or core.
5) Change the schema.xml of the corresponding index: <similarity class="….CustomSimilarity"/>
You can check out various factors affecting score here.
For your requirement you can create buckets if your score is in a specific range. Also read about field boosting, document boosting etc. That might be helpful in your case.
http://stackoverflow.com/questions/11748487/how-can-i-filter-solr-results-by-custom-score
How can I filter SOLR results by custom score
I'm using Solr function queries to generate my own custom score. I achieve this using something along these lines:
q=_val_:"my_custom_function()"
This populates the score field as expected, but it also includes documents that score 0. I need a way to filter the results so that scores below zero are not included.
I realize that I'm using score in a non-standard way and that normally the score that Lucene/Solr produce is not absolute. However, producing my own score works really well for my needs.
I've tried using {!frange l=0} but this causes the score for all documents to be "1.0".
I suspect pseudo-fields could be used, but since Solr 4 is still alpha, I'm looking for a way to do it using Solr 3.1.
How can I limit by score before sorting in a Solr query
I am searching "product documents". In other words, my Solr documents are product records. I want to get, say, the top 50 matching products for a query. Then I want to be able to sort the top 50 scoring documents by name or price. I'm not seeing much on how to do this, since sorting by score, then by name or price won't really help, since scores are floats.
I wouldn't mind if I could do something like map the scores to ranges (like a score of 8.0-8.99 would go in the 8 bucket score), then sort by range, then by names, but since there is basically no normalization to scoring, this would still make things a bit harder.
Tl;dr How do I exclude low scoring documents from the Solr result set before sorting?
asked Dec 7 '10 at 22:21
3 Answers
You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side). Your query would be something along the lines of:
q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc
Set the l argument in the q-frange-parameter to the lower bound you want to filter score on, and replace the qq parameter with your user query.
answered Dec 8 '10 at 10:23, Karl Johansson
thanks, since I can get a reasonable frange from the first time the results are displayed sorted by score alone, this works great! - Zak, Dec 9 '10 at 18:40
I don't think you can simply exclude low scoring documents from the Solr result set before sorting, because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search and you cannot set some threshold for all searches.
If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language, but I don't think you can do it with just SOLR.
Anyway, I would recommend you don't go down this route of re-sorting the results from SOLR, as it will simply confuse the user. People expect search results to be like Google (and most other search engines), where results come back in some form of TF-IDF ranking.
Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.
I'd suggest you use SOLR to its strengths and use facets. Provide a price range facet on the left (like eBay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.
[EDIT] this question might also be useful: Digg-like search result ranking with Lucene/Solr?
As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS, for example.
The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc., and with only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.
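The client-side variant mentioned in these answers (take the top N by score, then re-sort only those) can be sketched in plain Java. The Product holder, its field names and the method name are hypothetical; a real client would fill the list from the Solr response.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopNResort {
    /** Hypothetical holder for one hit returned by the search. */
    static final class Product {
        final String name;
        final double price;
        final double score;

        Product(String name, double price, double score) {
            this.name = name;
            this.price = price;
            this.score = score;
        }
    }

    /** Keeps the n highest-scoring hits, then orders them by price ascending, name as tie-breaker. */
    static List<Product> topNByScoreThenPrice(List<Product> hits, int n) {
        List<Product> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((Product p) -> p.score).reversed());
        List<Product> top = new ArrayList<>(byScore.subList(0, Math.min(n, byScore.size())));
        top.sort(Comparator.comparingDouble((Product p) -> p.price)
                .thenComparing((Product p) -> p.name));
        return top;
    }

    public static void main(String[] args) {
        List<Product> hits = new ArrayList<>();
        hits.add(new Product("A", 30.0, 9.0));
        hits.add(new Product("B", 5.0, 1.0));
        hits.add(new Product("C", 10.0, 8.0));
        for (Product p : topNByScoreThenPrice(hits, 2)) {
            System.out.println(p.name + " " + p.price);
        }
    }
}
```

The cost is that all N rows cross the network, which is the trade-off the posters weigh against a server-side frange filter.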
Score filter
http://lucene.472066.n3.nabble.com/score-filter-td493438.html
Hello, is there a way to set a score filter? I tried …
What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on because there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *] because score is not an actual Field.
That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.
-Grant
Re: score filter
In my case, for example searching a book: some of the returned documents have high relevance (score > 3), but some documents with a low score (< 0.01) are useless.
Without a "score filter", I have to go through each document to find out the number of documents I'm interested in (score > nnn). This causes some problems for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful documents which have score > nnn.
Thx,
Kevin
Re: score filter
At what point do you draw the line? 0.01 is too low, but what about 0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.
Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject to the vagaries of scoring.
What kind of relevance testing have you done so far to come up with those values? See also
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
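The "gap larger than some delta" idea from this message can be sketched as a small client-side helper. This is an illustration only: the class and method names are made up, the scores are assumed to arrive sorted in descending order, and, as the message itself warns, any delta remains subject to the vagaries of scoring.

```java
public class ScoreGapCutoff {
    /**
     * Expects scores sorted in descending order. Returns how many leading results
     * to keep: the index of the first result whose drop from its predecessor
     * exceeds delta, or scores.length when no such gap exists.
     */
    static int cutoff(double[] scores, double delta) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i - 1] - scores[i] > delta) {
                return i;
            }
        }
        return scores.length;
    }

    public static void main(String[] args) {
        double[] scores = {3.2, 3.0, 2.9, 0.4, 0.3};
        System.out.println(cutoff(scores, 1.0));
    }
}
```

Unlike a fixed threshold, the helper adapts to the score range of each query, but it still has to see the scores first, so it does not remove the need to retrieve the results.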
Re: score filter
Just did some research. It seems that it's doable with additional code added to Solr, but not out of the box. Thank you, Grant.
Re: score filter
Don't bother doing this. It doesn't work.
This seems like a good idea, something that would be useful for almost every Lucene installation, but it isn't in Lucene because it does not work in the real world.
A few problems:
* Some users want every match and don't care how many pages of results they look at.
* Some users are very bad at creating queries that match their information needs. Others are merely bad, not very bad. The good matches for their query are on top, but the good matches for their information need are on the third page.
* Misspellings can put the right match (partial match) at the bottom. I did this yesterday at my library site, typing "Katherine Kerr" instead of the correct "Katharine Kerr". Their search engine showed no matches (grrr), so I had to search again with "Kerr".
* Most users do not know how to repair their queries, like I did with "Katherine Kerr", changing it to "Kerr". Even if they do, you shouldn't make them. Just show the weakly relevant results.
* Documents have errors, just like queries. I find bad data on our site about once a month, and we have professional editors. We still haven't fixed our entry for "Betty Page" to read "Bettie Page".
* People may use non-title words in the query, like searching for "batman" when they want "The Dark Knight".
So, don't do this. If you are forced to do it, make sure that you measure your search quality before and after it is implemented, because it will get worse. Then you can stop doing it.
wunder
Re: score filter
+1. Of course it is doable, but that doesn't mean you should, which is what I was trying to say before (but I was typing on my iPod so it wasn't fast) and which Walter has now done. It is entirely conceivable to me that someone could search for a very common word such that the scores of all relevant (and thus, "good") documents are below your predefined threshold.
At any rate, proceed at your own peril. To implement it, look into the SearchComponent functionality.
Re: score filter
Hello Grant,
I need to frame a query that is a combination of two query parts and I use a 'function' query to prepare the same. Something like:
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
where $uq and $cq are two queries.
Now, I want a search result returned only if I get a hit on $uq. So, I specify the default value of the $uq query as 0.0 in order for the final score to be zero in cases where $uq doesn't record a hit. Even though the scoring works as expected (i.e., documents that don't match $uq have a score of zero), all the documents are returned as search results. Is there a way to filter search results that have a score of zero?
Thanks for your help,
Debdoot
Re: score filter
a) you could wrap your query in {!frange} .. but that will make everything that does have a value > 0.0 get the same final score
b) you could use an fq={!frange} that refers back to your original $q
c) you could just use an fq that refers directly to your $uq, since that's what you say you actually want to filter on in the first place..
uq=…
cq=…
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
fq={!v=$uq}
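When option c) is sent over HTTP, the braces, dollar signs and spaces in these parameters have to be URL-encoded. The sketch below builds such a query string with the JDK only; the class name and helper methods are hypothetical, and the filter is written as {!v=$uq} on the assumption that parameter dereferencing was intended.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FilteredFunctionQuery {
    /** Collects the request parameters of option c) in a stable order. */
    static Map<String, String> params(String uq, String cq) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("uq", uq);
        p.put("cq", cq);
        p.put("q", "{!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))");
        // Assumption: "$uq" dereferences the uq request parameter as the filter query.
        p.put("fq", "{!v=$uq}");
        return p;
    }

    /** URL-encodes every key and value and joins them with '&'. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // "text:foo" and "category:bar" are placeholder queries for illustration.
        System.out.println(toQueryString(params("text:foo", "category:bar")));
    }
}
```

Because the filter only restricts the result set, the scores produced by the function query in q stay untouched, which is the advantage of c) over wrapping everything in {!frange}.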
Boost score for early matches
Solr: How to boost score for early matches?
How can I boost the score for documents in which my query matches a particular field earlier? For example, searching for "superman" should give "superman returns" a higher score than "there is my superman". Is this possible?
Uh, store the first few words explicitly in another field, and boost matches on this field. - aitchnyu, Aug 22 at 9:45
The problem there is that the size of the query can vary from say 3 characters to say 100 characters, and so determining how many words/chars to index separately can be difficult. - techfoobar, Aug 22 at 9:49
Secondly, suppose I index the first 25 characters, and one record has "my superman blah.." and another record has "superman returns blah..": both will match the query "superman" and both will be boosted when I boost this secondary field. - techfoobar, Aug 22 at 9:50
2 Answers
Thank you for the answer. But I solved it today by using the approach I've outlined in my answer. - techfoobar, Aug 22 at 18:33
But this is not going to work if the words do not occur at the very start. You may want to check out payloads as well, where you can add index-time suggestions as laid down in the second option. - Jayendra, Aug 22 at 18:35
Will check that out as well. However, the current solution can be made to work to a large extent by fine-tuning the ps parameter to make it more lenient. I currently use 2 (distance between 2 terms in the pf) and it seems to be working quite well for my medium-sized data set (1000s of records, greatly varying in content). Will check out your point and let you know if it helped. - techfoobar, Aug 22 at 18:38
Accepted answer:
Solved it myself after reading a LOT about this online. What specifically helped me was a reply on nabble which goes like this (I used dismax, so explaining that here):
1. Create a separate field named, say, 'nameString' which stores the value as "START "
2. Change the search query to "START "
3. Add the new field nameString as one of the fields to look in, in the query fields param (qf)
4. While searching, use the parameter pf (phrase field) as the new field nameString with a phrase slop of 1 or 2 (lower values would mean stricter searching)
Your final query params will be something like:
q=_START_
defType=dismax
qf=name nameString
pf=nameString
ps=2
Solr: How can I get all documents ordered by score with a list of keywords?
I have a Solr 3.1 database containing emails with two fields:
datetime
text
For the query I have two parameters:
date of today
keyword array ("important thing", "important too", "not so important, but more than average")
Is it possible to create a query to
1. get ALL documents of this day AND
2. sort them by relevancy, ordering them so that the email which contains most of my keywords (important things) scores best?
The part with the date is not very complicated:
fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
I know that you can boost the keywords this way:
q=text:"first keyword"^5 OR text:"second one"^2 OR text:"minus scoring"^0.5 OR text:"*"
But how do I only use the keywords to sort this list and get ALL entries, instead of doing a real query and getting only a few entries back?
Thanks for help!
2 Answers
You need to specify your terms in the main query and then change your date query to be a filter query on these results by adding the following.
fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
So you should have something like this:
q=&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
Edit: A little more about filter queries (as suggested by rfeak).
From Solr Wiki, Filter Query Guidance: "Now, what is a filter query? It is simply a part of a query that is factored out for special treatment. This is achieved in Solr by specifying it using the fq (filter query) parameter instead of the q (main query) parameter. The same result could be achieved leaving that query part in the main query. The difference will be in query efficiency. That's because the result of a filter query is cached and then used to filter a primary query result using set intersection."
These should be sorted by relevancy score already; that is just the default behavior of Solr. You can see the score by adding that field.
fl=*,score
If you use the Full Interface for Make A Query on the Admin Interface on your Solr installation at http:admin/form.jsp you will see where you can specify the filter query, fields, and other options. You can check out the Solr Wiki for more details on the options and how they are used.
I hope that this helps you.
+1 The filter query is an excellent suggestion. You may consider adding a bit about the advantage of using the filter query there. - rfeak, May 27 '11 at 14:55
Thank you! The filter query is working as expected. But unfortunately I still don't know how to handle the keywords, because they filter the emails instead of only sorting them. - Daniel, May 27 '11 at 16:06
Sorting by relevance is the default behavior of Solr/Lucene. If your results are unsatisfactory, try to put the keywords in quotes.
// Edit: Following the answer from Paige Cook, use something like this:
q="important thing"&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
// 2nd update. Thinking about this answer: quotes are not a good idea, because in this case you will only receive "important thing" mails, but no "important too".
The point is which keywords you are using. Searching for "important thing" results in the highest scores for "important thing" mails, but Lucene does not know how to score "important too" or "not so important, but more than average" in relation to your keywords. Another idea would be searching only for "important". But the field values "important thing" and "important too" give nearly the same score, because 50% of the searched keywords (in this case: "important") are part of the field value. So you probably have to change your keywords. It could work after changing "important too" into "also an important mail", to get the best ratio of search word "important" and field value, so that the shortest mail description gets the highest score.
Thanks for your answer! You point exactly to my problem, because the keywords filter the documents instead of only sorting them all and influencing the relevancy score. I do not know how to handle this. - Daniel, May 27 '11 at 16:13
Solr changes document's score when its random field value altered
I need to navigate forth and back in a Solr result set ordered by score, viewing documents one by one. To visualise that: first a list of document titles is presented to the user, then he or she can click one of the titles to see more details, and then needs to have an opportunity to move to the next document in the original list without going back and clicking another title.
During viewing, documents get changed: their dynamic field is modified (or created if it does not exist yet) to mark that the document has already been viewed (used in another search).
The problem I face is that when the document is altered and re-indexed to keep those changes, sometimes (and not always, which is very disturbing) its place in the result set for the same query changes (in other words, its score changes, as that doesn't happen when browsing results sorted by one of the documents' fields). So, "Previous"/"Next" navigation doesn't work properly.
I'm not using any custom weighting or boosters on fields for score calculation. Also, that dynamic field changed during browsing doesn't participate in the query used to get the record set browsed.
So, the questions are: can the modification of a document's field not included in the query change its relevance score? And if it can, then how can I control that?
UPDATE
I did some tests and can add the following:
1. A document changes its place in the result set even if no field is amended: just requesting the document and re-indexing it without any changes to its fields makes it take another place next time the same query over the same index is executed.
2. That happens even if the result set is sorted explicitly ("first_name DESC"), so score (which depends on the update date) is not involved. The document stays the same, the field the result set is sorted by is the same, yet its position changes.
Still have no idea how to avoid that.
2 Answers
In Solr, if your field is "indexed", it will have an effect on the relevancy ranking ("stored" fields show up in search results but are not necessarily searchable). If the fields in question aren't marked as indexed then you are good to go. Note that "indexed" and "stored" are not necessarily the same, hence your confusion about result lists changing even though not all fields are shown (a field can be "indexed" and not "stored" as well).
In this case I think you want your "viewed" field to be "stored" but not "indexed". If you really want to control the query, you can use copyField to copy the relevant results into a single searchable field. You can also boost terms or documents so that certain fields are "less important" to the search query.
If you want to see how the relevancy rankings are calculated, you can add "debugQuery=on" to the end of your Solr query (see the Relevancy FAQ for more info).
However, all that being said, I would recommend you cache your search result query (at least for the first page of your results), since you will always have results changing (documents added, removed by other users, etc). Your best bet is to design a UI that anticipates this, or at least batches a user's query.
Thanks, for some reason I was sure changes to fields not participating in the query don't affect the calculated score. In my case it is necessary to have this field indexed, as there is another query where I need to filter documents, searching only viewed or only not viewed before. Caching is also not suitable, as users are supposed to navigate through the whole result set, not only through the page (well, caching is still possible and, to be honest, bearable in terms of resources, but just not elegant). I'll try to boost the field being searched and tell if that works. - Yuriy, Jun 7 '11 at 7:45
Just noticed that it also happens when the results are sorted by a field other than score. How is that possible? I thought if ordering is specified and score is not in the clause explicitly (say, ordering is like "first_name DESC"), it doesn't influence the ordering. However, it seems it does. How can I get rid of that? - Yuriy, Jun 8 '11 at 14:11
Okay, looks like boosting works, but has no effect. If I boost the field I am searching in, all the matches are boosted equally and still the recently re-indexed documents get some delta in their relevance which makes a difference. There should be a way to exclude the date of last update from the ordering completely, but I can't find it yet… - Yuriy, Jun 8 '11 at 14:50
I've found the solution, which doesn't eliminate the problem completely but makes it much less likely to happen.
So the problem happens when the documents are sorted by some field and there is a number of them with the same value in this field (e.g. the result set is sorted by first name, and there are 100 entries for "John").
This is when the indexed time gets involved: apparently Solr uses it to sort the documents when their main sorting fields are identical. To make this case much less probable, you need to add more sorting fields, e.g. "first_name desc" should become "first_name desc, last_name desc, register_date asc".
Also, adding the document's unique id as the last sorting field should remove the problem completely (the set of sorting fields will never be identical for any two documents in the index).
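The effect of a unique tie-breaker can be illustrated outside Solr with a plain Java comparator. The Person holder and its field names are hypothetical; in a Solr request the equivalent would be a sort parameter along the lines of "first_name desc, id asc", assuming the unique key field is called id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DeterministicSort {
    /** Hypothetical record with a unique id and a non-unique sort field. */
    static final class Person {
        final String id;
        final String firstName;

        Person(String id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    /** Sorts by first name descending, then by unique id so that ties always resolve the same way. */
    static List<String> sortedIds(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(Comparator.comparing((Person p) -> p.firstName).reversed()
                .thenComparing((Person p) -> p.id));
        List<String> ids = new ArrayList<>();
        for (Person p : copy) {
            ids.add(p.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("2", "John"));
        people.add(new Person("1", "John"));
        people.add(new Person("3", "Zed"));
        System.out.println(sortedIds(people));
    }
}
```

Whatever order the input arrives in, the two "John" entries come out in the same relative order, which is what keeps "Previous"/"Next" navigation stable.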
Relevance Customization
http://lucene.472066.n3.nabble.com/Relevance-Customization-td501310.html
Hi all.
I want to know if it's possible to customize the Solr relevance, something like this:
1. I create a static score for each document and index it.
2. I change the relevance to Score(Solr) + Score(Static), where the Solr score is equal to 30% of the total score, mixing the two scores into only one.
This is different from sorting by my static score and then by Solr score, because I don't want to kill the Solr score, just give it a little less importance.
Is there a way to do this?
Thanks
Re: Relevance Customization
It can be done with something like q=yourQuery _val_:yourStaticScoreField
http://wiki.apache.org/solr/FunctionQuery#fieldvalue
But this adds the Solr score to the static score. I am not sure how to get 30% of the Solr score. Maybe something like this?
q=yourQuery^0.3 _val_:yourStaticScoreField^0.7
Modify SOLR scoring
Hi everybody,
I'm using SOLR with a schema (for example) like this:
parution date, date, indexed, not stored
fulltext, stemmed, indexed, not stored
I know it's possible to order by one or more fields, but I want to order by score and modify the "score" formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents.
What is the best way to do this?
Thanks.
Excuse my English.
RE: modify SOLR scoring
I believe you can use a function query to do this:
http://wiki.apache.org/solr/FunctionQuery
If you embed the following in your query, you should get a boost for more recent date values:
_val_:"ord(dateField)"
Where "dateField" is the field name of the date you want to use.
Re: modify SOLR scoring
http://lucene.472066.n3.nabble.com/modify-SOLR-scoring-td497348.html
I am interested in a topic very similar to yours. I want to modify the field named "score" and the document boost, but not reindex all the fields, since that would take too much power.
Please let me know if you find a solution to this.
Kindly
Change order before returning data
http://stackoverflow.com/questions/4965172/change-order-before-returning-data
Is there any way to change the order of results in SOLR? E.g. when I query SOLR I will get the 1000 records with the highest score; then on those 1000 records I will use my own function to change the order again and just get 10 of those records. I can get 1000 records and process them with PHP or Java, but then I have to transfer 1000 records from the SOLR server to the web server and I don't want that; I just want to get 10 records after changing the order, and use paging. Does SOLR support this kind of custom function?
Answers
If your function can be applied when the records are initially indexed, you can do it there and add the result as a value on the record. Then sort the result set by the precalculated value. If not, I haven't worked with it directly, but this thread seems to have the answer you're looking for.
Hi. My case is very special: I already have a pre-indexed score in the database. Let me give one example. I have a shopping site; when I search for "TV LCD 32 inch", I get many results from different brands like LG, Toshiba… and many results for LG appear consecutively. I want to separate them, e.g. I don't want 3 results for LG sitting next to each other. Currently I get the 1000 best records (based on score) and change the order again using PHP; now I want to move this job to SOLR (I don't want to transfer too much data between SOLR and the web server, I just need 10 records to display) - user612433, Feb 11 '11 at 3:45
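The re-ordering described in this comment (never show three results of the same brand in a row) can be sketched as a greedy pass over the score-ordered hits. This is a client-side illustration in plain Java, not a Solr feature; the Item holder, the method names and the maxRun parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BrandSpread {
    /** Hypothetical holder for one hit. */
    static final class Item {
        final String id;
        final String brand;
        final double score;

        Item(String id, String brand, double score) {
            this.id = id;
            this.brand = brand;
            this.score = score;
        }
    }

    /**
     * Greedy re-ordering of hits that are already sorted by score descending.
     * At each step it takes the best remaining hit that would not create a run of
     * more than maxRun equal brands; if every remaining hit would, it takes the best one.
     */
    static List<Item> spread(List<Item> byScoreDesc, int maxRun, int limit) {
        List<Item> remaining = new ArrayList<>(byScoreDesc);
        List<Item> out = new ArrayList<>();
        while (!remaining.isEmpty() && out.size() < limit) {
            int pick = 0; // fallback: best remaining hit
            for (int i = 0; i < remaining.size(); i++) {
                if (!wouldExceedRun(out, remaining.get(i).brand, maxRun)) {
                    pick = i;
                    break;
                }
            }
            out.add(remaining.remove(pick));
        }
        return out;
    }

    /** True when the last maxRun items of out all carry the given brand. */
    static boolean wouldExceedRun(List<Item> out, String brand, int maxRun) {
        if (out.size() < maxRun) {
            return false;
        }
        for (int i = out.size() - maxRun; i < out.size(); i++) {
            if (!out.get(i).brand.equals(brand)) {
                return false;
            }
        }
        return true;
    }

    /** Convenience helper that extracts the ids in order. */
    static List<String> ids(List<Item> items) {
        List<String> ids = new ArrayList<>();
        for (Item item : items) {
            ids.add(item.id);
        }
        return ids;
    }
}
```

With maxRun set to 2, a third consecutive LG hit is pushed back behind the next hit of another brand. The scan is quadratic in the worst case, which is harmless for about 1000 candidates but is a reason not to run it over an entire result set.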
Yes, you can create a column with the info you want to be taken into account in the score.
For example, for a "popularity" column, your query would be:
yourquery && _val_:"popularity"^0.7
0.7 being the boost factor in the final score. You can also filter the result set to get fewer results:
yourquery && fq=popularity:[10 TO *]
Limiting the total number of documents matched
http://search-lucene.com/m/4AHNF17wIJW1/
Re: limiting the total number of documents matched
Yonik Seeley, 2010-07-17, 00:55
On Wed, Jul 14, 2010 at 5:46 PM, Paul wrote:
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do a search with "score:[5 TO *]"? It didn't work in my first attempt.
frange could possibly help (range query on an arbitrary function).
http://www.lucidimagination.com/blog/tag/frange/
So perhaps something like
q={!frange l=0.85}query($qq)
qq=
where 0.85 is the lower bound you want for scores and qq is the normal relevancy query
-Yonik
http://www.lucidimagination.com
On Wed, Jul 14, 2010 at 5:34 PM, Paul wrote:
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom RequestHandler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?
On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin wrote:
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.
-Kallin Nagelberg
Subject: limiting the total number of documents matched
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this?
I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though.
Reposted from: https://blog.51cto.com/aliapp/1325847