Sunday, April 1, 2018

FW: Score different for different documents containing same value

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 27 March 2018 07:59
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Score different for different documents containing same value

Add debug=true to the query and you'll see exactly how the scores are
calculated; that should give you a clue as to what's going on.

In particular, look at the parsed query and be sure that your query is parsed
as you expect. It should be, given how you specify the query, but it's worth
a sanity check.

Is your setup sharded? If so, fire the query at each replica (add
&distrib=false) and see what the scores are.
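As a rough sketch of the two checks above, the snippet below builds the
original query with debug=true and distrib=false added, once per replica.
The replica URLs and core names are hypothetical placeholders, not taken from
the thread; substitute the ones from your own cluster.

```python
from urllib.parse import urlencode


def build_debug_url(core_url, distrib=False):
    """Return a /select URL for the query in question with debug output on."""
    params = {
        "q": "iphone brown case",
        "defType": "edismax",
        "qf": "query_t",
        "mm": "100%",
        "fl": "score,query_t,timestamp_tdt",
        "rows": 5,
        "wt": "json",
        "debug": "true",  # adds the parsed query and per-document score explanations
        "distrib": "true" if distrib else "false",  # false = ask only this core
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical replica core URLs; list one per replica of the collection.
replica_urls = [
    "http://host1:8983/solr/mycollection_shard1_replica1",
    "http://host2:8983/solr/mycollection_shard2_replica1",
]

for url in replica_urls:
    print(build_debug_url(url))
```

Fetching each printed URL and comparing the explain output for the same
document should show which part of the score differs between replicas.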

If this is a very small corpus, a few deleted documents can skew the scores.

Try turning on distributed IDF (assuming your collection is sharded).
The stats on different shards can differ on a small corpus; it's only
when you get into significant numbers of docs that the stats even out.
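To illustrate why shard-local stats matter, here is a small sketch of the IDF
term, assuming the collection uses BM25 similarity (the default in Solr 6.x
unless the schema overrides it). The document counts are made-up numbers for
illustration only; they are not taken from the poster's index.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """IDF as computed by Lucene's BM25 similarity.

    doc_freq:  number of docs in the shard containing the term
    doc_count: number of docs in the shard that have the field
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical stats: two shards of equal size, where the term 'brown'
# happens to occur in one document on shard A and in three on shard B.
idf_shard_a = bm25_idf(doc_freq=1, doc_count=10)
idf_shard_b = bm25_idf(doc_freq=3, doc_count=10)

print("shard A idf:", idf_shard_a)
print("shard B idf:", idf_shard_b)
```

Without distributed IDF each shard scores with its own doc_freq and
doc_count, so identical documents can get different scores depending on which
shard they live on. Deleted documents still count toward these stats until
their segments are merged, which is how a few deletes can skew a small corpus.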

Oh, and a side note. To make the return order deterministic, I'd add a
secondary sort on id. It's not your problem at this point, but when all the
sort criteria match, the _internal_ Lucene doc ID is used to break ties, and
that can vary after segments are merged. For future reference.
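A minimal sketch of that secondary sort, assuming the collection's uniqueKey
field is named id (the thread does not say what it is called):

```python
from urllib.parse import urlencode

params = {
    "q": "iphone brown case",
    "defType": "edismax",
    "qf": "query_t",
    "mm": "100%",
    "fl": "score,query_t,timestamp_tdt",
    "rows": 5,
    # Ties on score are broken by id, so the order no longer depends on
    # the internal Lucene doc ID. 'id' is an assumed field name.
    "sort": "score desc,id asc",
}

query_string = "select?" + urlencode(params)
print(query_string)
```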

Best,
Erick



On Mon, Mar 26, 2018 at 11:39 AM, bbarani <bbarani@gmail.com> wrote:
> Hi,
>
> I was trying to query a field that has a specific term in it and, to my
> surprise, the score was different for different documents even though
> the field I am searching contained exactly the same terms in all the
> documents.
>
> Any idea when this issue would come up?
>
> *Note:* All the documents contained the value 'iphone brown case' in
> query_t field and I am on SOLR 6.1
>
> *Query:*
> select?q=iphone+brown+case&omitHeader=false&fl=score,query_t,timestamp_tdt&sort=score%20desc&wt=xml&qf=query_t&defType=edismax&mm=100%25&rows=5
>
> <response>
> <lst name="responseHeader">
> <bool name="zkConnected">true</bool>
> <int name="status">0</int>
> <int name="QTime">9</int>
> <lst name="params">
> <str name="mm">100%</str>
> <str name="q">iphone brown case</str>
> <str name="defType">edismax</str>
> <str name="omitHeader">false</str>
> <str name="qf">query_t</str>
> <str name="fl">score,query_t,timestamp_tdt</str>
> <str name="callback">getSuggestions</str>
> <str name="sort">score desc</str>
> <str name="rows">5</str>
> <str name="wt">xml</str>
> <str name="_">1521045725381</str>
> </lst>
> </lst>
> <result name="response" numFound="4" start="0" maxScore="6.306856">
> <doc>
> <arr name="query_t"> <str>iphone brown case</str> </arr>
> <date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
> <float name="score">*6.306856*</float>
> </doc>
> <doc>
> <arr name="query_t"> <str>iphone brown case</str> </arr>
> <date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
> <float name="score">*4.8550515*</float>
> </doc>
> <doc>
> <arr name="query_t"> <str>iphone brown case</str> </arr>
> <date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
> <float name="score">*4.8550515*</float>
> </doc>
> <doc>
> <arr name="query_t"> <str>iphone brown case</str> </arr>
> <date name="timestamp_tdt">2018-03-26T13:40:14.690Z</date>
> <float name="score">*4.8550515*</float>
> </doc>
> </result>
> </response>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
