Sunday, April 1, 2018

FW: Why are cursor mark queries recommended over regular start, rows combination?

-----Original Message-----
From: Webster Homer [mailto:webster.homer@sial.com]
Sent: 27 March 2018 00:57
To: solr-user@lucene.apache.org
Subject: Re: Why are cursor mark queries recommended over regular start, rows combination?

Shawn,
Thanks. It's been a while now, but we did find issues with both cursorMark
AND start/rows. The effect was much more obvious with cursorMark.
We were able to address this by switching to use TLOG replicas. These give
consistent results. It's nice to know that the cursorMark problems were
related to relevancy retrieval order.
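
For reference, here is a minimal sketch of a cursorMark paging loop, assuming a uniqueKey field named `id` and a `/select` handler that returns JSON. The helper names `http_fetch` and `iter_cursor` are hypothetical and not part of Solr or SolrJ; the `cursorMark`, `nextCursorMark`, and `sort` parameters are Solr's own. The sort must include the uniqueKey field as a tie-breaker, and paging ends when Solr returns the same cursor it was given.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(select_url, params):
    """Send one request to a Solr /select URL and return the parsed JSON."""
    with urlopen(f"{select_url}?{urlencode(params)}") as resp:
        return json.load(resp)


def iter_cursor(fetch, query, rows=100, sort="score desc,id asc"):
    """Yield every matching document by following nextCursorMark.

    `fetch` is any callable that takes a dict of Solr parameters and
    returns the decoded JSON response (assumption: uniqueKey is `id`).
    """
    cursor = "*"  # "*" asks Solr to start from the beginning
    while True:
        data = fetch({
            "q": query,
            "rows": rows,
            "sort": sort,
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

A caller might use it as `iter_cursor(lambda p: http_fetch("http://localhost:8983/solr/mycollection/select", p), "title:solr")`, where the host and collection name are placeholders. The loop by itself does not protect against replicas that score documents differently.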

We found one major drawback with TLOG replicas, and that was that CDCR was
broken for TLOG replicas. There is a Jira on this, and it is being
addressed. NRT may have a use case, but I think that reproducible correct
results should trump performance every time. We use Solr as a search engine,
and we almost always want to retrieve results in order of relevancy.

I think that we will phase out the use of NRT replicas in favor of TLOG
replicas.
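
As an illustration of what choosing TLOG replicas looks like at collection-creation time, the sketch below builds a Collections API `CREATE` request using the `tlogReplicas` parameter introduced with the replica types in Solr 7.0. The function name, base URL, collection name, and counts are all hypothetical placeholders, and the snippet only builds the URL rather than sending it.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, tlog_replicas):
    """Build a Collections API CREATE URL that requests TLOG replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "tlogReplicas": tlog_replicas,  # TLOG replicas per shard
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "products", 2, 2)
print(url)
```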

On Fri, Mar 23, 2018 at 7:04 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 3/23/2018 3:47 PM, Webster Homer wrote:
> > Just FYI I had a project recently where I tried to use cursorMark in
> > SolrCloud and Solr 7.2.0 and it was very unreliable. It couldn't
> > even return consistent numFound values. I posted about it in this
> > forum. Using the start and rows arguments in SolrQuery did work
> > reliably, so I abandoned cursorMark as just too buggy.
> >
> > I had originally wanted to try using streaming expressions, but they
> > don't return results ordered by relevancy, a major limitation for a
> > search engine, in my opinion.
>
> The problems that can affect cursorMark are also problems when using
> start/rows pagination.
>
> You've mentioned relevancy ordering, so I think this is what you're
> running into:
>
> Trying to use relevancy ranking on SolrCloud with NRT replicas can
> break pagination. The problem happens both with cursorMark and
> start/rows.
> NRT replicas in a SolrCloud index can have different numbers of
> deleted documents. Even though deleted documents do not appear in
> search results, they ARE still part of the index, and can affect scoring.
> Since SolrCloud load balances requests across replicas, page 1 may use
> different replicas than page 2, and end up with different scoring,
> which can affect the order of results and change which page number
> they end up on. Using TLOG or PULL replicas (available since 7.0)
> usually fixes that problem, because different replicas are 100%
> identical with those replica types.
>
> Changing the index in the middle of trying to page through results can
> also cause issues with pagination.
>
> Thanks,
> Shawn
>
>
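
The scoring effect Shawn describes can be illustrated with a deliberately simplified sketch. It assumes the IDF form used by Lucene's BM25 similarity and reduces each document's score to the IDF of the single term it matches, ignoring term frequency and length normalization. The document names, terms, and counts are invented for illustration; the only point is that index statistics that still include deleted documents can flip the order of two results between replicas.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF in the form used by Lucene's BM25 similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def rank(stats):
    """Rank two hypothetical documents using one replica's statistics.

    Document "X" matches only the term "alpha" and document "Y" matches
    only "beta"; each score is reduced to the matching term's IDF.
    """
    scores = {
        "X": bm25_idf(stats["doc_count"], stats["doc_freq"]["alpha"]),
        "Y": bm25_idf(stats["doc_count"], stats["doc_freq"]["beta"]),
    }
    return sorted(scores, key=scores.get, reverse=True)


# Same live documents on both replicas, but replica B still carries
# 30 deleted "alpha" documents that have not been merged away yet.
replica_a = {"doc_count": 1000, "doc_freq": {"alpha": 40, "beta": 50}}
replica_b = {"doc_count": 1030, "doc_freq": {"alpha": 70, "beta": 50}}

print(rank(replica_a))
print(rank(replica_b))
```

If page 1 is served by one replica and page 2 by the other, a document can appear on both pages or on neither, whichever paging style is used.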

