Sunday, April 1, 2018

FW: solrcloud Auto-commit doesn't seem reliable

-----Original Message-----
From: Webster Homer [mailto:webster.homer@sial.com]
Sent: 23 March 2018 21:34
To: solr-user@lucene.apache.org
Subject: Re: solrcloud Auto-commit doesn't seem reliable

It's been a while since I had time to look further into this. I'll have to
go back through logs, which I need to get retrieved by an admin.

On Fri, Mar 23, 2018 at 8:45 AM, Amrit Sarkar <sarkaramrit2@gmail.com>
wrote:

> Elaine,
>
> When you say commits are not working, do you mean the Solr logs are not
> printing "commit" messages, or that documents are not appearing when you
> search?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Thu, Mar 22, 2018 at 4:05 AM, Elaine Cario <etcario@gmail.com> wrote:
>
> > I'm just catching up on reading solr emails, so forgive me for being
> > late to this dance....
> >
> > I've just gone through a project to enable CDCR on our Solr, and I
> > also experienced a small period of time where the commits on the
> > source server just seemed to stop. This was during a period of
> > intense experimentation where I was mucking around with
> > configurations, turning CDCR on/off, etc.
> > At some point the commits stopped occurring, and it drove me nuts
> > for a couple of days - tried everything - restarting Solr,
> > reloading, turned buffering on, turned buffering off, etc. I
> > finally threw up my hands and rebooted the server out of
> > desperation (it was a physical Linux box).
> > Commits worked fine after that. I don't know what caused the
> > commits to stop, and why re-booting (and not just restarting Solr)
> > caused them to work fine.
> >
> > Wondering if you ever found a solution to your situation?
> >
> >
> >
> > On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer
> > <webster.homer@sial.com> wrote:
> >
> > > I meant to get back to this sooner.
> > >
> > > When I say I issued a commit I do issue it as
> > > collection/update?commit=true
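
For readers following along, an explicit hard commit of the kind described above can be sketched as below. This is a minimal illustration, not the poster's actual tooling: the base URL and the collection name are hypothetical placeholders, and the helper names are made up for this example. Only the commit=true and openSearcher request parameters come from the thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_commit_url(base_url, collection, open_searcher=True):
    """Build the URL for an explicit hard commit on one collection.

    base_url and collection are placeholders supplied by the caller,
    e.g. "http://localhost:8983/solr" and "products" (both hypothetical).
    """
    params = {
        "commit": "true",
        # openSearcher=true makes the committed documents visible to searches.
        "openSearcher": str(open_searcher).lower(),
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def send_commit(base_url, collection, timeout=30):
    """Issue the commit and return the raw response body as text."""
    with urlopen(build_commit_url(base_url, collection), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call send_commit() against a real cluster to run it.
    print(build_commit_url("http://localhost:8983/solr", "products"))
```

Sending the request to the collection name (rather than to a single core) is what makes it a collection-wide commit.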
> > >
> > > The soft commit interval is set to 3000, but I don't have a
> > > problem with soft commits (I think). I was responding
> > >
> > > I am concerned that some hard commits don't seem to happen, but I
> > > think many commits do occur. I'd like suggestions on how to
> > > diagnose this, and perhaps an idea of where to look. Typically I
> > > believe that issues like this are from our configuration.
> > >
> > > Our indexing job is pretty simple: we send blocks of JSON to
> > > <collection>/update/json. We either re-index the whole
> > > collection or just apply updates. Typically we reindex the data
> > > once a week and delete any records that are older than the last
> > > full index. This does lead to a fair number of deleted records in
> > > the index, especially if commits fail.
> > > Most of our collections are not large, between 2 and 3 million
> > > records.
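
One way to put a number on "a fair number of deleted records" is to compare maxDoc with numDocs for each replica. The sketch below assumes the two counts have already been fetched, for example from the "index" section of a Luke request handler response (admin/luke with numTerms=0); the function name and the sample figures are hypothetical and not taken from the thread.

```python
def deleted_fraction(index_info):
    """Return the share of documents in the index that are marked deleted.

    index_info is assumed to be a dict with integer "maxDoc" and "numDocs"
    entries, as reported per core by Solr's index statistics.
    """
    max_doc = index_info["maxDoc"]
    num_docs = index_info["numDocs"]
    if max_doc == 0:
        return 0.0  # empty index, nothing to report
    # maxDoc counts live plus deleted docs; numDocs counts live docs only.
    return (max_doc - num_docs) / max_doc


if __name__ == "__main__":
    # Made-up sample numbers for illustration only.
    sample = {"numDocs": 2_400_000, "maxDoc": 3_000_000}
    print(f"deleted fraction: {deleted_fraction(sample):.1%}")
```

Running this against every replica of a shard would also show whether replicas carry different numbers of deleted documents, which is the condition suspected of skewing relevancy.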
> > >
> > > The collections are hosted in google cloud
> > >
> > > On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson
> > > <erickerickson@gmail.com> wrote:
> > >
> > > > bq: But if 3 seconds is aggressive what would be a good value
> > > > for soft commit?
> > > >
> > > > The usual answer is "as long as you can stand". All top-level
> > > > caches are invalidated, autowarming is done etc. on each soft
> > > > commit. That can be a lot of work and if your users are
> > > > comfortable with docs not showing up for, say, 10 minutes then
> > > > use 10 minutes. As always "it depends" here, the point is not to
> > > > do unnecessary work if possible.
> > > >
> > > > bq: If a commit doesn't happen how would there ever be an index
> > > > merge that would remove the deleted documents.
> > > >
> > > > Right, it wouldn't. It's a little more subtle than that though.
> > > > Segments on various replicas will contain different docs, thus
> > > > the term/doc statistics can be a bit different between multiple
> > > > replicas. None of the stats will change until the commit though.
> > > > You might try turning on distributed doc/term stats though.
> > > >
> > > > Your comments about PULL or TLOG replicas are well taken.
> > > > However, even those won't be absolutely in sync since they'll
> > > > replicate from the master at slightly different times and
> > > > _could_ get slightly different segments _if_ there's indexing
> > > > going on. But let's say you stop indexing. After the next poll
> > > > interval all the replicas will have identical characteristics
> > > > and will score the docs the same.
> > > >
> > > > I don't have any significant wisdom to offer here, except this
> > > > is really the first time I've heard of this behavior. About all
> > > > I can imagine is that _somehow_ the soft commit interval is -1.
> > > > When you say you "issue a commit" I'm assuming it's via
> > > > ....collection/update?commit=true or some such which issues a
> > > > hard commit with openSearcher=true. And it's on a _collection_
> > > > basis, right?
> > > >
> > > > Sorry I can't be more help
> > > > Erick
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Feb 12, 2018 at 10:44 AM, Webster Homer
> > > > <webster.homer@sial.com> wrote:
> > > > > Erick, I am aware of the CDCR buffering problem causing tlog
> > > > > retention, we always turn buffering off in our cdcr
> > > > > configurations.
> > > > >
> > > > > My post was precipitated by seeing that we had uncommitted
> > > > > data in collections > 24 hours after it was loaded. The
> > > > > collections I was looking at are in our development
> > > > > environment, where we do not use CDCR. However I'm pretty sure
> > > > > that I've seen situations in production where commits were
> > > > > also long overdue.
> > > > >
> > > > > the "autoSoftcommit" was a typo. The soft commit logic seems
> > > > > to be fine, I don't see an issue with data visibility. But if
> > > > > 3 seconds is aggressive what would be a good value for soft
> > > > > commit? We have a couple of collections that are updated every
> > > > > minute although most of them are updated much less frequently.
> > > > >
> > > > > My reason for raising this commit issue is that we see
> > > > > problems with the relevancy of solrcloud searches, and the NRT
> > > > > replica type. Sometimes the results flip where the best hit
> > > > > varies by what replica serviced the search. This is hard to
> > > > > explain to management. Doing an optimize does address the
> > > > > problem for a while. I try to avoid optimizing for the reasons
> > > > > you and Sean list. If a commit doesn't happen how would there
> > > > > ever be an index merge that would remove the deleted
> > > > > documents?
> > > > >
> > > > > The problems with deletes and relevancy don't seem to occur
> > > > > when we use TLOG replicas, probably because they don't do their
> > > > > own indexing but get copies from their leader. We are testing
> > > > > them now; eventually we may abandon the use of NRT replicas for
> > > > > most of our collections.
> > > > >
> > > > > I am quite concerned about this commit issue. What kinds of
> > > > > things would influence whether a commit occurs? One commonality
> > > > > for our systems is that they are hosted in a Google cloud. We
> > > > > have a number of collections that share configurations, but
> > > > > others that do not. I think commits do happen, but I don't
> > > > > trust that autoCommit is reliable. What can we do to make it
> > > > > reliable?
> > > > >
> > > > > Most of our collections are reindexed weekly with partial
> > > > > updates applied daily, that at least is what happens in
> > > > > production, our development clouds are not as regular.
> > > > >
> > > > > Our solr startup script sets the following values:
> > > > > -Dsolr.autoCommit.maxDocs=35000
> > > > > -Dsolr.autoCommit.maxTime=60000
> > > > > -Dsolr.autoSoftCommit.maxTime=3000
> > > > >
> > > > > I don't think we reference solr.autoCommit.maxDocs in our
> > > > > solrconfig.xml files.
> > > > >
> > > > > here are our settings for autoCommit and autoSoftCommit
> > > > >
> > > > > We had a lot of issues with missing commits when we didn't set
> > > > > solr.autoCommit.maxTime
> > > > > <autoCommit>
> > > > > <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> > > > > <openSearcher>false</openSearcher>
> > > > > </autoCommit>
> > > > >
> > > > > <autoSoftCommit>
> > > > > <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> > > > > </autoSoftCommit>
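
The way the startup flags and the solrconfig.xml defaults above combine can be worked through with a small sketch. It imitates the ${name:default} substitution syntax used in the config; the resolve helper is a hypothetical stand-in written for this illustration, not Solr's own code.

```python
import re

# Matches ${property.name} and ${property.name:default}
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, props):
    """Replace ${name:default} placeholders using props, else the default."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]  # a -D system property wins over the default
        if default is not None:
            return default  # fall back to the value written in the config
        raise KeyError(f"no value and no default for {name}")

    return _PLACEHOLDER.sub(_substitute, text)


if __name__ == "__main__":
    # Values copied from the startup script quoted in the message.
    startup_props = {
        "solr.autoCommit.maxDocs": "35000",
        "solr.autoCommit.maxTime": "60000",
        "solr.autoSoftCommit.maxTime": "3000",
    }
    print(resolve("${solr.autoCommit.maxTime:60000}", startup_props))
    print(resolve("${solr.autoSoftCommit.maxTime:5000}", startup_props))
```

Under these assumptions the soft commit interval resolves to 3000 ms (the flag overrides the 5000 default), the hard commit interval to 60000 ms, and solr.autoCommit.maxDocs has no effect because nothing in the quoted config references it.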
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey
> > > > > <apache@elyograg.org> wrote:
> > > > >
> > > > >> On 2/9/2018 9:29 AM, Webster Homer wrote:
> > > > >>
> > > > >>> A little more background. Our production Solrclouds are
> > > > >>> populated via CDCR. CDCR does not replicate commits; commits
> > > > >>> to the target clouds happen via autoCommit settings.
> > > > >>>
> > > > >>> We see relevancy scores get inconsistent when there are too
> > > > >>> many deletes, which seems to happen when hard commits don't
> > > > >>> happen.
> > > > >>>
> > > > >>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer
> > > > >>> <webster.homer@sial.com> wrote:
> > > > >>>
> > > > >>>> We do have autoSoftcommit set to 3 seconds. It is NOT the
> > > > >>>> visibility of the records that is my primary concern. What I
> > > > >>>> am concerned about is the accumulation of uncommitted tlog
> > > > >>>> files and the larger number of deleted documents.
> > > > >>>>
> > > > >>>
> > > > >> For the deleted documents: Have you ever done an optimize on
> > > > >> the collection? If so, you're going to need to re-do the
> > > > >> optimize regularly to keep deleted documents from growing out
> > > > >> of control. See this issue for a very technical discussion
> > > > >> about it:
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/LUCENE-7976
> > > > >>
> > > > >> Deleted documents probably aren't really related to what
> > > > >> we've been discussing. That shouldn't really be strongly
> > > > >> affected by commit settings.
> > > > >>
> > > > >> -----
> > > > >>
> > > > >> A 3 second autoSoftCommit is VERY aggressive. If your soft
> > > > >> commits are taking longer than 3 seconds to complete, which is
> > > > >> often what happens, then that will lead to problems. I
> > > > >> wouldn't expect it to cause the kinds of problems you
> > > > >> describe, though. It would manifest as Solr working too hard,
> > > > >> logging warnings or errors, and changes taking too long to
> > > > >> show up.
> > > > >>
> > > > >> Assuming that the config for autoSoftCommit doesn't have the
> > > > >> typo that Erick mentioned.
> > > > >>
> > > > >> ----
> > > > >>
> > > > >> I have never used CDCR, so I know very little about it. But
> > > > >> I have seen reports on this mailing list saying that
> > > > >> transaction logs never get deleted when CDCR is configured.
> > > > >>
> > > > >> Below is a link to a mailing list discussion related to CDCR
> > > > >> not deleting transaction logs. Looks like for it to work right
> > > > >> a buffer needs to be disabled, and there may also be problems
> > > > >> caused by not having a complete zkHost string in the CDCR
> > > > >> config:
> > > > >>
> > > > >> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
> > > > >>
> > > > >> Erick also mentioned this.
> > > > >>
> > > > >> Thanks,
> > > > >> Shawn
> > > > >>
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > > This message and any attachment are confidential and may be
> > > > > privileged or otherwise protected from disclosure. If you are
> > > > > not the intended recipient, you must not copy this message or
> > > > > attachment or disclose the contents to any other person. If you
> > > > > have received this transmission in error, please notify the
> > > > > sender immediately and delete the message and any attachment
> > > > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > > > subsidiaries do not accept liability for any omissions or
> > > > > errors in this message which may arise as a result of
> > > > > E-Mail-transmission or for damages resulting from any
> > > > > unauthorized changes of the content of this message and any
> > > > > attachment thereto. Merck KGaA, Darmstadt, Germany and any of
> > > > > its subsidiaries do not guarantee that this message is free of
> > > > > viruses and does not accept liability for any damages caused by
> > > > > any virus transmitted therewith.
> > > > >
> > > > > Click http://www.emdgroup.com/disclaimer to access the German,
> > > > > French, Spanish and Portuguese versions of this disclaimer.
> > > >
> > >
> > >
> >
>

