Sunday, April 1, 2018

FW: CDCR Invalid Number on deletes

-----Original Message-----
From: Amrit Sarkar [mailto:sarkaramrit2@gmail.com]
Sent: 21 March 2018 01:20
To: solr-user@lucene.apache.org
Subject: Re: CDCR Invalid Number on deletes

Hi Chris,

Sorry, I was off work for a few days and didn't follow the conversation. The
link directs me to
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we
have fixed the issue you stated in the JIRA, though the symptoms were
different from yours.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis <cptroullis@gmail.com>
wrote:

> Never mind, I found it... the link you posted takes me to SOLR-12036
> instead of SOLR-12063 for some reason.
>
> On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis <cptroullis@gmail.com>
> wrote:
>
> > Hey Amrit,
> >
> > Did you happen to see my last reply? Is SOLR-12036 the correct JIRA?
> >
> > Thanks,
> >
> > Chris
> >
> > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis
> > <cptroullis@gmail.com>
> > wrote:
> >
> >> Hey Amrit, thanks for the reply!
> >>
> >> I checked out SOLR-12036, but it doesn't look like it has to do
> >> with CDCR, and the patch that is attached doesn't look CDCR
> >> related. Are you sure that's the correct JIRA number?
> >>
> >> Thanks,
> >>
> >> Chris
> >>
> >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar
> >> <sarkaramrit2@gmail.com>
> >> wrote:
> >>
> >>> Hey Chris,
> >>>
> >>> I found a separate issue while working on CDCR which may relate to
> >>> your problem. Please see JIRA: *SOLR-12063*
> >>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>.
> >>> This bug was introduced when we added support for the bidirectional
> >>> approach, where an extra flag for CDCR is added to the tlog entry.
> >>>
> >>> This part of the code is at fault:
> >>> *UpdateLog.java, RecentUpdates::update()*
> >>>
> >>> switch (oper) {
> >>>   case UpdateLog.ADD:
> >>>   case UpdateLog.UPDATE_INPLACE:
> >>>   case UpdateLog.DELETE:
> >>>   case UpdateLog.DELETE_BY_QUERY:
> >>>     Update update = new Update();
> >>>     update.log = oldLog;
> >>>     update.pointer = reader.position();
> >>>     update.version = version;
> >>>
> >>>     if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
> >>>       update.previousVersion =
> >>>           (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
> >>>     }
> >>>     updatesForLog.add(update);
> >>>     updates.put(version, update);
> >>>
> >>>     if (oper == UpdateLog.DELETE_BY_QUERY) {
> >>>       deleteByQueryList.add(update);
> >>>     } else if (oper == UpdateLog.DELETE) {
> >>>       deleteList.add(new DeleteUpdate(version,
> >>>           (byte[])entry.get(entry.size()-1)));
> >>>     }
> >>>
> >>>     break;
> >>>
> >>>   case UpdateLog.COMMIT:
> >>>     break;
> >>>   default:
> >>>     throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> >>>         "Unknown Operation! " + oper);
> >>> }
> >>>
> >>> deleteList.add(new DeleteUpdate(version,
> >>> (byte[])entry.get(entry.size() -1)));
> >>>
> >>> is expecting the last entry to be the payload, but everywhere else in
> >>> the project, *pos:[2]* is the index for the payload, while in / after
> >>> Solr 7.2 the last entry in the source code is a *boolean* denoting
> >>> whether the update is CDCR-forwarded or typical.
> >>> UpdateLog.java.RecentUpdates is used in CDCR sync and checkpoint
> >>> operations, hence it is a legitimate bug that slipped past the tests
> >>> I wrote.
> >>>
> >>> The immediate fix patch is uploaded and I am awaiting feedback on
> >>> that. Meanwhile, if it is possible for you to apply the patch, build
> >>> the jar and try it out, please do and let us know.
> >>>
> >>> For *SOLR-9394*
> >>> <https://issues.apache.org/jira/browse/SOLR-9394>, if you can
> >>> comment on the JIRA and post the sample docs, Solr logs, and other
> >>> relevant information, I can give it a thorough look.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>> Medium: https://medium.com/@sarkaramrit2
> >>>
> >>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis
> >>> <cptroullis@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > We recently upgraded to Solr 7.2.0 as we saw that there were some
> >>> > CDCR bug fixes and features added that would finally let us make
> >>> > use of it (bi-directional syncing was the big one). The first time
> >>> > we tried to implement it we ran into all kinds of errors, but this
> >>> > time we were able to get it mostly working.
> >>> >
> >>> > The issue we seem to be having now is that any time a document is
> >>> > deleted via deleteById from a collection on the primary node, we
> >>> > are flooded with "Invalid Number" errors followed by a random
> >>> > sequence of characters when CDCR tries to sync the update to the
> >>> > backup site. This happens on all of our collections where our id
> >>> > fields are defined as longs (in some of them the ids are compound
> >>> > keys and are strings).
> >>> >
> >>> > Here's a sample exception:
> >>> >
> >>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException:
> >>> > Error from server at http://ip/solr/collection_shard1_replica_n1:
> >>> > Invalid Number: ]
> >>> > -s
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:549)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
> >>> > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> >>> > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> >>> > at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140)
> >>> > at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
> >>> > at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
> >>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
> >>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>> > at java.lang.Thread.run(Thread.java:748)
> >>> >
> >>> >
> >>> > I'm scratching my head as to the cause of this. It's like it is
> >>> > trying to deleteById for the value "]", even though that is not the
> >>> > ID for the document that was deleted from the primary. So I don't
> >>> > know if it is pulling this from the wrong field somehow or where
> >>> > that value is coming from.
> >>> >
> >>> > I found this issue:
> >>> > https://issues.apache.org/jira/browse/SOLR-9394
> >>> > which looks related, but doesn't look like it has any traction.
> >>> >
> >>> > Has anyone else experienced this issue with CDCR, or have any ideas
> >>> > as to what could be causing this issue?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Chris
> >>> >
> >>>
> >>
> >>
> >
>
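
The index mismatch described in the March 7 message in this thread can be illustrated with a small standalone sketch. This is not Solr code: the class name, the `entry` helper, the `DELETE_OP` value and the entry layout (operation, version, payload, optional trailing boolean) are assumptions made for illustration, based only on the description that the payload sits at position 2 and that a boolean is appended as the last element in / after Solr 7.2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the names and entry layout are assumptions, not Solr's API.
final class TlogPayloadSketch {
    // Hypothetical operation code, used purely as a placeholder.
    static final int DELETE_OP = 2;
    // Payload position as described in the thread ("pos:[2]").
    static final int PAYLOAD_IDX = 2;

    // Builds a stand-in for a tlog entry from its parts.
    static List<Object> entry(Object... parts) {
        return Arrays.asList(parts);
    }

    // Mirrors the quoted read: whatever happens to be last in the entry.
    static Object lastElement(List<Object> entry) {
        return entry.get(entry.size() - 1);
    }

    // Reads the payload from its fixed position instead.
    static byte[] payloadAtFixedIndex(List<Object> entry) {
        return (byte[]) entry.get(PAYLOAD_IDX);
    }

    public static void main(String[] args) {
        byte[] id = "42".getBytes(StandardCharsets.UTF_8);

        // Assumed older layout: the payload is also the last element.
        List<Object> before = entry(DELETE_OP, 1234L, id);
        // Assumed newer layout: a boolean flag follows the payload.
        List<Object> after = entry(DELETE_OP, 1234L, id, Boolean.FALSE);

        System.out.println("old layout, last element is payload: "
            + (lastElement(before) == id));
        System.out.println("new layout, last element type: "
            + lastElement(after).getClass().getSimpleName());
        System.out.println("new layout, fixed index is payload: "
            + (payloadAtFixedIndex(after) == id));
    }
}
```

Reading by a fixed index is one way to stay independent of trailing fields; the patch attached to SOLR-12063 may take a different approach.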
