Saturday, March 17, 2018

FW: Implications of using implicit routing

-----Original Message-----
From: Chris Ulicny [mailto:culicny@iq.media]
Sent: 14 March 2018 21:22
To: solr-user@lucene.apache.org
Subject: Re: Implications of using implicit routing

Shawn,

I knew that the shard had to be specified by the indexing process or
document, but I didn't realize that the uniqueness of the document across
the collection also had to be handled outside of Solr.

We've used the compositeId router successfully to route documents, but it
seemed that the implicit/manual routing might work for this new collection.
Apparently not, given that the indexing processes would have to enforce
uniqueness as well as distribution.

Thanks for the help.
Chris

On Wed, Mar 14, 2018 at 11:39 AM Shawn Heisey <elyograg@elyograg.org> wrote:

> On 3/14/2018 9:26 AM, Chris Ulicny wrote:
> > We've been looking at using implicit for one of our collections, and
> > there seems to be some weird behavior that we're not sure whether it
> > was expected or not.
> >
> > Is it recommended to use a uniqueKey for implicit routing? Is the
> > following behavior intended?
> >
> > We have encountered the following issue. Create a collection with
> > two shards (S1,S2), implicit routing, with "id" as uniqueKey, and
> > router.field as "routingfield". If we index
> >
> > {"id":"id1","routingfield":"S1"}
> >
> > It goes into shard S1. Then if we need to reindex the document with
> > a different "routingfield" value:
> >
> > {"id":"id1","routingfield":"S2"}
> >
> > It goes into shard S2. However, when you select the document in a
> > query, it seems that both of those documents exist, but get deduped
> > on return since selecting all documents only ever returns a single
> > document. Adding [shard] to the fl list results in the document
> > coming from S1 some of the time and S2 the rest.
> >
> > Trying to use /get with just the id results in a NullReferenceException.
> > Adding the _route_ parameter in works, but both documents can be
> > retrieved.
>
> This is a common misconception with the implicit router. That name is
> a completely correct summary of what the router does, but it is one of
> those "overloaded" words in the English language that is often not
> completely understood.
>
> A better name for "implicit" would actually be "manual." By using this
> router, you have told Solr not to worry about routing -- that you're
> going to handle it, and that you're going to make sure every document
> is unique across all shards. Then you indexed the same document to
> two shards -- intentionally. Solr isn't going to prevent that --
> there's nothing it can do to prevent it without making all indexing a
> LOT slower.
>
> If you want Solr to handle routing for you, then you must use the
> compositeId router. With that router, you do not get to specify which
> shard contains your document, and you cannot add shards after the
> collection is created. Later you can SPLIT shards, but you can't add
> them.
>
> Thanks,
> Shawn
>
>
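
Editor's sketch, not part of the original thread: since the implicit router leaves uniqueness to the indexing process, one way a client could handle a document that changes shards is to delete the old copy before adding the new one. The Python below only builds the two JSON update requests and does not send them. The function name, base URL and collection name are hypothetical, and it assumes the caller already knows which shard held the previous copy. It also assumes, to the best of my understanding, that a delete-by-id sent with the _route_ request parameter is directed at the named shard; verify that on your Solr version before relying on it.

```python
import json
from urllib.parse import urlencode


def build_move_requests(base_url, collection, doc, old_shard,
                        id_field="id", route_field="routingfield"):
    """Return a list of (url, json_body) pairs for re-indexing `doc`.

    Hypothetical helper: if the document previously lived on a different
    shard, a delete-by-id aimed at that shard comes first, so only one
    copy of the id remains in the collection afterwards.
    """
    update_url = f"{base_url}/{collection}/update"
    new_shard = doc[route_field]
    requests = []

    if old_shard is not None and old_shard != new_shard:
        # Assumption: _route_ as a request parameter targets the old shard.
        params = urlencode({"_route_": old_shard})
        body = json.dumps({"delete": {"id": doc[id_field]}})
        requests.append((f"{update_url}?{params}", body))

    # The add is routed by the router.field value inside the document.
    requests.append((update_url, json.dumps({"add": {"doc": doc}})))
    return requests


if __name__ == "__main__":
    # Values taken from the example in the thread; URL and collection
    # name are placeholders.
    moved = {"id": "id1", "routingfield": "S2"}
    for url, body in build_move_requests(
            "http://localhost:8983/solr", "mycollection", moved, "S1"):
        print(url, body)
```

Each pair can be POSTed with any HTTP client using Content-Type application/json, followed by a commit. Tracking the previous shard for every id is the client's job here, which is the extra burden the thread describes.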
