Saturday, March 17, 2018

FW: SolrCloud update and luceneMatchVersion

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 14 March 2018 23:57
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: SolrCloud update and luceneMatchVersion

Hendrik:

There's one problem with IndexUpgraderTool. As Shawn points out, it does a
forceMerge, which by default creates one large segment. This has some
implications in terms of the number of deleted documents if the index has
updates afterwards, see:

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/


and the associated JIRA: https://issues.apache.org/jira/browse/LUCENE-7976

My recommendation would be to _not_ run the IndexUpgraderTool and let
background merging do what's necessary over time. Or, as Shawn says,
re-index from scratch.

Exceptions:
1> your index is less than 5g. Since that's the default max segment
size (see the article), it won't matter.
2> you optimize frequently anyway
3> you _might_ get away with a forceMerge where you specify the number
of segments to create as (index_size_in_gigabytes/5). But frankly I don't
know enough about the algorithm for how segments are chosen in that case to
know whether that'd do exactly what you want.
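
To make the arithmetic in exception 3 concrete, here is a minimal sketch that
rounds the segment count up and builds the matching optimize request. The base
URL, the collection name "mycollection" and the 42 GB index size are
placeholders, not values from this thread; optimize and maxSegments are the
standard parameters of Solr's update handler. As noted above, whether the
merge then picks segments the way you want is not guaranteed.

```python
import math
from urllib.parse import urlencode

# Default maximum merged segment size discussed above (5 GB).
MAX_SEGMENT_GB = 5


def target_segment_count(index_size_gb: float) -> int:
    """Round index_size / 5 GB up, with a floor of one segment."""
    return max(1, math.ceil(index_size_gb / MAX_SEGMENT_GB))


def optimize_url(base_url: str, collection: str, index_size_gb: float) -> str:
    """Build an update-handler URL that asks for a bounded forceMerge.

    base_url and collection are hypothetical; adjust them to your install.
    """
    params = urlencode({
        "optimize": "true",
        "maxSegments": target_segment_count(index_size_gb),
    })
    return f"{base_url}/{collection}/update?{params}"


if __name__ == "__main__":
    # Only prints the URL; nothing is sent to Solr.
    print(optimize_url("http://localhost:8983/solr", "mycollection", 42))
```

The sketch only prints the URL so it can be reviewed before anything is sent
to a live server.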

Best,
Erick

On Wed, Mar 14, 2018 at 10:08 AM, Hendrik Haddorp <hendrik.haddorp@gmx.net> wrote:
> Thanks for the detailed description!
>
>
> On 14.03.2018 16:11, Shawn Heisey wrote:
>>
>> On 3/14/2018 5:56 AM, Hendrik Haddorp wrote:
>>>
>>> So you are saying that we do not need to run the IndexUpgrader tool
>>> if we move from 6 to 7. Will the index be then updated automatically
>>> or will we get a problem once we move to 8?
>>
>>
>> If you don't run IndexUpgrader, and the index version is one that the
>> new Solr can read, then existing index segments will remain in the
>> format they are. New segments will be written in the new format. If
>> any of the existing segments are merged, then the new larger segment
>> will be in the new format.
>>
>> Summary: If an index starts out as 6.x, then is run for a while in
>> 7.x, but there are still 6.x segments left, then that index will not work
>> in 8.0.
>>
>> IndexUpgrader is a Lucene tool. This tool just runs a forceMerge
>> process on the index, which will merge all of the existing segments
>> into a single segment. It's EXACTLY the same operation that Solr calls
>> "optimize". (Lucene used to call it optimize too. Then they renamed it.)
>>
>>> How would one use the IndexUpgrader at all with Solr? Would one need
>>> to run it against the index of every core?
>>
>>
>> The Solr server must be shut down during the IndexUpgrader run.
>> IndexUpgrader is a completely separate tool, part of Lucene. It has
>> zero knowledge of anything that you have configured in Solr, so you
>> must locate the index directory of any core you want to upgrade and
>> run the tool on that index directory.
>>
>> Thanks,
>> Shawn
>>
>
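
Shawn's last point (stop Solr, then run the tool against each core's index
directory) could be sketched roughly as follows. This is only an illustration
under stated assumptions: the Solr home path and jar file names are
hypothetical, and it assumes every core keeps its index in the default
<core>/data/index location, which is not the case if dataDir has been
customised or the directory has been renamed after replication. The main
class org.apache.lucene.index.IndexUpgrader and its -verbose flag come from
Lucene; the backward-codecs jar is needed on the classpath to read older
segments.

```python
import os
from pathlib import Path

UPGRADER_CLASS = "org.apache.lucene.index.IndexUpgrader"


def find_index_dirs(solr_home):
    """Return every <core>/data/index directory directly under solr_home.

    Assumes the default core layout; customised dataDir settings are not handled.
    """
    return sorted(p for p in Path(solr_home).glob("*/data/index") if p.is_dir())


def upgrader_command(index_dir, classpath):
    """Build the java command line for one index directory (does not run it)."""
    return ["java", "-cp", classpath, UPGRADER_CLASS, "-verbose", str(index_dir)]


if __name__ == "__main__":
    # Hypothetical locations: replace with your own Solr home and Lucene jars.
    solr_home = "/var/solr/data"
    classpath = os.pathsep.join(["lucene-core.jar", "lucene-backward-codecs.jar"])

    # Solr must be shut down before any of these commands is actually run.
    for index_dir in find_index_dirs(solr_home):
        print(" ".join(upgrader_command(index_dir, classpath)))
```

The script prints the commands instead of executing them, so each one can be
checked (and Solr confirmed stopped) before it is run by hand.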
