Saturday, March 17, 2018

FW: Expose a metric for percentage-recovered during full recoveries

-----Original Message-----
From: Andrzej Białecki [mailto:andrzej.bialecki@lucidworks.com]
Sent: 15 March 2018 14:20
To: solr-user@lucene.apache.org
Subject: Re: Expose a metric for percentage-recovered during full recoveries

Hi S G,

This looks useful, and it should be easy to add to the existing metrics in
ReplicationHandler, probably somewhere around ReplicationHandler:856.

> On 14 Mar 2018, at 20:16, S G <sg.online.email@gmail.com> wrote:
>
> Hi,
>
> Solr does full recoveries very frequently. Sometimes a couple of nodes
> go into recovery even for seemingly simple cases like adding a field to
> the schema.
> It would be nice if it did not do such full recoveries so frequently
> but since that may require a lot of fixing, can we have a metric that
> reports how much a core has recovered already?
>
> Example:
>
> $ cd data
> $ du -h . | grep my_collection | grep -w index
> 77G ./my_collection_shard3_replica2/data/index.20180314184942993
> 145G ./my_collection_shard3_replica2/data/index.20180112001943687
>
> This shows that the shard3-replica2 core is doing a full recovery and
> has only copied 77G out of 145G. That is about 50% recovery done.
>
>
> It would be very nice if we can have this as a JMX metric and we can
> then plot it somewhere instead of having to keep running the same
> command in a loop and guessing how much is left to be copied.
>
> A metric like the following would be great:
> {
>   "my_collection_shard3_replica2": {
>     "recovery": {
>       "currentSize": "77 gb",
>       "expectedSize": "145 gb",
>       "percentRecovered": "50",
>       "startTimeEpoch": "361273126317"
>     }
>   }
> }
>
> If it looks useful, I will open a JIRA for the same.
>
> Thanks
> SG
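
The calculation described in the quoted message can be sketched outside Solr as a small script. This is only an illustration of the arithmetic, not Solr code and not the proposed ReplicationHandler change. The function names dir_size_bytes and recovery_progress are hypothetical, the field names are borrowed from the example metric above, and it assumes the size of the old index directory is a reasonable estimate of the expected size. It sums file sizes, so the numbers may differ slightly from what du reports.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear while the copy is still in progress.
                pass
    return total


def recovery_progress(current_bytes, expected_bytes, start_time_epoch_ms):
    """Build a dict shaped like the metric proposed in the message.

    Assumption: expected_bytes is an estimate (for example, the size of
    the previous index directory), so the percentage is capped at 100.
    """
    percent = 0.0
    if expected_bytes > 0:
        percent = min(100.0, 100.0 * current_bytes / expected_bytes)
    return {
        "currentSize": current_bytes,
        "expectedSize": expected_bytes,
        "percentRecovered": round(percent, 1),
        "startTimeEpoch": start_time_epoch_ms,
    }
```

With the sizes from the example (77 and 145, in any common unit), recovery_progress reports 53.1 percent, consistent with the "about 50%" estimate in the message. In practice the two arguments would come from calling dir_size_bytes on the new and old index directories.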
