Saturday, March 17, 2018

FW: Expose a metric for percentage-recovered during full recoveries

-----Original Message-----
From: S G [mailto:sg.online.email@gmail.com]
Sent: 15 March 2018 00:46
To: solr-user@lucene.apache.org
Subject: Expose a metric for percentage-recovered during full recoveries

Hi,

Solr does full recoveries very frequently. Sometimes even a seemingly
simple change, such as adding a field to the schema, sends a couple of
nodes into recovery.
It would be nice if it did not do full recoveries so often, but since
that may require a lot of fixing, can we have a metric that reports how
much of a core has been recovered so far?

Example:

$ cd data
$ du -h . | grep my_collection | grep -w index
77G ./my_collection_shard3_replica2/data/index.20180314184942993
145G ./my_collection_shard3_replica2/data/index.20180112001943687

This shows that the shard3-replica2 core is doing a full recovery and has
copied only 77G out of 145G. That is roughly half (77/145 ≈ 53%) of the
recovery done.
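
The manual check above could be scripted along these lines. This is only a sketch, not something Solr provides: it assumes the core's data directory holds timestamped `index.*` directories, that the newest one is the copy in progress and the one before it is the complete index, and that the old index size is a fair estimate of the final size. The helper names `dir_size_bytes` and `percent_recovered` are made up for illustration.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files may disappear while a recovery is running; skip them.
                pass
    return total

def percent_recovered(data_dir):
    """Estimate recovery progress from the two newest index.* directories.

    Assumption: directory names sort by timestamp, so the last entry is the
    index being copied and the one before it is the existing full index.
    Returns None when fewer than two index directories exist.
    """
    index_dirs = sorted(
        d for d in os.listdir(data_dir)
        if d.startswith("index.") and os.path.isdir(os.path.join(data_dir, d))
    )
    if len(index_dirs) < 2:
        return None
    expected = dir_size_bytes(os.path.join(data_dir, index_dirs[-2]))
    current = dir_size_bytes(os.path.join(data_dir, index_dirs[-1]))
    if expected == 0:
        return None
    return 100.0 * current / expected
```

It would be called with a path such as `my_collection_shard3_replica2/data`. The result is an estimate only, since the recovered index need not end up exactly the same size as the old one.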


It would be very nice to have this as a JMX metric that we could plot
somewhere, instead of having to keep running the same command in a loop
and guessing how much is left to be copied.

A metric like the following would be great:
{
  "my_collection_shard3_replica2": {
    "recovery": {
      "currentSize": "77 gb",
      "expectedSize": "145 gb",
      "percentRecovered": "50",
      "startTimeEpoch": "361273126317"
    }
  }
}
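
As a rough illustration of how such a payload might be assembled, here is a small sketch. It is not an existing Solr API; the function name `recovery_metric` is hypothetical, and it assumes the byte counts and start time are already known. Unlike the sample above, it emits plain numbers (bytes, a percentage, epoch milliseconds) rather than strings like "77 gb", on the assumption that numeric values are easier to plot.

```python
import json

def recovery_metric(core, current_bytes, expected_bytes, start_time_epoch_ms):
    """Build a metric dict in the shape proposed above (hypothetical helper)."""
    if expected_bytes > 0:
        percent = round(100.0 * current_bytes / expected_bytes, 1)
    else:
        # Expected size unknown, so the percentage cannot be computed.
        percent = None
    return {
        core: {
            "recovery": {
                "currentSize": current_bytes,
                "expectedSize": expected_bytes,
                "percentRecovered": percent,
                "startTimeEpoch": start_time_epoch_ms,
            }
        }
    }

if __name__ == "__main__":
    gib = 1024 ** 3
    metric = recovery_metric("my_collection_shard3_replica2", 77 * gib, 145 * gib, 0)
    print(json.dumps(metric, indent=2))
```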

If it looks useful, I will open a JIRA for the same.

Thanks
SG
