Sunday, April 1, 2018

FW: Default Index config

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: 29 March 2018 10:39
To: solr-user@lucene.apache.org
Subject: Re: Default Index config

On 3/28/2018 9:44 PM, mganeshs wrote:
> Regarding auto commit, we discussed it a lot with our product owners, and
> in the end we were forced to keep it at 1 sec; we couldn't increase it
> further. Even at that setting, our customers sometimes say that they have
> to refresh their pages a couple of times to get the update from
> Solr. So we can't increase it further.

I understand pressure from nontechnical departments for very low response
times. Executives, sales, and marketing are usually the ones making those
kinds of demands. I think you should push back on that particular
requirement on technical grounds.

A soft commit interval that low *can* contribute to performance issues. It
doesn't always cause them, I'm just saying that it *can*.  Maybe increasing
it to five or ten seconds could help performance, or maybe it will make no
real difference at all.

> Yes. As of now only Solr is running on that machine. But initially we
> were running it alongside HBase region servers, and it was working fine. But
> due to CPU spikes and OS disk cache issues, we were forced to move Solr to a
> separate machine.
> But I just checked, and our Solr data folder size comes to only 17GB.
> 2 collections are around 5GB each, and the others are 2 to 3 GB in size. If
> you say that only 2/3 of the total size goes into the OS disk cache, why does
> the top command's VIRT column always show 28G, which is more than what we have?
> Please check the top output and GC settings we used in this doc
> <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit?usp=sharing>

The VIRT memory should be roughly equal to the RES size plus the size of
all the index data on the system, because Solr memory-maps the index files
and mapped files count toward VIRT.  So that looks about right.  The actual
amount of memory allocated by Java for the heap and other memory structures
is approximately equal to RES minus SHR.
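As a rough illustration of that arithmetic, here is a small Python sketch. The RES and SHR figures below are hypothetical placeholders (the real ones are in the linked screenshot, not in this thread); only the 28G VIRT and 17GB index figures come from the message above, and the function name is made up for illustration.

```python
def breakdown(virt_gb: float, res_gb: float, shr_gb: float, index_gb: float) -> dict:
    """Rough reading of top's memory columns for a Solr process.

    Follows the rule of thumb above: VIRT ~= RES + on-disk index size
    (the index files are memory-mapped), and memory actually allocated
    by Java (heap plus other structures) ~= RES - SHR.
    """
    expected_virt = res_gb + index_gb
    return {
        "expected_virt_gb": expected_virt,
        # Positive means VIRT is larger than the rule of thumb predicts.
        "unexplained_virt_gb": virt_gb - expected_virt,
        "java_allocated_gb": res_gb - shr_gb,
    }

# Hypothetical RES=11G and SHR=1G; VIRT=28G and index=17G are from the thread.
print(breakdown(virt_gb=28.0, res_gb=11.0, shr_gb=1.0, index_gb=17.0))
```

With those hypothetical numbers, VIRT is fully accounted for by RES plus the mapped index, which is the "looks about right" case.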

I am not sure whether the SHR size gets counted in VIRT. It probably does. 
On some Linux systems, SHR grows to a very high number, but when that
happens, it typically doesn't reflect actual memory usage.  I do not know
why this sometimes happens.  That is a question for Oracle, since they are the
current owners of Java.

Only 5GB is in the buff/cache area.  The system has 13GB of free memory. 
That system is NOT low on memory.
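The free and buff/cache figures that top shows come from the kernel's memory accounting. A minimal sketch of reading them directly on Linux, assuming the usual /proc/meminfo layout and the procps convention that buff/cache is Buffers + Cached + SReclaimable; the function name is hypothetical, and any sample values used with it are illustrative, not taken from the poster's server.

```python
def memory_summary(meminfo_text: str) -> dict:
    """Summarize free memory and buff/cache from /proc/meminfo text, in GiB."""
    values_kb = {}
    for line in meminfo_text.splitlines():
        # Lines look like "MemFree:        13631488 kB".
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values_kb[key.strip()] = int(fields[0])
    kib_per_gib = 1024 * 1024
    # procps-style buff/cache: buffers plus page cache plus reclaimable slab.
    buff_cache_kb = (values_kb.get("Buffers", 0)
                     + values_kb.get("Cached", 0)
                     + values_kb.get("SReclaimable", 0))
    return {
        "free_gib": values_kb.get("MemFree", 0) / kib_per_gib,
        "buff_cache_gib": buff_cache_kb / kib_per_gib,
    }
```

On a Linux host this could be called as `memory_summary(open("/proc/meminfo").read())`.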

With 4 CPUs, a load average in the 3-4 range is an indication that the
server is busy.  I can't say for sure whether it means the server is
overloaded.  Sometimes the load average on a system that's working well can
go higher than the CPU count, sometimes a load average well below the CPU
count is shown on a system with major performance issues.  It's difficult to
say.  The instantaneous CPU usage on the Solr process in that screenshot is
384 percent, which means that it is exercising the CPUs hard.  But this
might be perfectly OK.  96.3 percent of the CPU is being used by user
processes, a VERY small amount is being used by system, and the iowait
percentage is zero.  Typically servers that are struggling will have a
higher percentage in system and/or iowait, and I don't see that here.
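The CPU reasoning above is simple ratio arithmetic. A small Python sketch, assuming top's per-process convention where 100 percent equals one fully busy core; the function names are hypothetical, and, as noted above, these ratios are hints rather than verdicts.

```python
import os

def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Load average normalized by CPU count.

    Roughly, 1.0 means one runnable (or waiting) task per CPU.
    """
    return load_avg / cpu_count

def process_cpu_share(process_cpu_percent: float, cpu_count: int) -> float:
    """Fraction of total CPU capacity used by one process, from top's %CPU."""
    return process_cpu_percent / (100 * cpu_count)

if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() is only available on Unix-like systems.
    one_minute, _, _ = os.getloadavg()
    print(load_per_cpu(one_minute, os.cpu_count() or 1))
```

For the screenshot in this thread, `process_cpu_share(384, 4)` is 0.96: the Solr process was using about 96 percent of the four CPUs, which lines up with the 96.3 percent user time.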

> Queries are quite fast; most of the time they are simple queries with fq.
> Regarding indexing, during peak hours we index around 100 documents per
> second on average.

That's good.  And not surprising, given how little memory pressure and how
much free memory there is.  An indexing rate of 100 per second doesn't seem
like a lot of indexing to me, but for some indexes, it might be very heavy. 
If your general performance is good, I wouldn't be too concerned about it.
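To connect that indexing rate to the soft commit discussion: every soft commit opens a new searcher, so the interval decides how often that happens and how many documents each new searcher picks up. A back-of-the-envelope sketch, assuming a steady indexing rate so that a soft commit fires roughly once per interval; the function name is hypothetical.

```python
def soft_commit_load(docs_per_second: float, soft_commit_ms: int) -> dict:
    """Approximate effect of an autoSoftCommit maxTime on searcher churn.

    Assumes documents arrive continuously, so a soft commit (and a new
    searcher, with cache invalidation and any configured autowarming)
    happens about once per interval.
    """
    interval_s = soft_commit_ms / 1000
    return {
        "docs_per_new_searcher": docs_per_second * interval_s,
        "new_searchers_per_hour": 3600 / interval_s,
    }

for ms in (1000, 5000, 10000):
    print(ms, soft_commit_load(100, ms))
```

At the 100 documents per second from this thread, going from 1 second to 5 seconds cuts new searchers from about 3600 to about 720 per hour.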

> Regarding the release, initially we tried 6.4.1, and since many
> discussions here mentioned that moving to 6.5.x would solve a lot
> of performance issues, we moved to 6.5.1. We will move to 6.6.3 in the
> near future.

The 6.4.1 version had a really bad bug in it that killed performance for
most users.  Some might not have even noticed a problem, though.  It's
difficult to say for sure whether it would be something you would notice, or
whether you would see an increase in performance by upgrading.

> Hope I have given enough information. One strange thing is that the CPU
> and memory spikes are not seen when we move from r4.xlarge to r4.2xlarge
> (which is 8 cores with 60 GB RAM). But this would not be cost effective.
> What's making CPU and memory go high in this new version (is it due to doc
> values)? If I switch off docValues, will the CPU and memory spikes be
> reduced?

Overall memory usage (outside of the Java heap) looks great to me.  CPU
usage is high, but I can't tell if it's TOO high. As a proof of concept, I
think you should try raising autoSoftCommit to five seconds.  If maxDocs is
configured on either autoCommit or autoSoftCommit, remove it so that only
maxTime is there, regardless of whether you actually change maxTime.  If
raising autoSoftCommit makes no real difference, then the 1 second
autoSoftCommit probably isn't a worry.  I bet if you raised it to five
seconds, most users would never notice anything different.
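The change suggested above lives in the updateHandler section of solrconfig.xml. Editing the file by hand is usually simplest; the Python sketch below just makes the intended end state explicit. It assumes the common layout where autoCommit and autoSoftCommit are direct children of updateHandler under the config root; the function name is hypothetical. Note that ElementTree drops XML comments when re-serializing, and that writing a literal value replaces any ${solr.autoSoftCommit.maxTime:...} property placeholder the stock config may use.

```python
import xml.etree.ElementTree as ET

def tune_commits(solrconfig_xml: str, soft_commit_ms: int = 5000) -> str:
    """Remove maxDocs from both commit blocks and set autoSoftCommit maxTime."""
    root = ET.fromstring(solrconfig_xml)  # assumes <config> is the root element
    handler = root.find("updateHandler")
    if handler is None:
        raise ValueError("no <updateHandler> element found")
    # Keep only time-based triggers on both hard and soft commits.
    for tag in ("autoCommit", "autoSoftCommit"):
        block = handler.find(tag)
        if block is None:
            continue
        for max_docs in block.findall("maxDocs"):
            block.remove(max_docs)
    soft = handler.find("autoSoftCommit")
    if soft is None:
        soft = ET.SubElement(handler, "autoSoftCommit")
    max_time = soft.find("maxTime")
    if max_time is None:
        max_time = ET.SubElement(soft, "maxTime")
    max_time.text = str(soft_commit_ms)
    return ET.tostring(root, encoding="unicode")
```

The hard commit's maxTime and openSearcher settings are left untouched; only maxDocs is removed there.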

If you want to provide a GC log to us that covers a relatively long
timeframe, we can analyze that and let you know whether your heap is sized
appropriately, or whether it might be too big or too small, and whether
garbage collection pauses are keeping your CPU usage high.  The standard
Solr startup in most current versions always logs GC activity, and that GC
log will usually be in the same directory as solr.log.
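Before sending a GC log to the list, a quick first pass can show whether pauses are frequent or long. A minimal sketch, assuming a Java 8 HotSpot GC log written with -XX:+PrintGCApplicationStoppedTime (which the stock Solr 6.x start script is assumed here to enable), so that each safepoint pause produces a line containing "Total time for which application threads were stopped: N seconds". The function name is hypothetical, and safepoint totals also include non-GC pauses, so treat them as an upper bound on GC pause time.

```python
import re

# Matches the pause line HotSpot (Java 8) writes with PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)

def pause_summary(gc_log_text: str) -> dict:
    """Count, total, and maximum of the stop-the-world pauses in the log text."""
    pauses = [float(m.group(1)) for m in STOPPED.finditer(gc_log_text)]
    return {
        "pauses": len(pauses),
        "total_seconds": sum(pauses),
        "max_seconds": max(pauses, default=0.0),
    }
```

Usage would be along the lines of `pause_summary(open(path).read())`, with `path` pointing at the GC log next to solr.log.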

Do you know what typical and peak queries per second are on your Solr
servers?  If your query rate is high, handling that will probably require
more servers and a higher replica count.
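Typical and peak queries per second can be estimated from request timestamps. A sketch that assumes the timestamps have already been extracted (for example, from the request lines in solr.log; that parsing step is not shown because it depends on the log format). It buckets requests into whole seconds, counts idle seconds as zero, and uses the median as "typical"; the function name and the choice of median are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median
from typing import Sequence

def qps_stats(timestamps: Sequence[datetime]) -> dict:
    """Median ("typical") and maximum ("peak") requests per whole second."""
    if not timestamps:
        return {"typical_qps": 0, "peak_qps": 0}
    # Bucket each request into the second it started in.
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    start, end = min(per_second), max(per_second)
    span = int((end - start).total_seconds()) + 1
    # Include idle seconds (zero requests) so "typical" is not inflated.
    counts = [per_second.get(start + timedelta(seconds=i), 0) for i in range(span)]
    return {"typical_qps": median(counts), "peak_qps": max(counts)}
```

A percentile such as the 95th could replace the median if "typical" should reflect busier periods.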

Thanks,
Shawn
