Tuesday, March 6, 2018

FW: Copying a SolrCloud collection to other hosts

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 06 March 2018 20:48
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Copying a SolrCloud collection to other hosts

This is part of the "different replica types" capability: there are NRT (the
only type available prior to 7.x), PULL and TLOG replicas, which would have
different names. I don't know of any way to switch it off.

As far as moving the data, here's a little-known trick: use the replication
API to issue a fetchindex, see:
https://lucene.apache.org/solr/guide/6_6/index-replication.html
As long as the target cluster can "see" the source cluster via HTTP, this
should work.
This is entirely outside SolrCloud and ZooKeeper is not involved. This would
even work with, say, one side being stand-alone and the other being
SolrCloud (not that you want to do that, just illustrating it's not part of
SolrCloud)...

So you'd specify something like:
http://target_node:port/solr/core_name/replication?command=fetchindex&masterUrl=http://source_node:port/solr/core_name

"core_name" in these cases is what appears in the "cores" dropdown on the
admin UI page. You do not have to shut Solr down at all on either end to use
this, although last I knew the target node would not serve queries while
this was happening.
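
As a rough illustration, the request above could be assembled in a script along these lines. This is only a sketch: build_fetchindex_url is a hypothetical helper name, the host, port and core names in the usage note are placeholders, and the masterUrl value follows the form shown above.

```python
from urllib.parse import urlencode


def build_fetchindex_url(target_base, target_core, source_base, source_core):
    """Build the replication URL that asks target_core to pull from source_core.

    target_base and source_base are Solr base URLs such as
    "http://target_node:8983/solr" (placeholder values, not real hosts).
    """
    params = {
        "command": "fetchindex",
        # URL of the source core, in the same form as in the example above.
        "masterUrl": f"{source_base.rstrip('/')}/{source_core}",
    }
    # urlencode percent-encodes the nested URL so it survives as one parameter.
    query = urlencode(params)
    return f"{target_base.rstrip('/')}/{target_core}/replication?{query}"
```

Calling build_fetchindex_url("http://target_node:8983/solr", "main_index_shard1_replica_n1", "http://source_node:8983/solr", "main_index_shard1_replica1") gives a URL that can then be requested with curl or urllib.request.urlopen, once per core.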

An alternative is to not hard-code the names in your copy script, but rather
to look up the source and target information in ZooKeeper. You could do this
with the CLUSTERSTATUS Collections API call.
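
A minimal sketch of that lookup, assuming the JSON response of /solr/admin/collections?action=CLUSTERSTATUS&wt=json has already been fetched and parsed into a dict for each cluster. The function names shard_cores and pair_shards are hypothetical, and the code assumes one replica per shard, as in the setup described below.

```python
def shard_cores(cluster_status, collection):
    """Map shard name -> (base_url, core) from a parsed CLUSTERSTATUS response."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    result = {}
    for shard_name, shard in shards.items():
        # Assumption: exactly one replica per shard, so take the first entry.
        replica = next(iter(shard["replicas"].values()))
        result[shard_name] = (replica["base_url"], replica["core"])
    return result


def pair_shards(source_status, target_status, collection):
    """Pair each source shard with the target shard of the same name."""
    source = shard_cores(source_status, collection)
    target = shard_cores(target_status, collection)
    # Match on the shard name (shard1, shard2, ...), not on the core name,
    # so differing replica numbering between clusters does not matter.
    return {
        name: (source[name], target[name])
        for name in sorted(source)
        if name in target
    }
```

Each resulting pair holds the source and target base URL and core name for one shard, which is the information a copy script or a fetchindex call needs.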

Best,
Erick

On Tue, Mar 6, 2018 at 6:47 AM, Patrick Schemitz <ps@solute.de> wrote:
> Hi List,
>
> so I'm running a bunch of SolrCloud clusters (each cluster is: 8
> shards on 2 servers, with 4 instances per server, no replicas, i.e. 1
> shard per instance).
>
> Building the index afresh takes 15+ hours, so when I have to deploy a
> new index, I build it once, on one cluster, and then copy (scp) over
> the data/<main_index>/index directories (shutting down the Solr instances
> first).
>
> I could get Solr 6.5.1 to number the shard/replica directories nicely
> via the createNodeSet and createNodeSet.shuffle options:
>
> Solr 6.5.1 /var/lib/solr:
>
> Server node 1:
> instance00/data/main_index_shard1_replica1
> instance01/data/main_index_shard2_replica1
> instance02/data/main_index_shard3_replica1
> instance03/data/main_index_shard4_replica1
>
> Server node 2:
> instance00/data/main_index_shard5_replica1
> instance01/data/main_index_shard6_replica1
> instance02/data/main_index_shard7_replica1
> instance03/data/main_index_shard8_replica1
>
> However, while attempting to upgrade to 7.2.1, this numbering has changed:
>
> Solr 7.2.1 /var/lib/solr:
>
> Server node 1:
> instance00/data/main_index_shard1_replica_n1
> instance01/data/main_index_shard2_replica_n2
> instance02/data/main_index_shard3_replica_n4
> instance03/data/main_index_shard4_replica_n6
>
> Server node 2:
> instance00/data/main_index_shard5_replica_n8
> instance01/data/main_index_shard6_replica_n10
> instance02/data/main_index_shard7_replica_n12
> instance03/data/main_index_shard8_replica_n14
>
> This new numbering breaks my copy script, and furthermore, I'm worried
> as to what happens when the numbering is different among target clusters.
>
> How can I switch this back to the old numbering scheme?
>
> Side note: is there a recommended way of doing this? Is the
> backup/restore mechanism suitable for this? The ref guide is kind of
> terse here.
>
> Thanks in advance,
>
> Ciao, Patrick
