Monday, March 19, 2018

FW: Securing ONLY the web interface console

-----Original Message-----
From: Jesus Olivan [mailto:jesus.olivan@letgo.com]
Sent: 19 March 2018 22:49
To: solr-user@lucene.apache.org
Subject: Securing ONLY the web interface console

Hi!

I'm trying to password-protect only the Solr web interface (not queries launched
from my app). I'm currently using SolrCloud 6.6.0 with external ZooKeepers.
I've read tons of docs about it, but I couldn't find a proper way to secure
ONLY the web admin console. Can anybody shed some light on this, please?
=)

Thanks in advance!
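
A commonly suggested pattern for this (a sketch only, not verified against
6.6; the user name and credential hash/salt are placeholders) is Basic auth
with blockUnknown=false, so unauthenticated queries still pass, combined with
authorization rules that cover only the admin APIs the console calls:

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": false,
    "credentials": { "admin": "<sha256-hash> <salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "admin": "admin" },
    "permissions": [
      { "name": "security-edit", "role": "admin" },
      { "name": "config-edit", "role": "admin" },
      { "name": "schema-edit", "role": "admin" },
      { "name": "core-admin-edit", "role": "admin" },
      { "name": "collection-admin-edit", "role": "admin" }
    ]
  }
}

Note that the static admin UI pages themselves are not gated this way; only
the APIs behind them are.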

FW: Question liste solr

-----Original Message-----
From: LOPEZ-CORTES Mariano-ext
[mailto:mariano.lopez-cortes-ext@pole-emploi.fr]
Sent: 19 March 2018 21:22
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: RE: Question liste solr

Sorry. Thanks in advance!!

From: LOPEZ-CORTES Mariano-ext
Sent: Monday, 19 March 2018 16:50
To: 'solr-user@lucene.apache.org'
Subject: RE: Question liste solr

Hello

We have a Solr index with 3 nodes, 1 shard and 2 replicas.

Our goal is to index 42 million rows, and indexing time is important. The data
source is an Oracle database.

Our indexing strategy is:

· Reading from Oracle into one big CSV file.

· Reading from 4 files (the big file, chunked) and injecting via
ConcurrentUpdateSolrClient.

Is this the optimal way of injecting such a mass of data into Solr?

For information, the estimated time for our solution is 6 hours.
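
A minimal sketch of the injection step for one chunk file, using
ConcurrentUpdateSolrClient (the URL, the CSV layout, and the queue/thread
sizes are assumptions to tune, not recommendations):

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CsvChunkLoader {
    public static void main(String[] args) throws Exception {
        // Buffers documents client-side and streams them to Solr from background threads.
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient.Builder(
                "http://localhost:8983/solr/mycollection")
                .withQueueSize(10000)
                .withThreadCount(4)
                .build();
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",");   // assumes a simple, unquoted CSV
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", cols[0]);       // column-to-field mapping is hypothetical
                doc.addField("text_t", cols[1]);
                client.add(doc);                   // queued, sent in batches by the client
            }
        }
        client.blockUntilFinished();               // drain the queue before committing
        client.commit();
        client.close();
    }
}

Running one such loader per chunk file (4 in this case) matches the strategy
described above.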

FW: How does group.query work in solr?

-----Original Message-----
From: Anjani Kumar [mailto:anj.2403@gmail.com]
Sent: 19 March 2018 18:38
To: solr-user@lucene.apache.org
Subject: How does group.query work in solr?

I am trying to achieve grouping according to the fields matched in a single
query. In the group.query parameters I am passing individual queries per field
name, and in the query parameter I am passing just the search term to be
matched against the default copy fields. However, it is taking too much time
compared to querying individual fields separately and combining the data at
my end.

Here is the sample query:

http://localhost:8983/solr/psqlTest/select?q=201*&wt=json&indent=true&group=true&group.query=road_name:201*&group.query=own_name:201*&group.query=tel_no:201*&group.query=assm_no:201*&group.limit=5



This query takes on the order of seconds to complete when at least 100 users
are hitting it (using JMeter).

Querying each field separately and joining the results takes on the order of
100-200 ms for the same JMeter parameters. Why is there such a huge
difference in performance?
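
For reference, the per-field alternative being compared is presumably a set of
requests like the following, one per field, with the results merged
client-side:

http://localhost:8983/solr/psqlTest/select?q=road_name:201*&rows=5&wt=json
http://localhost:8983/solr/psqlTest/select?q=own_name:201*&rows=5&wt=json
http://localhost:8983/solr/psqlTest/select?q=tel_no:201*&rows=5&wt=json
http://localhost:8983/solr/psqlTest/select?q=assm_no:201*&rows=5&wt=json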

FW: [PHP Classes] Notable PHP package: PHP Screenshot URL Handler

 

 

From: PHP Classes Notable [mailto:list-notable@phpclasses.org]
Sent: 19 March 2018 12:58
To: ROSHAN <roshan@siddhast.com>
Subject: [PHP Classes] Notable PHP package: PHP Screenshot URL Handler

 


Notable PHP package: PHP Screenshot URL Handler


ROSHAN, a PHP package is considered Notable when it does something different that is worth noting.

If you have also written Notable packages, contribute them to the PHP Classes site to get your work more exposure.

If your notable package is innovative, you may also earn prizes and recognition in the
PHP Innovation Award.

Now you can also win a big elePHPant as one of the prizes awarded every month. Check the complete list of prizes here: List of prizes

 

Package

PHP Screenshot URL Handler

Manipulate URLs and capture a screenshot of a page

Moderator comment

Capturing screenshots of Web pages is useful for site developers. It helps them understand how users see their site's pages, which can be useful for discovering bugs or issues with the way pages are presented to the user.

There are many solutions for capturing screenshots of Web pages. Using the Google Page Speed API is often better because internally Google uses the Chrome browser to capture exactly the way pages are presented to users, as if a real user were viewing the page.

This package uses Google Page Speed API to capture a Web page screenshot, thus taking advantage of its possibilities as described above.

Author

Malik umer Farooq

Groups

PHP 5, Web services

Description

This package can manipulate URLs and capture a screenshot of a page.

It can perform several operations with URLs like:

- Generate the slug part from a text string
- Retrieve the contents of the respective page
- Extract the page title, meta description, keywords, images
- Capture a screenshot using the Google Page Speed API

 


FW: Error when indexing with SolrJ HTTP ERROR 405

-----Original Message-----
From: Khalid Moustapha Askia [mailto:m.askiakhalid@gmail.com]
Sent: 19 March 2018 09:16
To: solr-user@lucene.apache.org
Subject: Error when indexing with SolrJ HTTP ERROR 405

Hi. I am trying to index some data with Solr by using SolrJ, but I have this
error that I can't solve.

------------------------------------------------------------------------------
Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/#/corename: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 405 HTTP POST method is not supported by this URL</title>
</head>
<body><h2>HTTP ERROR 405</h2>
<p>Problem accessing /solr/index.html. Reason:
<pre>    Error 405 HTTP POST method is not supported by this URL</pre></p>
</body>
</html>

	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
	at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
	at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
	at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85)
	at indexsolr.index(indexsolr.java:33)
	at LoadData.toIndex(LoadData.java:102)
	at LoadData.loadDocuments(LoadData.java:72)
	at IndexLaunch.main(IndexLaunch.java:12)
------------------------------------------------------------------------------

This is how I connect (I am running locally):

--------------------------------------------------------------------

SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/#/corename").build();

When I remove the "#", it throws a NullPointerException.

I have been struggling for a week with this indexing...
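
The "#" in that URL belongs to the admin UI; it is a browser-side fragment, so
everything after it never reaches the server and the POST lands on the UI page
instead of the core's update handler. The usual base URL is the bare core path
(a sketch; "corename" as in the original):

// Point SolrJ at the core itself, without the admin UI's "#" fragment
SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/corename").build();

If removing the "#" left "/solr//corename", the double slash (an empty core
name) could explain the NullPointerException, though that is a guess.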

Sunday, March 18, 2018

FW: Indexing multi level Nested JSON

-----Original Message-----
From: Zheng Lin Edwin Yeo [mailto:edwinyeozl@gmail.com]
Sent: 19 March 2018 08:46
To: solr-user@lucene.apache.org
Subject: Indexing multi level Nested JSON

Hi,

I have this sample multi-level nested JSON, with 2 levels of child documents.

[
  {
    "id": "1",
    "title_s": "Solr adds block join support",
    "contenttype_s": "parentDocument",
    "_childDocuments_": [
      {
        "id": "3",
        "comments_s": "SolrCloud supports it too!",
        "_childDocuments_": [
          { "name_s": "alan", "phone_s": "123" },
          { "name_s": "edwin", "phone_s": "456" }
        ]
      },
      {
        "id": "3a",
        "comments_s": "SolrCloud supports it too 2!",
        "_childDocuments_": [
          { "name_s": "alan", "phone_s": "123" },
          { "name_s": "edwin", "phone_s": "456" }
        ]
      }
    ]
  },
  {
    "id": "2",
    "title_s": "New Lucene and Solr release is out",
    "contenttype_s": "parentDocument",
    "_childDocuments_": [
      {
        "id": "4",
        "comments_s": "Lots of new features",
        "_childDocuments_": [
          { "name_s": "alan", "phone_s": "123" },
          { "name_s": "edwin", "phone_s": "456" }
        ]
      }
    ]
  },
  {
    "id": "5",
    "title_s": "Testing of Nested JSON",
    "contenttype_s": "parentDocument",
    "_childDocuments_": [
      {
        "id": "6",
        "comments_s": "See if this is a child",
        "_childDocuments_": [
          { "name_s": "alan", "phone_s": "123" },
          { "name_s": "edwin", "phone_s": "456" }
        ]
      }
    ]
  }
]


However, when it is indexed into Solr, only one level of nesting is preserved,
and the output becomes like this:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1,
    "params":{
      "q":"contenttype_s:parentDocument",
      "fl":"*,[child parentFilter=contenttype_s:parentDocument]",
      "sort":"id asc"}},
  "response":{"numFound":3,"start":0,"docs":[
    {
      "id":"1",
      "title_s":"Solr adds block join support",
      "contenttype_s":"parentDocument",
      "signature":"0000000000000000",
      "_version_":1595334082096529408,
      "_childDocuments_":[
        {
          "name_s":"alan",
          "phone_s":"123",
          "_version_":1595334082096529408},
        {
          "name_s":"edwin",
          "phone_s":"456",
          "_version_":1595334082096529408},
        {
          "id":"3",
          "comments_s":"SolrCloud supports it too!",
          "_version_":1595334082096529408},
        {
          "name_s":"alan",
          "phone_s":"123",
          "_version_":1595334082096529408},
        {
          "name_s":"edwin",
          "phone_s":"456",
          "_version_":1595334082096529408},
        {
          "id":"3a",
          "comments_s":"SolrCloud supports it too 2!",
          "_version_":1595334082096529408}]},
    {
      "id":"2",
      "title_s":"New Lucene and Solr release is out",
      "contenttype_s":"parentDocument",
      "signature":"0000000000000000",
      "_version_":1595334082099675136,
      "_childDocuments_":[
        {
          "name_s":"alan",
          "phone_s":"123",
          "_version_":1595334082099675136},
        {
          "name_s":"edwin",
          "phone_s":"456",
          "_version_":1595334082099675136},
        {
          "id":"4",
          "comments_s":"Lots of new features",
          "_version_":1595334082099675136}]},
    {
      "id":"5",
      "title_s":"Testing of Nested JSON",
      "contenttype_s":"parentDocument",
      "signature":"0000000000000000",
      "_version_":1595334082101772288,
      "_childDocuments_":[
        {
          "name_s":"alan",
          "phone_s":"123",
          "_version_":1595334082101772288},
        {
          "name_s":"edwin",
          "phone_s":"456",
          "_version_":1595334082101772288},
        {
          "id":"6",
          "comments_s":"See if this is a child",
          "_version_":1595334082101772288}]}]
  }}


Is Solr able to support indexing multi-level nested JSON?

I have tested this on Solr 6.5.1.

Regards,
Edwin

FW: [aadhaarauth] Auth Url not working

 

 

From: contact.manishkangia via Aadhaar Authentication Discussions Group [mailto:aadhaarauth+APn2wQfAd3P8GX6nKi2pUMIuSXMdc3Q2iJPInXs4W3Q6GIcvDGlL@googlegroups.com]
Sent: 19 March 2018 00:09
To: Aadhaar Authentication Discussions Group <aadhaarauth@googlegroups.com>
Subject: [aadhaarauth] Auth Url not working

 

Hi,


I am following the details here https://authportal.uidai.gov.in/web/uidai/developer to test Aadhaar authentication.


I am trying to ping the authentication endpoint at http://auth.uidai.gov.in/1.6//<1st-digit-of-uid>/<2nd-digit-of-uid>/ but it is not able to connect.


I tried the URL that used to work earlier:
http://auth.uidai.gov.in/1.6/public/9/9/MEaMX8fkRa6PqsqK6wGMrEXcXFl_oXHA-YuknI2uf0gKgZ80HaZgG3A

 

and also tried without "public":
http://auth.uidai.gov.in/1.6/9/9/MEaMX8fkRa6PqsqK6wGMrEXcXFl_oXHA-YuknI2uf0gKgZ80HaZgG3A


Is the service unavailable, or has it been moved to a different URL?


Please help with this.


Regards


FW: collection reload leads to OutOfMemoryError

-----Original Message-----
From: Hendrik Haddorp [mailto:hendrik.haddorp@gmx.net]
Sent: 18 March 2018 21:53
To: solr-user@lucene.apache.org
Subject: collection reload leads to OutOfMemoryError

Hi,

I did a simple test on a three node cluster using Solr 7.2.1. The JVMs
(Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_162
25.162-b12) have about 6.5GB heap and 1.5GB metaspace. In my test I have
1000 collections with only 1000 simple documents each. I'm then triggering
collection reloads via SolrJ using a fixed number of threads, as this has
shown memory issues in the past. Even with two threads the nodes eventually
die with an OOM Error as they are running out of metaspace. I found the
following Jiras that might be about the same issue:
    https://issues.apache.org/jira/browse/SOLR-10506
    https://issues.apache.org/jira/browse/SOLR-9117
    https://issues.apache.org/jira/browse/SOLR-6678

The first two are flagged as fixed in 7.0.

Any ideas, besides not doing reloads?

regards,
Hendrik
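
For reference, a minimal sketch of the kind of reload loop described
(collection names, the zkHost, and the error handling are assumed):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ReloadLoop {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")        // placeholder ensemble
                .build();
        ExecutorService pool = Executors.newFixedThreadPool(2);  // the "two threads" case
        for (int i = 0; i < 1000; i++) {
            final String name = "collection" + i;                // naming scheme assumed
            pool.submit(() -> {
                try {
                    CollectionAdminRequest.reloadCollection(name).process(client);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);                // wait for all reloads
        client.close();
    }
}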

FW: Looking for design ideas

-----Original Message-----
From: Steven White [mailto:swhite4141@gmail.com]
Sent: 18 March 2018 20:44
To: solr-user@lucene.apache.org
Subject: Looking for design ideas

Hi everyone,

I have a design problem that I'm not sure how best to solve, so I figured I'd
share it here and see what ideas others may have.

I have a DB that holds documents (over 1 million and growing). This is known
as the "Public" DB; it holds documents visible to all of my end users.

My application lets users "check out" one or more documents at a time from
this "Public" DB, edit them, and "check in" back into the "Public" DB. When
a document is checked out, it goes into a "Personal" DB for that user (and
the document in the "Public" DB is flagged as such to alert other users).
The owner of this checked-out document in the "Personal" DB can make changes
to the document and save it back into the "Personal" DB as often as he wants
to. Sometimes the document lives in the "Personal" DB for a few minutes
before it is checked back into the "Public" DB, and sometimes it can live
in the "Personal" DB for a day or a month. When a document is saved into
the "Personal" DB, only the owner of that document can see it.

Currently there are 100 users but this will grow to at least 500 or maybe
even 1000.

I'm looking at a solution on how to enable a full text search on those
documents, both in the "Public" and "Personal" DB so that:

1) Documents in the "Public" DB are searchable by all users. This is the
easy part.

2) Documents in the "Personal" DB of each user are searchable by the owner of
that "Personal" DB. This is easy too.

3) A user can search both the "Public" and "Personal" DB at any time, but if a
document is in the "Personal" DB, we will not search it in the "Public" DB --
i.e., whatever is in the "Personal" DB takes precedence over what's in the
"Public" DB.

Item #3 is important and is what I'm trying to solve. The goal is to give the
user hits on documents that they are editing (in their "Personal" DB) instead
of those in the "Public" DB.

The way I'm thinking to solve this problem is to create 2 Solr indexes (do
we call those "cores"?):

1) The "Public" DB is indexed into the "Public" Solr index.

2) The "Personal" DB is indexed into the "Personal" Solr index with a field
indicating the owner of that document.

With the above 2 indexes, I can now send the user's search syntax to both
indexes, but for the "Public" index I will also send a list of IDs (those
documents in the user's "Personal" DB) to exclude from the result set.
This way, I let a user search both the "Public" and "Personal" DB such that
the documents in the "Personal" DB are included in the search and are
excluded from the "Public" DB. A sketch of the two queries is below.
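
Something like the following, where the owner field, the user, and the
excluded IDs are purely illustrative:

Personal index:  q=<user query>&fq=owner:jdoe
Public index:    q=<user query>&fq=-id:(17 42 103)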

Did I make sense? If so, is this doable? Will ranking be affected given
that I'm searching 2 indexes?

Let me know what issues I might be overlooking with this solution.

Thanks

Steve

Saturday, March 17, 2018

FW: Replication in Master Slave Solr setup

-----Original Message-----
From: vracks [mailto:v.rajeshgce@gmail.com]
Sent: 18 March 2018 07:37
To: solr-user@lucene.apache.org
Subject: Replication in Master Slave Solr setup

Basic Questions about the Replication in Master Slave Solr Setup.

1) Can the master push changes to slaves using the replication handler?

2) If the answer to the above question is no, then what is the use of the
replicateAfter option in the replicationHandler, since only the slave is
going to poll the master at a particular interval?
If the answer to the above question is yes, then I want to know how the
master knows about the slave instances to push the changes to, since the
replication handler doesn't have any options for listing the slaves. And if
the answer is yes, which is the better option for replication: push or pull?

Sample Master Slave replication handler setup

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master.solr.company.com:8983/solr/core_name/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Please help me understand the replication architecture in Solr.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

-----Original Message-----
From: spoonerk [mailto:john.spooner@gmail.com]
Sent: 12 March 2018 18:26
To: solr-user@lucene.apache.org
Subject: Re: The Impact of the Number of Collections on Indexing Performance
in Solr 6.0

I have tried emailing to unsubscribe. I have tried disrupting threads
hoping to anger the admin into getting me off the list. All I get
is arrogant emails about headers.

On Mar 12, 2018 1:15 AM, "苗海泉" <mseaspring@gmail.com> wrote:

> Thanks Erick and Shawn, and thank you for your patience. As I said, the
> above phenomenon was not caused by IO, CPU, memory, or network IO:
> swap was turned off and the machine's memory was sufficient. When the
> indexing speed declines, QTime shows 3 to 4 seconds to reload the
> index, so it can be guessed that it is more likely a Solr problem than
> a Jetty one. It is worth mentioning that when the indexing speed
> dropped sharply, Solr used only about 5% of the CPU, while when it was
> normal, Solr's CPU usage was around 200 percent and the overall
> system's CPU usage was about 20 percent.
>
> Basic information:
> 1) The data volume of each collection is between 2 billion and 3 billion.
> 2) The configuration of the machine is 24 cpu and 128G memory.
> 3) The disk usage per copy is about 10G.
>
> In addition, I noticed that the work of zookeeper is normal and there
> is no error or warning message.
>
> So all these phenomena make me think that some internal mechanism of
> Solr may be causing the sharp drop in index construction speed. At
> present, it seems that our Solr machines' resources are sufficient.
>
> As for reducing the number of collections, as you suggested, we also
> have this plan and are looking for ways to implement it. Are there any
> other suggestions?
>
>
> Best .
> miaohq
>
> 2018-03-11 10:15 GMT+08:00 spoonerk <john.spooner@gmail.com>:
>
> > Wow thanks. Just trying to unsubscribe. Most email lists let you do
> > that.
> >
> > On Mar 10, 2018 2:36 PM, "Erick Erickson" <erickerickson@gmail.com>
> wrote:
> >
> > > Spoonerk:
> > >
> > > You say you've tried "many times", but you haven't provided full
> > > header as described in the "problems" link at the link below. You
> > > haven't e-mailed the list owner as suggested in the "problems" link.
> > > You haven't, in short, provided any of the information that's
> > > necessary to actually unsubscribe you.
> > >
> > > Please follow the instructions here:
> > > http://lucene.apache.org/solr/community.html#mailing-lists-irc. In
> > > particular look at the "problems" link.
> > >
> > > You must use the _exact_ same e-mail as you used to subscribe.
> > >
> > > If the initial try doesn't work and following the suggestions at
> > > the "problems" link doesn't work for you, let us know. But note
> > > you need to show us the _entire_ return header to allow anyone to
> > > diagnose the problem.
> > >
> > > Best,
> > > Erick
> > >
> > > On Sat, Mar 10, 2018 at 1:03 PM, spoonerk <john.spooner@gmail.com>
> > wrote:
> > > > I have manually unsubscribed many times. But I still get emails
> > > > from
> > the
> > > > list. Can some admin please unsubscribe me?
> > > >
> > > > On Mar 9, 2018 9:52 PM, "苗海泉" <mseaspring@gmail.com> wrote:
> > > >
> > > >> Hello. We found a problem. In Solr 6.0, the indexing speed of
> > > >> Solr is influenced by the number of Solr collections. The speed
> > > >> is normal before the limit is reached; once the limit is
> > > >> reached, the indexing speed decreases by 50 times.
> > > >>
> > > >> In our environment, there are 49 Solr nodes. If each collection
> > > >> has 25 shards, you can maintain high-speed indexing until the
> > > >> total number of collections is about 900. If the number of
> > > >> collections is reduced back under the limit, the speed goes
> > > >> back up.
> > > >> If each collection has 49 shards, the total number of
> > > >> collections can only be about 700; exceeding this value causes
> > > >> the indexing speed to drop dramatically.
> > > >> To clarify, we use single replicas; multiple replicas cause
> > > >> serious stability problems in a large Solr cluster environment.
> > > >>
> > > >> At first I suspected it was due to too many thread submissions,
> > > >> but there are still problems with that theory, so now I lean
> > > >> toward the searcherExecutor thread pool. This is just my guess;
> > > >> I want to know the real reason. Can someone help?
> > > >>
> > > >> Also, I noticed that the searcherExecutor threads basically
> > > >> correspond one-to-one to the Solr collection's shards. How can I
> > > >> reduce the number of threads, or even close them? Although there
> > > >> are many collections in our environment, there are few queries,
> > > >> and it is not necessary to keep the threads open to serve
> > > >> queries. This is too wasteful.
> > > >>
> > > >> thank you .
> > > >>
> > >
> >
>
>
>
> --
> ==============================
> 联创科技
> 知行如一
> ==============================
>

FW: Resend: Authorization on 6.6.0

-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: 13 March 2018 03:38
To: solr-user@lucene.apache.org
Subject: Resend: Authorization on 6.6.0

I'm resending the information below because the original message got the
security.json stuff garbled.
------------------------------------------------------------------------------

I'm using 6.6.0 with security.json active, having the content shown below.
I am running in standalone mode and have two Solr cores defined:
emails1 and emails2. Since 'blockUnknown' is set to false, everyone
should have access to any unprotected resource. As you can see, I have
three users defined: joe, solr and terry (the latter two having an admin
role).

What I expect to happen is for user joe (who is not an admin) to be able to
access core emails2 without being challenged for his credentials, but for
user joe to be challenged and denied when he tries to access emails1.

But Solr appears to ignore the "collection" portion of the permission: it
denies joe access to both cores.

Is this a bug (in that auth doesn't work properly in 6.6.0 standalone), or
am I (once again) missing something?

Terry


{
    "authentication": {
        "class": "solr.BasicAuthPlugin",
        "blockUnknown": true,
        "credentials": {
            "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
            "joe": "iGx0BaTgmjmCxrRmaD3IsCb2MJ21x1vqhfdzbwyu9MY=
P+aA0Bx811jzRwR97bOn/x/jyvpoKiHpWIRRXGAc8tg=",
            "terry": "q71fVfo/DIeCSfc1zw6YMyXVjU24Jr2oLniEkXFdPe0=
oSaEbu/0TCg8UehLQ9zfoH3AvrJBqCaIoJkt547WIrc="
        },
        "": {
            "v": 0
        }
    },
    "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "user-role": {
            "solr": "admin",
            "terry": "admin"
        },
        "permissions": [
            {
                "path": "/select",
                "role": "admin"
            }
        ]
    }
}

FW: Authorization in Solr 6.6.0 Not Working Properly

-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: 13 March 2018 03:24
To: solr-user@lucene.apache.org
Subject: Authorization in Solr 6.6.0 Not Working Properly

I'm using 6.6.0 with security.json active, having the content shown below.
I am running in standalone mode and have two Solr cores defined:
emails1 and emails2. Since 'blockUnknown' is set to false, everyone
should have access to any unprotected resource. As you can see, I have
three users defined: joe, solr and terry (the latter two having an admin
role).

What I expect to happen is for user joe (who is not an admin) to be able to
access core emails2 without being challenged for his credentials, but for
user joe to be challenged and denied when he tries to access emails1.

But Solr appears to ignore the "collection" portion of the permission: it
denies joe access to both cores.

Is this a bug (in that auth doesn't work properly in 6.6.0 standalone), or
am I (once again) missing something?

Terry

{     "authentication": {         "class": "solr.BasicAuthPlugin",
        "blockUnknown": false,         "credentials": {
"solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",             "joe":
"iGx0BaTgmjmCxrRmaD3IsCb2MJ21x1vqhfdzbwyu9MY=
P+aA0Bx811jzRwR97bOn/x/jyvpoKiHpWIRRXGAc8tg=",             "terry":
"q71fVfo/DIeCSfc1zw6YMyXVjU24Jr2oLniEkXFdPe0=
oSaEbu/0TCg8UehLQ9zfoH3AvrJBqCaIoJkt547WIrc="         },         "": {
            "v": 0         }     },     "authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",         "user-role": {
            "solr": "admin",             "terry": "admin"         },
        "permissions": [             { "collection":"emails1",
                "path": "/select",
                "role": "admin"             }         ]     } }

FW: CDCR performance issues

-----Original Message-----
From: Tom Peters [mailto:tpeters@synacor.com]
Sent: 13 March 2018 00:22
To: solr-user@lucene.apache.org
Subject: Re: CDCR performance issues

I'm also having an issue with replicas in the target data center. They will
go from recovering to down. And when one of my replicas goes down in the
target data center, CDCR will no longer send updates from the source to the
target.

> On Mar 12, 2018, at 9:24 AM, Tom Peters <TPeters@synacor.com> wrote:
>
> Anyone have any thoughts on the questions I raised?
>
> I have another question related to CDCR:
> Sometimes we have to reindex a large chunk of our index (1M+ documents).
What's the best way to handle this if the normal CDCR process won't be able
to keep up? Manually trigger a bootstrap again? Or is there something else
we can do?
>
> Thanks.
>
>
>
>> On Mar 9, 2018, at 3:59 PM, Tom Peters <TPeters@synacor.com> wrote:
>>
>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
requests to the target data center are not batched in any way. Each update
comes in as an independent update. Some follow-up questions:
>>
>> 1. Is it accurate that updates are not actually batched in transit from
the source to the target and instead each document is posted separately?
>>
>> 2. Are they done synchronously? I assume yes (since you wouldn't want
operations applied out of order)
>>
>> 3. If they are done synchronously, and are not batched in any way, does
that mean that the best performance I can expect would be roughly how long
it takes to round-trip a single document? ie. If my average ping is 25ms,
then I can expect a peak performance of roughly 40 ops/s.
>>
>> Thanks
>>
>>
>>
>>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C]
<daniel.davis@nih.gov> wrote:
>>>
>>> These are general guidelines; I've done loads of networking, but may be
less familiar with SolrCloud and CDCR architecture. However, I know it's
all TCP sockets, so general guidelines do apply.
>>>
>>> Check the round-trip time between the data centers using ping or TCP
ping. Throughput tests may be high, but if Solr has to wait for a response
to a request before sending the next action, then just like any network
protocol that does that, it will get slow.
>>>
>>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check
whether some proxy/load balancer between data centers is causing it to be a
single connection per operation. That will *kill* performance. Some
proxies default to HTTP/1.0 (open, send request, server send response,
close), and that will hurt.
>>>
>>> Why you should listen to me even without SolrCloud knowledge: check out
the paper "Latency performance of SOAP Implementations". Same distribution of
skills - I knew TCP well, but Apache Axis 1.1 not so well. I still
improved the response time of Apache Axis 1.1 by 250ms per call with 1 line
of code.
>>>
>>> -----Original Message-----
>>> From: Tom Peters [mailto:tpeters@synacor.com]
>>> Sent: Wednesday, March 7, 2018 6:19 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: CDCR performance issues
>>>
>>> I'm having issues with the target collection staying up-to-date with
indexing from the source collection using CDCR.
>>>
>>> This is what I'm getting back in terms of OPS:
>>>
>>> curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>> {
>>>   "responseHeader": {
>>>     "status": 0,
>>>     "QTime": 0
>>>   },
>>>   "operationsPerSecond": [
>>>     "zook01,zook02,zook03/solr",
>>>     [
>>>       "mycollection",
>>>       [
>>>         "all",
>>>         49.10140553500938,
>>>         "adds",
>>>         10.27612635309587,
>>>         "deletes",
>>>         38.82527896994054
>>>       ]
>>>     ]
>>>   ]
>>> }
>>>
>>> The source and target collections are in separate data centers.
>>>
>>> Doing a network test between the leader node in the source data center
and the ZooKeeper nodes in the target data center show decent enough network
performance: ~181 Mbit/s
>>>
>>> I've tried playing around with the "batchSize" value (128, 512, 728,
1000, 2000, 2500) and they haven't made much of a difference.
>>>
>>> Any suggestions on potential settings to tune to improve the
performance?
>>>
>>> Thanks
>>>
>>> --
>>>
>>> Here's some relevant log lines from the source data center's leader:
>>>
>>> 2018-03-07 23:16:11.984 INFO (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>> 2018-03-07 23:16:23.062 INFO (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>>> 2018-03-07 23:16:32.063 INFO (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>> 2018-03-07 23:16:36.209 INFO (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>>> 2018-03-07 23:16:42.091 INFO (cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>>> 2018-03-07 23:16:46.790 INFO (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>> 2018-03-07 23:16:50.004 INFO (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>>>
>>>
>>> And what the log looks like in the target:
>>>
>>> 2018-03-07 23:18:46.475 INFO (qtp1595212853-26) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067896487950&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.500 INFO (qtp1595212853-25) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067896487951&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.525 INFO (qtp1595212853-24) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536512&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.550 INFO (qtp1595212853-3793) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536513&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.575 INFO (qtp1595212853-30) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536514&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.600 INFO (qtp1595212853-26) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536515&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.625 INFO (qtp1595212853-25) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536516&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.651 INFO (qtp1595212853-24) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536517&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.676 INFO (qtp1595212853-3793) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536518&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
>>> 2018-03-07 23:18:46.701 INFO (qtp1595212853-30) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request [mycollection_shard1_replica_n1] webapp=/solr path=/update params={_stateVer_=mycollection:30&_version_=-1594317067897536519&cdcr.update=&wt=javabin&version=2} status=0 QTime=0
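
For reference, the batchSize being tuned in this thread lives in the CDCR
request handler configured on the source collection. A sketch only, with the
zkHost and collection names taken from the thread and the remaining values
illustrative rather than recommended:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">zook01,zook02,zook03/solr</str>
    <str name="source">mycollection</str>
    <str name="target">mycollection</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">512</str>
  </lst>
</requestHandler>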

FW: SpellCheck Reload

-----Original Message-----
From: Sadiki Latty [mailto:slatty@uottawa.ca]
Sent: 12 March 2018 23:08
To: solr-user@lucene.apache.org
Subject: SpellCheck Reload

Greetings list,

I had a question regarding the spellcheck.reload parameter. I am using the
IndexBasedSpellChecker, which creates its dictionary based on content from a
field. I built the spellcheck (in error) with a field that has stemming and
other filters associated with it.

Regarding the spellcheck.reload parameter, the guide states "If set to true,
this parameter reloads the spellchecker. The results depend on the
implementation of SolrSpellChecker.reload(). In a typical implementation,
reloading the spellchecker means reloading the dictionary."

My question is: does "reloading the dictionary" mean completely erasing the
current dictionary and starting from scratch (which is what I want), or does
it simply reload the dictionary into some form of memory, which would include
what was there before (the stemmed and filtered data based on the initial
field)?
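
For what it's worth, if the goal is a dictionary rebuilt from scratch against
the corrected field, spellcheck.build (rather than spellcheck.reload) is the
usual route; a sketch, with the core and handler names assumed:

http://localhost:8983/solr/mycore/spell?q=example&spellcheck=true&spellcheck.build=true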


Thanks,

Sid

FW: Defining a phonetic analyzer and searcher via the schema API

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 12 March 2018 23:05
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Defining a phonetic analyzer and searcher via the schema API

Chris:

LGTM, except maybe ;).....

You'll want to look closely at your admin UI/Analysis page for the field (or
fieldType) once it's defined. Uncheck the "verbose" box when you look the
first time, it'll be less confusing. That'll show you _exactly_ what the
results are and whether they match your expectations. "right" is such an
existential question after all...

When you're using that page, think outside the box. For instance, I can't
say offhand whether the phonetic filter you chose gives different results
when words are capitalized or not. What about when they have numbers? Put
some punctuation in. Try an e-mail address.
Etc. etc. etc.

For instance, if you swap out StandardTokenizer for WhitespaceTokenizer,
you'll now have punctuation in the mix. Most people don't notice if they
have WordDelimiterGraphFilterFactory in the analysis chain too....

bq: Actually, I have the script that builds the schema in VCS, so it's
roughly the same.

We're on the same page here. I don't particularly care how the schema gets
saved, as long as I can back up to the last known good schema and start
over....

I'll mention in passing that there's no problem whatsoever with using the
"classic" schema. The managed stuff is cool, and enables spiffy front-ends
etc. Personally I'm comfortable enough with hand-editing the schemas that I
find it faster so I usually use it.

BTW, bin/solr has a set of commands that allow you to upload/download
configs, try "bin/solr zk -help".....

Walter:

"I don't usually test my code, but when I do it's in production".

These young whipper-snappers don't appreciate how _very_ many ways things
can go wrong ;)

My tongue-in-cheek way to distinguish novice from "veteran" programmers:

Novice: The code compiles and she's surprised when it doesn't work the first
time.

Veteran: The code ran perfectly the first time. She immediately goes over it
with a fine-tooth comb to see whether it's still running canned test cases.

Best,
Erick


On Mon, Mar 12, 2018 at 10:14 AM, Christopher Schultz
<chris@christopherschultz.net> wrote:
>
> Erick,
>
> On 3/12/18 1:00 PM, Erick Erickson wrote:
>> bq: which you aren't supposed to edit directly.
>>
>> Well, kind of. Here's why it's "discouraged":
>> https://lucene.apache.org/solr/guide/6_6/schema-api.html.
>>
>> But as long as you don't mix-and-match hand-editing with using the
>> schema API you can hand edit it freely. You're then in charge of
>> pushing it to ZK and reloading your collections that use it yourself
>> however.
>
> No Zookeeper (yet), but I suspect I'll end up there. I'm mostly
> toying-around with it right now, but it won't be long before I'll want
> to go live with it and having a single Solr instance isn't going to
> help me sleep well at night. I'm sure I'll end up with two instances
> to begin with, which requires ZK, right?
>
>> As a side note, even if I _never_ hand-edited it I'd make it a
>> practice to regularly pull it from ZK and put it in some VCS system
>> ;)
>
> Actually, I have the script that builds the schema in VCS, so it's
> roughly the same.
>
> As for the schema modifications... did I get those right?
>
> Thanks,
> - -chris
>
>> On Mon, Mar 12, 2018 at 9:51 AM, Christopher Schultz
>> <chris@christopherschultz.net> wrote:
>>
>> All,
>>
>> I'd like to add a new synthesized field that uses a phonetic analyzer
>> such as Beider-Morse. I'm using Solr 7.2.
>>
>> When I request the current schema via the schema API, I get a list of
>> existing fields, dynamic fields, and analyzers, none of which appear
>> to be what I'm looking for.
>>
>> Conceptually, I think I'd like to do something like this:
>>
>> add-field: { name: phoneticname, type: phonetic, multiValued: true }
>>
>> ... but how do I define what type of data "phonetic" should be?
>>
>> I can see the example XML definition in this document:
>> https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter
>>
>> But I'm not sure how to add an analyzer to the schema using the
>> schema API:
>> https://lucene.apache.org/solr/guide/7_2/schema-api.html
>>
>> Under "Add a new field type", it says that new analyzers can be
>> defined, but I'm not entirely sure how to do that ... the API docs
>> refer to the field type definitions page[1] which just shows what XML
>> you'd have to put into your schema XML -- which you aren't supposed
>> to edit directly.
>>
>> When looking at the JSON version of my schema, I can see for example this:
>>
>> "fieldTypes":[{
>>   "name":"ancestor_path",
>>   "class":"solr.TextField",
>>   "indexAnalyzer":{
>>     "tokenizer":{ "class":"solr.KeywordTokenizerFactory" }},
>>   "queryAnalyzer":{
>>     "tokenizer":{ "class":"solr.PathHierarchyTokenizerFactory", "delimiter":"/" }}},
>>
>> So should I create a new field type like this?
>>
>> "add-field-type" : {
>>   "name" : "phonetic",
>>   "class" : "solr.TextField",
>>   "analyzer" : {
>>     "tokenizer": { "class" : "solr.StandardTokenizerFactory" },
>>     "filters" : [{
>>       "class": "solr.BeiderMorseFilterFactory",
>>       "nameType": "GENERIC",
>>       "ruleType": "APPROX",
>>       "concat": "true",
>>       "languageSet": "auto"
>>     }]
>>   }
>> }
>>
>> Then, use copy-field as "usual":
>>
>> "add-field":{ "name":"phonetic", "type":"phonetic", "multiValued": true, "stored":false },
>>
>> "add-copy-field":{ "source":"first_name", "dest":"phonetic" },
>>
>> "add-copy-field":{ "source":"last_name", "dest":"phonetic" },
>>
>> This seems to work but I wanted to know if I was doing it the right way.
>>
>> Thanks, -chris
>>
>> [1] https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties
>>

FW: Including a filtered-field in the default-field

-----Original Message-----
From: Christopher Schultz [mailto:chris@christopherschultz.net]
Sent: 12 March 2018 22:51
To: solr-user@lucene.apache.org
Subject: Including a filtered-field in the default-field


All,

I have a Solr index containing application user information (username,
first/last, etc.). I have created an "all" field for the purpose of using it
as a default. It contains most but not all fields.

I recently added phonetic searching for the first and last names (together
in a single field) but it will only work if the query specifies that field
like this:

chris or phonetic:schultz

Is there a way to add the phonetic field to the "all" field and have it
searched phonetically alongside the non-phonetic fields/terms? I see I
cannot have multiple "default fields" :)

I know on the back-end I can construct a query like this:

all:[query] phonetic:[query]

...but I'd prefer to do as little massaging of the query as possible.

Thanks,
- -chris

FW: Defining a phonetic analyzer and searcher via the schema API

-----Original Message-----
From: Christopher Schultz [mailto:chris@christopherschultz.net]
Sent: 12 March 2018 22:22
To: solr-user@lucene.apache.org
Subject: Defining a phonetic analyzer and searcher via the schema API


All,

I'd like to add a new synthesized field that uses a phonetic analyzer such
as Beider-Morse. I'm using Solr 7.2.

When I request the current schema via the schema API, I get a list of
existing fields, dynamic fields, and analyzers, none of which appear to be
what I'm looking for.

Conceptually, I think I'd like to do something like this:

add-field: { name: phoneticname, type: phonetic, multiValued: true }

... but how do I define what type of data "phonetic" should be?

I can see the example XML definition in this document:
https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter


But I'm not sure how to add an analyzer to the schema using the schema
API: https://lucene.apache.org/solr/guide/7_2/schema-api.html

Under "Add a new field type", it says that new analyzers can be defined, but
I'm not entirely sure how to do that ... the API docs refer to the field
type definitions page[1] which just shows what XML you'd have to put into
your schema XML -- which you aren't supposed to edit directly.

When looking at the JSON version of my schema, I can see for example this:

"fieldTypes":[{
"name":"ancestor_path",
"class":"solr.TextField",
"indexAnalyzer":{
"tokenizer":{
"class":"solr.KeywordTokenizerFactory"}},
"queryAnalyzer":{
"tokenizer":{
"class":"solr.PathHierarchyTokenizerFactory",
"delimiter":"/"}}},

So should I create a new field type like this?

"add-field-type" : {
"name" : "phonetic",
"class" : "solr.TextField",

"analyzer" : {
"tokenizer": { "class" : "solr.StandardTokenizerFactory" },

"filters" : [{
"class": "solr.BeiderMorseFilterFactory",
"nameType": "GENERIC",
"ruleType": "APPROX",
"concat": "true",
"languageSet": "auto"
}]
}
}

Then, use copy-field as "usual":

"add-field":{
"name":"phonetic",
"type":"phonetic",
multiValued: true,
"stored":false },

"add-copy-field":{
"source":"first_name",
"dest":"phonetic" },

"add-copy-field":{
"source":"last_name",
"dest":"phonetic" },

This seems to work but I wanted to know if I was doing it the right way.
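
For what it's worth, these commands get POSTed to the Schema API endpoint; a
sketch, with the core name assumed:

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycore/schema \
  --data-binary '{ "add-copy-field": { "source": "first_name", "dest": "phonetic" } }'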

Thanks,
- -chris

[1]
https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties


FW: LTR Model size

-----Original Message-----
From: Roopa Rao [mailto:roopaml@gmail.com]
Sent: 12 March 2018 20:36
To: solr-user@lucene.apache.org
Subject: Re: LTR Model size

What would be the best way to patch this into Solr 6.6 without having to do a
full upgrade?

Thanks,
Roopa

On Fri, Mar 9, 2018 at 4:55 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> Spoonerk:
>
> Please follow the instructions here:
> http://lucene.apache.org/solr/community.html#mailing-lists-irc
>
> . You must use the _exact_ same e-mail as you used to subscribe.
>
> If the initial try doesn't work and following the suggestions at the
> "problems" link doesn't work for you, let us know. But note you need
> to show us the _entire_ return header to allow anyone to diagnose the
> problem.
>
>
> Best,
> Erick
>
> On Fri, Mar 9, 2018 at 12:15 PM, spoonerk <john.spooner@gmail.com> wrote:
> > Please unsubscribe me. I have tried and tried but still get emails
> >
> > On Mar 9, 2018 10:19 AM, "Roopa Rao" <roopaml@gmail.com> wrote:
> >
> >> What is the way to configure the model size for LTR? We have about a
> >> 3MB model, and Solr is not holding this model as a ManagedResource.
> >>
> >> How can this be configured?
> >>
> >> Thanks,
> >> Roopa
> >>
>

FW: Including a filtered-field in the default-field

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 13 March 2018 07:21
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Including a filtered-field in the default-field

bq: Looks like the "qf=all phonetic" would take the place of my existing
"df=all" parameter.

In fact, it may call into question whether you even want an "all" field, or
whether you should just list all the fields you _would_ have copied into
"all" in the "qf" parameter. Having a single field to search is certainly
more efficient, at the expense of a larger index. As always, "it depends" on
the nature of your index, the response speed you require, etc.
The advantage of just putting a bunch of fields in the "qf" parameter is
that you can weight each one individually (a sketch follows below). Whether
it's worth it or not in your situation is "an exercise for the reader" ;)
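
Something like this, with the field names from this thread and the boosts
purely illustrative:

...solr/collection/query?q=chris schultz&defType=edismax&qf=first_name^5 last_name^5 all^2 phonetic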


bq: Cool, like "if I spell it exactly right, I want that result to float to
the top"?

Exactly. It's a tradeoff though. My name is a perfect example. If someone
searches for "erik eriksson" _I_ want to be listed at the top ;)

bq: Since I'm taking the query from the user in my backend Java to convert
it into a Solr call, I'm comfortable doing everything in the Java code
itself.

Again, another tradeoff, completely up to you. Mostly I've found it depends
on which is administratively easier in your particular environment.

1> pushing the configs to ZooKeeper (bin/solr zk upconfig......)
2> reloading the collection (Collections API call)

.vs.

1> ensuring my new jar gets to all the places it should be, which may
be only a single spot.
2> restarting (perhaps) the back-end.

Best,
Erick

On Mon, Mar 12, 2018 at 3:32 PM, Christopher Schultz
<chris@christopherschultz.net> wrote:
>
> Erick,
>
> (Sorry... hit sent inadvertently before completion...)
>
> On 3/12/18 2:50 PM, Erick Erickson wrote:
>> Something like:
>>
>> ....solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 phonetic
>
> Interesting. Looks like the "qf=all phonetic" would take the place of
> my existing "df=all" parameter.
>
>> The point of edismax is to take whatever the input is and distribute
>> it among one or more fields defined by the "qf"
>> parameter.
>
> That's an entirely lucid explanation. That's not evident from reading
> the official documentation :)
>
>> In this case, it'll look for "chris" and "shultz" in both the "all"
>> and "phonetic" fields. It would boost matches in the "all"
>> field by 10, giving you an easy knob to tweak for "this field is more
>> important than this other one".
>
> Cool, like "if I spell it exactly right, I want that result to float
> to the top"?
>
>> You can combine "fielded" searches, something like:
>> ....solr/collection/query?q=firstName:chris shultz&defType=edismax&qf=all phonetic
>>
>> would search for "shultz" in the "all" and "phonetic" fields while
>> searching for "chris" only in the "firstName" field.
>
> Perfect.
>
>> As you have noticed, there are a _lot_ of knobs to tweak when it
>> comes to edismax, and the result of adding &debug=query to the URL
>> can be...bewildering. But edismax was created exactly to spread the
>> input out across multiple fields automatically.
>>
>> You can also put these as defaults in your requesthandler in
>> solrconfig.xml. The "browse" handler in some of the examples will
>> give you a template, I'd copy/paste from the "browse" handler to you
>> main handler (usually "select"), as the "browse" handler is tied
>> into the Velocity templating engine....
>
> Since I'm taking the query from the user in my backend Java to convert
> it into a Solr call, I'm comfortable doing everything in the Java code
> itself. I'd actually rather not have too much automated stuff, because
> then I think I'll confuse myself when using the Solr dashboard for
> debugging, etc.
>
>> To start, since there are a lot of parameters to tweak, I'd just
>> start with the "qf" field (plus some boosts perhaps). Then move on to
>> pf, pf2, pf3. mm will take a while to get your head around all by
>> itself. I think once you see the basic operation, then the rest of
>> the parameters will be easier to understand.
>>
>> And I urge you to take it a little at a time, just use two fields and
>> two terms and look at the result of &debug=query, the parsed query
>> bits, 'cause each new thing you add adds a further complication.
>> Fortunately you can just put different parameters on the URL and see
>> the results for rapidly iterating.
>
> Exactly :)
>
> Thanks for the hints,
> - -chris