Saturday, March 17, 2018

FW: Including a filtered-field in the default-field

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 13 March 2018 07:21
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Including a filtered-field in the default-field

bq: Looks like the "qf=all phonetic" would take the place of my existing
"df=all" parameter.

In fact, it may call into question whether you even want an "all" field, or
whether you should just list all the fields you _would_ have copied into "all"
in the "qf" parameter. Having a single field to search is certainly more
efficient, at the expense of a larger index. As always, "it depends" on the
nature of your index, the response speed you require, etc.
The advantage of just putting a bunch of fields in the "qf" parameter is
that you can weight each one individually. Whether it's worth it or not in
your situation is "an exercise for the reader" ;)
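
A minimal sketch of per-field weighting from backend Java, assuming the "all" and "phonetic" fields discussed in this thread. The class and method names are hypothetical, and the sketch only builds the request's query string with the JDK; it does not use SolrJ or contact a Solr server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: assembles edismax parameters with per-field boosts.
public class EdismaxQuerySketch {

    // Builds the "qf" value, e.g. "all^10 phonetic"; a null boost means "no boost".
    static String buildQf(Map<String, Integer> fieldBoosts) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            qf.add(e.getValue() == null ? e.getKey() : e.getKey() + "^" + e.getValue());
        }
        return qf.toString();
    }

    // URL-encodes a single parameter value (spaces become "+", "^" becomes "%5E").
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds the query string: user input in "q", edismax as the parser, weighted "qf".
    static String buildQueryString(String userQuery, Map<String, Integer> fieldBoosts) {
        return "q=" + enc(userQuery)
                + "&defType=edismax"
                + "&qf=" + enc(buildQf(fieldBoosts));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("all", 10);        // exact-spelling field, boosted
        boosts.put("phonetic", null); // phonetic field, default weight
        System.out.println(buildQueryString("chris shultz", boosts));
    }
}
```

Changing a weight is then a one-line change in the map, which is the "easy knob" that a single catch-all field cannot offer.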


bq: Cool, like "if I spell it exactly right, I want that result to float to
the top"?

Exactly. It's a tradeoff though. My name is a perfect example. If someone
searches for "erik eriksson" _I_ want to be listed at the top ;)

bq: Since I'm taking the query from the user in my backend Java to convert
it into a Solr call, I'm comfortable doing everything in the Java code
itself.

Again, another tradeoff, completely up to you. Mostly I've found it depends
on which is administratively easier in your particular environment.

1> pushing the configs to ZooKeeper (bin/solr zk upconfig......)
2> reloading the collection (Collections API call)

vs.

1> ensuring my new jar gets to all the places it should be, which may
be only a single spot.
2> restarting (perhaps) the back-end.
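
For the first option, the reload step is a single HTTP call to the Collections API (action=RELOAD). A minimal sketch of building that URL, assuming a hypothetical host and collection name; it only constructs the URI and does not send the request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the Collections API RELOAD URI for a collection.
public class ReloadCollectionSketch {

    // solrBaseUrl is the Solr root, e.g. "http://localhost:8983/solr" (assumed).
    static URI reloadUri(String solrBaseUrl, String collection) {
        return URI.create(solrBaseUrl
                + "/admin/collections?action=RELOAD&name="
                + URLEncoder.encode(collection, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Host and collection name are placeholders, not values from this thread.
        System.out.println(reloadUri("http://localhost:8983/solr", "mycollection"));
    }
}
```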

Best,
Erick

On Mon, Mar 12, 2018 at 3:32 PM, Christopher Schultz
<chris@christopherschultz.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Erick,
>
> (Sorry... hit send inadvertently before completion...)
>
> On 3/12/18 2:50 PM, Erick Erickson wrote:
>> Something like:
>>
>> ....solr/collection/query?q=chris shultz&defType=edismax&qf=all^10
>> phonetic
>
> Interesting. Looks like the "qf=all phonetic" would take the place of
> my existing "df=all" parameter.
>
>> The point of edismax is to take whatever the input is and distribute
>> it among one or more fields defined by the "qf"
>> parameter.
>
> That's an entirely lucid explanation. That's not evident from reading
> the official documentation :)
>
>> In this case, it'll look for "chris" and "shultz" in both the "all"
>> and "phonetic" fields. It would boost matches in the "all"
>> field by 10, giving you an easy knob to tweak for "this field is more
>> important than this other one".
>
> Cool, like "if I spell it exactly right, I want that result to float
> to the top"?
>
>> You can combine "fielded" searches, something like:
>> ....solr/collection/query?q=firstName:chris
>> shultz&defType=edismax&qf=all phonetic
>>
>> would search for "shultz" in the "all" and "phonetic" fields while
>> searching for "chris" only in the "firstName" field.
>
> Perfect.
>
>> As you have noticed, there are a _lot_ of knobs to tweak when it
>> comes to edismax, and the result of adding &debug=query to the URL
>> can be...bewildering. But edismax was created exactly to spread the
>> input out across multiple fields automatically.
>>
>> You can also put these as defaults in your requesthandler in
>> solrconfig.xml. The "browse" handler in some of the examples will
>> give you a template, I'd copy/paste from the "browse" handler to your
>> main handler (usually "select"), as the "browse" handler is tied
>> into the Velocity templating engine....
>
> Since I'm taking the query from the user in my backend Java to convert
> it into a Solr call, I'm comfortable doing everything in the Java code
> itself. I'd actually rather not have too much automated stuff, because
> then I think I'll confuse myself when using the Solr dashboard for
> debugging, etc.
>
>> To start, since there are a lot of parameters to tweak, I'd just
>> start with the "qf" field (plus some boosts perhaps). Then move on to
>> pf, pf2, pf3. mm will take a while to get your head around all by
>> itself. I think once you see the basic operation, then the rest of
>> the parameters will be easier to understand.
>>
>> And I urge you to take it a little at a time, just use two fields and
>> two terms and look at the result of &debug=query, the parsed query
>> bits, 'cause each new thing you add adds a further complication.
>> Fortunately you can just put different parameters on the URL and see
>> the results for rapidly iterating.
>
> Exactly :)
>
> Thanks for the hints,
> - -chris
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqm//AdHGNocmlzQGNo
> cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFhn2Q/+KgmjtAbKbak3qSB9
> eHqNz58HS1TQ5XAosMw5WvWikqPcSH+rWVyOQfk+UPNNnI/lsK9dt1Tqpg3LPSHd
> cdJFEweoWQWhqWkj5lYj+/cJHcuS2Bd4TP3wOuAIdm7heP3iHVsjfRS7YodRVGCn
> JRbmiJBmtSlw1K+leMf4IF4kkBCzDEuZU/LcKfzyU3VoNORwtGYGHq9EXxaDtFyh
> 0v8v8PJWGHXgAKxdCf9a1qK9Jb40mTciGIhEQ1V083sN4U/Dieq+u9/VCVTzqlwC
> KuZ9YWSA58Pqx3biJYwNrjJJITFRFZT4C/TNKeiDENe53n3fL+HsSAhxs2RDvLO0
> qK3NXN75B32gLZi7n/+s0SCqQcJeV/HlomLjHeB+0bUTi9Mwwqng7qoaJ49FIdjq
> N4lgjVLJMZmp87m883PlLev0ZXrTuoX/QRj4a5xh7tENfQ3StoUz0cC0D8GDO+XO
> WERL5p98KZtfca95SHAQSK41H74O5AbfG/h85iZitRQaM4mYt/cs5DAdGif9T4+z
> ZDzKgk1kutsTKDRyFZM6qK1O/K+9mk8ye6op+RGCYRr5qbJZpgwgUO8Vl+kOgLS7
> WljUkmLbOGsGo8a2pJNJ481OhD3e+C5pa+SFGaxtYT7GBiuGJ/y8LA4HqtXzd+k3
> wiHOJ0Bixyo1T4aEjbGZ+tFTOTM=
> =ehg4
> -----END PGP SIGNATURE-----
