Saturday, March 17, 2018

FW: Defining a phonetic analyzer and searcher via the schema API

-----Original Message-----
From: Christopher Schultz [mailto:chris@christopherschultz.net]
Sent: 12 March 2018 22:22
To: solr-user@lucene.apache.org
Subject: Defining a phonetic analyzer and searcher via the schema API

All,

I'd like to add a new synthesized field that uses a phonetic analyzer such
as Beider-Morse. I'm using Solr 7.2.

When I request the current schema via the schema API, I get a list of
existing fields, dynamic fields, and analyzers, none of which appear to be
what I'm looking for.
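
For anyone wanting to check which field types and analyzers a schema already defines, here is a minimal Python sketch. It assumes the Schema API's field-type listing is reachable at `<schema URL>/fieldtypes` and returns a JSON object with a `fieldTypes` array shaped like the schema output. The function names and any URL passed in are hypothetical, not part of Solr.

```python
import json
import urllib.request


def summarize_field_types(schema):
    """Map each field type name to the tokenizer class of each analyzer it defines."""
    summary = {}
    for field_type in schema.get("fieldTypes", []):
        analyzers = {}
        # A field type may carry one shared analyzer or separate index/query analyzers.
        for key in ("analyzer", "indexAnalyzer", "queryAnalyzer"):
            analyzer = field_type.get(key)
            if analyzer is not None:
                analyzers[key] = analyzer.get("tokenizer", {}).get("class")
        summary[field_type["name"]] = analyzers
    return summary


def fetch_field_types(schema_url):
    """GET <schema_url>/fieldtypes and return the decoded JSON response.

    schema_url is an assumption, e.g. http://localhost:8983/solr/mycollection/schema
    """
    with urllib.request.urlopen(schema_url + "/fieldtypes") as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `summarize_field_types(fetch_field_types(schema_url))` against a running Solr instance should show at a glance whether a phonetic field type is already present.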

Conceptually, I think I'd like to do something like this:

add-field: { name: phoneticname, type: phonetic, multiValued: true }

... but how do I define what type of data "phonetic" should be?

I can see the example XML definition in this document:
https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#FilterDescriptions-Beider-MorseFilter

But I'm not sure how to add an analyzer to the schema using the schema
API: https://lucene.apache.org/solr/guide/7_2/schema-api.html

Under "Add a new field type", it says that new analyzers can be defined, but
I'm not entirely sure how to do that ... the API docs refer to the field
type definitions page[1] which just shows what XML you'd have to put into
your schema XML -- which you aren't supposed to edit directly.

When looking at the JSON version of my schema, I can see for example this:

"fieldTypes":[{
    "name":"ancestor_path",
    "class":"solr.TextField",
    "indexAnalyzer":{
      "tokenizer":{
        "class":"solr.KeywordTokenizerFactory"}},
    "queryAnalyzer":{
      "tokenizer":{
        "class":"solr.PathHierarchyTokenizerFactory",
        "delimiter":"/"}}},

So should I create a new field type like this?

"add-field-type" : {
  "name" : "phonetic",
  "class" : "solr.TextField",
  "analyzer" : {
    "tokenizer" : { "class" : "solr.StandardTokenizerFactory" },
    "filters" : [{
      "class" : "solr.BeiderMorseFilterFactory",
      "nameType" : "GENERIC",
      "ruleType" : "APPROX",
      "concat" : "true",
      "languageSet" : "auto"
    }]
  }
}

Then, use copy-field as "usual":

"add-field":{
  "name":"phonetic",
  "type":"phonetic",
  "multiValued":true,
  "stored":false },

"add-copy-field":{
  "source":"first_name",
  "dest":"phonetic" },

"add-copy-field":{
  "source":"last_name",
  "dest":"phonetic" }

This seems to work but I wanted to know if I was doing it the right way.
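
To make the whole request concrete, here is a minimal Python sketch that assembles the three commands into one payload and posts it to the schema endpoint. It rests on a few assumptions: that the Schema API accepts several commands in a single POST and processes them in order, that `add-copy-field` may be given an array of definitions in place of repeated keys (Python dicts cannot hold duplicate keys), and that the endpoint lives at a URL such as `http://localhost:8983/solr/mycollection/schema`. The function names and that URL are hypothetical.

```python
import json
import urllib.request


def build_phonetic_schema_commands(source_fields, field_name="phonetic",
                                   type_name="phonetic"):
    """Build the Schema API commands for a Beider-Morse field fed by copy fields."""
    return {
        # Insertion order is preserved, so the type is defined before the field uses it.
        "add-field-type": {
            "name": type_name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{
                    "class": "solr.BeiderMorseFilterFactory",
                    "nameType": "GENERIC",
                    "ruleType": "APPROX",
                    "concat": "true",
                    "languageSet": "auto",
                }],
            },
        },
        "add-field": {
            "name": field_name,
            "type": type_name,
            "multiValued": True,
            "stored": False,
        },
        # Assumption: an array of definitions stands in for repeated add-copy-field keys.
        "add-copy-field": [
            {"source": source, "dest": field_name} for source in source_fields
        ],
    }


def post_schema_commands(schema_url, commands):
    """POST the commands as JSON to schema_url and return the decoded response."""
    request = urllib.request.Request(
        schema_url,
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For the fields above, the payload would come from `build_phonetic_schema_commands(["first_name", "last_name"])`. Documents indexed before the change would presumably need reindexing before the new field is populated.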

Thanks,
-chris

[1] https://lucene.apache.org/solr/guide/7_2/field-type-definitions-and-properties.html#field-type-definitions-and-properties

