Saturday, March 17, 2018

FW: question regarding wildcard-searches

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 16 March 2018 21:37
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: question regarding wildcard-searches

If your goal is to search prefixes only, I'd move away from the _text_ field
altogether and use a "string" type. This means you need to
1> make it multiValued="true"
2> split the value up (either on your client or with a
FieldMutatingUpdateProcessor, probably RegexReplaceProcessorFactory) into
separate entries, i.e.
'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3'
becomes three separate entries in the field:
'EO.1954.53.1'
'EO.1954.53.2'
'EO.1954.53.3'

At that point, searches like 'EO.1954.53.*' will work just fine.

NOTE: String types do zero analysis, so you have to handle things like
casing yourself. That is, 'eO.1954.53.*' would _not_ match. In that case
you can probably use a text field type built from something like
KeywordTokenizerFactory + LowerCaseFilterFactory.

All this makes _much_ more sense if you use the Admin UI >> Analysis page
(probably uncheck the "verbose" checkbox; there'll be less clutter).

Best,
Erick

On Fri, Mar 16, 2018 at 8:35 AM, Emir Arnautović
<emir.arnautovic@sematext.com> wrote:
> Hi Roel,
> As mentioned, the _text_ field probably does not contain the complete
> "EO.1954.53.1" but only its parts. You can verify that using the Analysis
> screen in the admin console. What you can try is searching for the phrase
> "EO.1954.53" without a wildcard, or, if you are using WordDelimiterFilter
> in your analysis chain, you can set preserveOriginal="1" and reindex.
>
> Can you share what your text_general looks like?
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 16 Mar 2018, at 14:05, Paesen Roel <roel.paesen@africamuseum.be> wrote:
>>
>> Hi,
>>
>> Unfortunately that also gives no results (and it would not be
>> practical, as in this example the numbering only goes up to 19, but
>> others go up into the thousands, etc.).
>>
>> Anybody with a pointer on this?
>>
>> Thanks in advance,
>> Roel
>>
>>
>> -----Original Message-----
>> From: jagdish vasani [mailto:jagdisht.vasani@gmail.com]
>> Sent: Friday 16 March 2018 12:41
>> To: solr-user@lucene.apache.org
>> Subject: Re: question regarding wildcard-searches
>>
>> Hi Paesen,
>>
>> The value EO.1954.53.1 is indexed as the separate tokens:
>> Eo
>> 1954
>> 53
>> 1
>> The dots are removed. Try the single-character wildcard ?,
>> like EO.1954.53.?? if the last part only has 2 digits.
>>
>> I have not tried it, but just check it.
>> Hope it solves your problem.
>>
>> Thanks,
>> Jagdish
>> On 16-Mar-2018 3:51 pm, "Paesen Roel" <roel.paesen@africamuseum.be> wrote:
>>
>>> Hi everybody,
>>>
>>> We are experimenting with Solr, and I have what I think is a basic
>>> question: we have multiple fields, all copied into a generic field so we
>>> can search everything at once.
>>> However, we have a (for us) strange situation when doing wildcard
>>> searches on the contents of one specific field.
>>>
>>> Given in the schema:
>>>
>>> <field name="_text_" type="text_general" indexed="true" stored="false"
>>> multiValued="true"/>
>>>
>>> <field name="genormaliseerdInventarisnummer" type="string" indexed="true"
>>> stored="true"/>
>>> <copyField source="genormaliseerdInventarisnummer" dest="_text_" />
>>> and a lot of other fields exactly like 'genormaliseerdInventarisnummer'.
>>>
>>>
>>> Now, we are certain that the field 'genormaliseerdInventarisnummer'
>>> contains entries like 'EO.1954.53.1', 'EO.1954.53.2', 'EO.1954.53.3',
>>> all the way up to '.19'. We can query these directly by passing these
>>> exact texts to the query on the field '_text_' (our default search field).
>>> The problem is that wildcard searches for these don't work: for example,
>>> 'EO.1954.53.*' returns zero results.
>>>
>>> Why is that?
>>> What needs to be adjusted? (and how?)
>>>
>>> Thanks in advance,
>>> Roel
>>>
>>>
>
