Sunday, April 1, 2018

FW: Query redg : diacritics in keyword search

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: 30 March 2018 23:23
To: solr-user@lucene.apache.org
Subject: RE: Query redg : diacritics in keyword search

For a simple illustration of Charlie's point and a side bonus on the 78
reasons to use the ICUFoldingFilter if you happen to be processing Arabic
script languages, see slides 31-33:

https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf


-----Original Message-----
From: Charlie Hull [mailto:charlie@flax.co.uk]
Sent: Thursday, March 29, 2018 9:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Query redg : diacritics in keyword search

On 29/03/2018 14:12, Peter Lancaster wrote:
> Hi,
>
> You don't say whether the AsciiFolding filter is at index time or query
> time. In any case you can easily look at what's happening using the admin
> analysis tool which helpfully will even highlight where the analysed query
> and index token match.
>
> That said I'd expect what you want to work if you simply use
> <filter class="solr.ASCIIFoldingFilterFactory"/> on both index and query.

Simply put:

You use the filter at indexing time to collapse any variants of a term into
a single variant, which is then stored in your index.

You use the filter at query time to collapse any variants of a term that
users type into a single variant, and if this exists in your index you get a
match.

If you don't use the same filter at both ends you won't get a match.
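
To make that concrete, here is a minimal Python sketch of the matching logic. It is not Solr code: fold() is only a rough stand-in for ASCIIFoldingFilter (it strips combining marks after Unicode decomposition), and analyze() and search() are hypothetical helpers invented for this illustration. It also assumes, purely as one possible explanation of the behaviour reported in the original question, that the filter was active at query time only.

```python
import unicodedata


def fold(term):
    """Rough stand-in for ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, fold_terms, preserve_original=False):
    """Lowercase and whitespace-split text, optionally folding each token."""
    tokens = set()
    for word in text.lower().split():
        if fold_terms:
            tokens.add(fold(word))
            if preserve_original:
                tokens.add(word)  # keep the unfolded form as well
        else:
            tokens.add(word)
    return tokens


def search(query, docs, index_fold, query_fold, preserve_original=False):
    """Return the docs whose index tokens share a token with the query tokens."""
    query_tokens = analyze(query, query_fold, preserve_original)
    return [
        doc for doc in docs
        if analyze(doc, index_fold, preserve_original) & query_tokens
    ]


docs = ["Carré", "Carre"]

# Assumption: filter at query time only. The accented query expands to both
# forms and finds both documents, but the plain query finds only "Carre".
print(search("Carré", docs, index_fold=False, query_fold=True,
             preserve_original=True))  # ['Carré', 'Carre']
print(search("Carre", docs, index_fold=False, query_fold=True,
             preserve_original=True))  # ['Carre']

# Filter at both ends: the plain query now finds both documents.
print(search("Carre", docs, index_fold=True, query_fold=True,
             preserve_original=True))  # ['Carré', 'Carre']
```

In a real Solr setup, documents indexed before the filter was added to the index analyzer would need to be reindexed before the folded tokens exist in the index.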

Cheers

Charlie

>
> Cheers,
> Peter.
>
> -----Original Message-----
> From: Paul, Lulu [mailto:Lulu.Paul@bl.uk]
> Sent: 29 March 2018 12:03
> To: solr-user@lucene.apache.org
> Subject: Query redg : diacritics in keyword search
>
> Hi,
>
> The keyword search Carré returns values Carré and Carre (this works
> well as I added the filter
> <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
> in the schema config to enable returning of both sets of values)
>
> Now it looks like we want Carre to return both Carré and Carre (and this
> doesn't work. Solr only returns Carre) – any ideas on how this scenario can
> be achieved?
>
> Thanks & Best Regards,
> Lulu Paul


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
