Sunday, April 1, 2018

FW: Using Solr to build a product matcher, with learning to rank

-----Original Message-----
From: Xavier Schepler [mailto:xavier.schepler@recommerce.com]
Sent: 28 March 2018 21:55
To: solr-user@lucene.apache.org
Subject: Using Solr to build a product matcher, with learning to rank

Hello,

I'm considering using Solr with learning to rank to build a product matcher.
For example, it should match the titles:
- Apple iPhone 6 16 Gb,
- iPhone 6 16 Gb,
- Smartphone IPhone 6 16 Gb,
- iPhone 6 black 16 Gb,
to the same internal reference, a unique identifier.

With Solr, each document would then have a field for the product title and
one for its class, which is the unique identifier of the product.
Solr would then be used to perform matching as follows.

1. A search is performed with a given product title.
2. The first three results are considered (this requires an initial
product title database).
3. The most frequent identifier is returned.
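
As a minimal sketch of step 3 above, assuming the search results arrive as a
ranked list of (title, identifier) pairs: the function name, the pair layout
and the tie-breaking rule (prefer the identifier that appears highest in the
ranking) are illustrative assumptions, not part of Solr.

```python
from collections import Counter


def most_frequent_identifier(results, k=3):
    """Return the identifier that occurs most often in the top-k results.

    `results` is assumed to be a list of (title, identifier) pairs, already
    ordered from best to worst match. Returns None if there are no results.
    """
    identifiers = [identifier for _, identifier in results[:k]]
    if not identifiers:
        return None
    counts = Counter(identifiers)
    # Assumed tie-break: on equal counts, prefer the identifier whose first
    # occurrence is ranked highest (smallest index).
    return max(counts, key=lambda i: (counts[i], -identifiers.index(i)))


# Hypothetical search results for the query "iPhone 6 black 16 Gb".
hits = [
    ("iPhone 6 16 Gb", "REF-A"),
    ("Apple iPhone 6 16 Gb", "REF-A"),
    ("iPhone 6s 16 Gb", "REF-B"),
    ("iPhone 6 64 Gb", "REF-C"),
]
print(most_frequent_identifier(hits))
```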

This method corresponds roughly to a k-Nearest Neighbor approach with the
cosine metric, k = 3, and a TF-IDF model.
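
To make the k-NN analogy concrete, here is a rough standard-library sketch of
TF-IDF vectors compared with cosine similarity, followed by a vote among the
k = 3 nearest titles. The tokenizer, the smoothed IDF formula, the catalog and
its REF-... identifiers are all assumptions chosen for illustration; Solr's
own scoring differs in its details.

```python
import math
import re
from collections import Counter


def tokenize(title):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", title.lower())


def build_idf(token_lists):
    # Smoothed inverse document frequency, so that a term present in every
    # title still keeps a non-zero weight.
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(tokens, idf):
    # Sparse vector as a dict; terms unseen in the catalog are dropped.
    return {tok: cnt * idf[tok] for tok, cnt in Counter(tokens).items() if tok in idf}


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_match(query, catalog, k=3):
    """Return the majority identifier among the k titles closest to `query`."""
    token_lists = [tokenize(title) for title, _ in catalog]
    idf = build_idf(token_lists)
    vectors = [tfidf(toks, idf) for toks in token_lists]
    q = tfidf(tokenize(query), idf)
    ranked = sorted(range(len(catalog)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter(catalog[i][1] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical initial product title database.
catalog = [
    ("Apple iPhone 6 16 Gb", "REF-IPHONE6-16"),
    ("iPhone 6 16 Gb", "REF-IPHONE6-16"),
    ("Smartphone IPhone 6 16 Gb", "REF-IPHONE6-16"),
    ("Samsung Galaxy S7 32 Gb", "REF-GALAXYS7-32"),
    ("Galaxy S7 32 Gb black", "REF-GALAXYS7-32"),
]
print(knn_match("iPhone 6 black 16 Gb", catalog))
```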

I've done some preliminary tests with scikit-learn and the results are
good, but not as good as those of more sophisticated learning algorithms.

Then I noticed that Solr supports learning to rank.

First, do you think that such a use of Solr makes sense?
Second, is there a relatively simple way to build a learning model using a
sparse representation of the query TF-IDF vector?

Kind regards,

Xavier Schepler
