Sunday, March 18, 2018

FW: Looking for design ideas

-----Original Message-----
From: Steven White [mailto:swhite4141@gmail.com]
Sent: 18 March 2018 20:44
To: solr-user@lucene.apache.org
Subject: Looking for design ideas

Hi everyone,

I have a design problem that I'm not sure how best to solve, so I figured
I'd share it here and see what ideas others may have.

I have a DB that holds documents (over 1 million and growing). This is known
as the "Public" DB that holds documents visible to all of my end users.

My application lets users "check-out" one or more documents at a time from
this "Public" DB, edit them and "check-in" them back into the "Public" DB. When
a document is checked-out, it goes into a "Personal" DB for that user (and
the document in the "Public" DB is flagged as such to alert other users.)
The owner of this checked-out document in the "Personal" DB can make changes
to the document and save it back into the "Personal" DB as often as he wants
to. Sometimes the document lives in the "Personal" DB for a few minutes
before it is checked-in back into the "Public" DB and sometimes it can live
in the "Personal" DB for 1 day or 1 month. When a document is saved into
the "Personal" DB, only the owner of that document can see it.

Currently there are 100 users but this will grow to at least 500 or maybe
even 1000.

I'm looking at a solution on how to enable a full text search on those
documents, both in the "Public" and "Personal" DB so that:

1) Documents in the "Public" DB are searchable by all users. This is the
easy part.

2) Documents in the "Personal" DB of each user are searchable by the owner
of that "Personal" DB. This is easy too.

3) A user can search both the "Public" and "Personal" DB at any time, but if
a document is in the "Personal" DB, we will not search it in the "Public"
DB -- i.e., whatever is in the "Personal" DB takes precedence over what's in
the "Public" DB.

Item #3 is important and is what I'm trying to solve. The goal is to give
hits to the user on documents that they are editing (in their "Personal"
DB) instead of the copies of those documents in the "Public" DB.
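
To make the rule in item #3 concrete, here is a minimal sketch on plain
in-memory data, with no Solr involved. The function name, the dict layout
and the sample user names are assumptions made up for illustration only;
they are not part of the application described above.

```python
def visible_documents(public_docs, personal_docs, user):
    """Return the documents one user should be able to search.

    public_docs:   dict mapping doc id -> text in the "Public" DB
    personal_docs: dict mapping (owner, doc id) -> text in the "Personal" DBs
    (both layouts are hypothetical, chosen only for this sketch)
    """
    # Start from the public copies of every document.
    visible = dict(public_docs)
    # Let this user's own checked-out copies replace the public ones.
    for (owner, doc_id), text in personal_docs.items():
        if owner == user:
            visible[doc_id] = text
    return visible


public = {"1": "first draft", "2": "second draft"}
personal = {("steve", "2"): "second draft, edited", ("ann", "1"): "ann's edit"}
print(visible_documents(public, personal, "steve"))
```

For the hypothetical user "steve", document 2 comes from his "Personal"
copy while document 1 still comes from "Public", because the checked-out
copy of document 1 belongs to someone else.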

The way I'm thinking to solve this problem is to create 2 Solr indexes (do
we call those "cores"?):

1) The "Public" DB is indexed into the "Public" Solr index.

2) The "Personal" DB is indexed into the "Personal" Solr index with a field
indicating the owner of that document.

With the above 2 indexes, I can now send the user's search syntax to both
indexes, but for the "Public" index I will also send a list of IDs (those
documents in the user's "Personal" DB) to exclude from the result set.
This way, a user searches both the "Public" and "Personal" DB such that
the documents in the "Personal" DB are included in the search and their
counterparts in the "Public" DB are excluded.
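
The two requests described above could be built roughly as follows. This
is only a sketch under assumptions: the field names `id` and `owner` and
the helper names are hypothetical, and the code only builds the standard
`q` and `fq` request parameters without contacting any Solr server.

```python
def escape_term(value):
    # Wrap the value in double quotes so it is treated as one literal term;
    # inside quotes only backslash and double quote need escaping.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_requests(user_query, user, personal_ids):
    """Return (public_params, personal_params) for one user's search."""
    # "Personal" index: restrict hits to documents owned by this user.
    personal = {"q": user_query, "fq": "owner:" + escape_term(user)}

    # "Public" index: same query, minus the documents the user has checked out.
    public = {"q": user_query}
    if personal_ids:
        ids = " OR ".join(escape_term(i) for i in sorted(personal_ids))
        # "*:*" matches everything, so the filter keeps all documents
        # except the excluded ids.
        public["fq"] = "*:* -id:(" + ids + ")"
    return public, personal


public_params, personal_params = build_requests(
    "design ideas", "steve", ["doc7", "doc3"]
)
print(public_params)
print(personal_params)
```

Using a filter query (`fq`) rather than adding the exclusion to `q` keeps
the ID list out of scoring, so the exclusion itself should not change how
the remaining "Public" hits are ranked.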

Did I make sense? If so, is this doable? Will ranking be affected given
that I'm searching 2 indexes?

Let me know what issues I might be overlooking with this solution.

Thanks

Steve
