Siddhast lab: FW: How do I create a schema file for FIX data in Solr

Sunday, April 1, 2018

FW: How do I create a schema file for FIX data in Solr

-----Original Message-----
From: Raymond Xie [mailto:xie3208080@gmail.com]
Sent: 02 April 2018 10:05
To: solr-user@lucene.apache.org; Hui Xie <xie3208080@gmail.com>
Subject: Re: How do I create a schema file for FIX data in Solr

Thank you, Shawn, Rick and other readers,

To Shawn:

For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8 means
BeginString, and in this example its value is FIX.4.4; 9 means BodyLength,
which is 653 for this message; 35 is MsgType, which is RIO here; and 122
stands for OrigSendingTime, which has the UTCTimestamp format.

You can refer to this page for details:
https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html

All the values are explained as string type.

All the tag numbers come from the FIX standard, so they don't change (in my
case).

I expect a Python program might be needed to parse the message and extract
each tag's value. The index would then be built on those extracted values
as well as their field (tag) names.
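
A minimal sketch of such a parser, for illustration only. It assumes the
message is a flat list of tag=value pairs separated by a single delimiter.
The sample above is shown with spaces; on the wire FIX separates fields with
the SOH character (\x01), so the delimiter is a parameter. The function name
parse_fix is hypothetical. Repeated tags (for example in repeating groups)
would overwrite each other here, and a space delimiter would break values
that themselves contain spaces.

```python
def parse_fix(message, delimiter=" "):
    """Split a FIX message into a {tag: value} dict of strings."""
    fields = {}
    for pair in message.strip().split(delimiter):
        if not pair:
            # Skip empty pieces, e.g. after a trailing delimiter.
            continue
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a tag=value pair: {pair!r}")
        # Simplification: a repeated tag overwrites the earlier value.
        fields[tag] = value
    return fields


sample = "8=FIX.4.4 9=653 35=RIO"
print(parse_fix(sample))
# {'8': 'FIX.4.4', '9': '653', '35': 'RIO'}
```

For raw messages, calling parse_fix(raw, delimiter="\x01") would be the
closer match.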

With the index in place, users would ideally be able to search for any
keyword. In this case, however, most queries would be based on tag 37
(OrderID) and tag 75 (TradeDate). There is also a customized tag (not in the
standard), Order Version, that needs to be queried on.
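
As a rough illustration of what such queries might look like against Solr's
standard /select handler: the sketch below builds a q parameter from an
order ID and a trade date. The field names OrderID and TradeDate, the core
name "fix", the host and port, and the sample values are all assumptions;
they depend entirely on how the schema ends up being defined.

```python
from urllib.parse import urlencode


def _phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(order_id=None, trade_date=None):
    """Build a Solr query string; field names are hypothetical."""
    clauses = []
    if order_id is not None:
        clauses.append(f"OrderID:{_phrase(order_id)}")
    if trade_date is not None:
        clauses.append(f"TradeDate:{_phrase(trade_date)}")
    # With no criteria, fall back to the match-all query.
    return " AND ".join(clauses) if clauses else "*:*"


def build_select_url(base_url, **criteria):
    """Return a /select URL with the query URL-encoded in the q parameter."""
    return f"{base_url}/select?{urlencode({'q': build_query(**criteria)})}"


print(build_query(order_id="ABC123", trade_date="20180402"))
print(build_select_url("http://localhost:8983/solr/fix", order_id="ABC123"))
```

The customized Order Version tag could be added as one more optional clause
in the same way once its field name is decided.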

I understand that creating the parser would be a manual process. As long as
I know how, or have a small sample program, I will do it myself and adjust
it as needed.

To Rick:

You mentioned creating a JSON document. My understanding is that a parser
would be needed to generate that JSON document. Do you have any existing
example code?
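
One possible shape for that step, sketched under assumptions: given tags
that have already been parsed into a dict, map the tag numbers to their FIX
field names and emit a JSON array of documents, which is a form Solr's
/update handler accepts. The fallback name tag_<number> for unmapped tags
and the raw_message field (kept so the whole msg can be shown as proof) are
hypothetical choices, and a unique key for each document is left out here.

```python
import json

# Tag numbers and names from the FIX dictionary; extend as needed.
TAG_NAMES = {
    "8": "BeginString",
    "9": "BodyLength",
    "35": "MsgType",
    "37": "OrderID",
    "75": "TradeDate",
    "122": "OrigSendingTime",
}


def to_solr_doc(parsed, raw_message):
    """Turn a {tag: value} dict into a flat document with named fields."""
    # raw_message is a hypothetical field holding the original text.
    doc = {"raw_message": raw_message}
    for tag, value in parsed.items():
        # Unmapped tags (e.g. custom ones) get a hypothetical tag_<n> name.
        doc[TAG_NAMES.get(tag, f"tag_{tag}")] = value
    return doc


parsed = {"8": "FIX.4.4", "9": "653", "35": "RIO"}
doc = to_solr_doc(parsed, "8=FIX.4.4 9=653 35=RIO")
print(json.dumps([doc], indent=2))
```

The printed JSON could then be posted to the collection's /update endpoint
with Content-Type application/json.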

Thank you guys very much.

*------------------------------------------------*
*Sincerely yours,*

*Raymond*

On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>
>> FIX is a format standard for financial data. It contains many numeric
>> tags, each with a value, like 8=asdf, where 8 is the tag and
>> asdf is the tag's value. Each tag has its definition.
>>
>> The sample msg in FIX format was in the original question.
>>
>> All I need is to know how to parse the msg and get every tag's value.
>>
>> So far I have found that a parser is what I need to start with. But I am
>> more concerned about how to create an index in Solr on the extracted tag
>> values. That is the first step; the next would be to customize the
>> dashboard so users can search with a value, find out which msg
>> contains that value in which tag, and be shown the whole msg as proof.
>>
>
> Most of Solr's functionality is provided by Lucene. Lucene is a Java
> API that implements search functionality. Solr bolts on some
> functionality on top of Lucene, but doesn't really do anything to
> fundamentally change the fact that you're dealing with a Lucene index.
> So I'm going to mostly talk about Lucene below.
>
> Lucene organizes data in a unit that we call a "document." An easy
> analogy for this is that it is a lot like a row in a single database
> table. It has fields, each field has a type. Unless custom software
> is used, there is really no support for data other than basic
> primitive types -- numbers and strings. The only complex type that I
> can think of that Solr supports out of the box is geospatial
> coordinates, and it might even support multi-dimensional coordinates,
> but I'm not sure. It's not all that complex -- the field just stores
> and manipulates multiple numbers instead of one.
> The Lucene API does support a FEW things that Solr doesn't implement.
> I don't think those are applicable to what you're trying to do.
>
> Let's look at the first part of the data that you included in the first
> message:
>
> 8=FIX.4.4 9=653 35=RIO
>
> Is "8" always a mixture of letters and numbers and periods? Is "9"
> always a number, and is it always a WHOLE number? Is "35" always letters?
> Looking deeper to data that I didn't quote ... is "122" always a
> date/time value? Are the tag numbers always picked from a
> well-defined set, or do they change?
>
> Assuming that the answers in the previous paragraph are found and a
> configuration is created to deal with all of it ... how are you
> planning to search it? What kind of queries would you expect somebody
> to make? That's going to have a huge influence on how you configure
> things.
>
> Writing the schema is usually where people spend the most time when
> they're setting up Solr.
>
> Thanks,
> Shawn
>
>
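
To make Shawn's questions about field types concrete, here is a hedged
sketch of a request body for Solr's Schema API (a POST to the collection's
/schema endpoint using the add-field command). The field names are the FIX
names for the tags discussed above; raw_message is hypothetical. The type
names string, pint and text_general are assumed to exist in the configset
in use, as they do in recent default configsets. OrigSendingTime is kept as
a string because a FIX UTCTimestamp would need converting before it could
go into a Solr date field.

```python
import json


def field(name, field_type="string", stored=True, indexed=True):
    """Build one field definition for the Schema API add-field command."""
    return {
        "name": name,
        "type": field_type,
        "stored": stored,
        "indexed": indexed,
    }


schema_update = {
    "add-field": [
        field("BeginString"),              # tag 8, e.g. FIX.4.4
        field("BodyLength", "pint"),       # tag 9, a whole number
        field("MsgType"),                  # tag 35
        field("OrderID"),                  # tag 37, exact-match lookups
        field("TradeDate"),                # tag 75
        field("OrigSendingTime"),          # tag 122, kept as a string here
        field("raw_message", "text_general"),  # hypothetical: whole msg
    ]
}

print(json.dumps(schema_update, indent=2))
```

Using string for OrderID and TradeDate gives exact matching, which fits the
queries described above; the text field on the whole message is what would
allow looser keyword searches.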
