Sunday, May 27, 2018

Some IPR questions on software licenses

A research project in the area of intellectual property requires software with a number of functionalities specific to the niche studies the project owners envisage, such as an atypical examination of how courts deal with copyright cases, including the staff training, lobbying and influence activities they wish to study. The project team would require these functionalities for limited periods rather than for long-term use, and they find funding for such time-bound, specialised projects, however useful, challenging to secure. Since the project is time-bound and would need further study, time and financial resources are limited.

 

The key legal characteristics of open source software may make it the most suitable model in this context. Which of the characteristics below is NOT suitable for such a project?

Select one:
a. The opportunity to freely modify and improve the software;
b. The use of the software for any purpose, subject to the authorization of its creator;
c. The lack of royalties;
d. The opportunity to redistribute the software and its modified version.

OSS is mainly a technical development model supported by standard licenses, with projects hosted online. Anyone can usually join and participate, whether by contributing code or documentation, offering graphics, or providing financial support. However, it is a "development" model, not a "commercialization" model. As a result, FOSS has often been seen as having purely technical advantages, but also drawbacks associated with immaturity, security breaches, and technical and legal complexity.

 

Which among the following technical characteristics may NOT be associated with free and open source software?

Select one:
a. Reliability, auditability, interoperability
b. Openness, accessibility, customizable
c. Enterprise-grade support, local sales channel, warranties
d. Open standards compatible, technology independence, security

Kerala International Centre for Free and Open Source Software

"Background:  Following the State Government approval by law on the setting up of The International Centre for Free and Open Source Software (ICFOSS), the institution will be set up at Thiruvananthapuram. In a Press Meet in December 2009, the Hon'ble Chief Minister Shri.V.S.Achuthanandan, who also holds the charge for IT Department said that the Centre has been planned as part of the Government's programme to promote free software in the State.

In the present era, which has witnessed an explosion of knowledge thanks to the Internet, it is important to democratise access to knowledge. The Nobel Prize-winning economist Joseph Stiglitz theorizes that disparity in access to information and knowledge is humanity's single most potent cause of poverty and discord. This challenge to democratise knowledge has in recent years given birth to a radical paradigm called Free and Open Source Software (FOSS), a powerful alternative to monopolistic approaches to knowledge creation. The Kerala Government has time and again affirmed its intention to foster the State as a global destination for FOSS-based software and IT-enabled services.

ICFOSS is expected to go a long way in making Kerala a global FOSS destination. Some of the areas that this institution proposes to take up include developing and customising Open Source applications, FOSS localization to Indian languages, and speech interfaces on FOSS for the illiterate.

Vision and Mission: The vision of ICFOSS is to become a leading research organisation in the Free and Open Source model of knowledge development, thereby contributing to the sustainable development of society and stimulating economic development in the region. The mission of ICFOSS is to promote research and development in the area of Free and Open Source Software and the knowledge-development model it puts forward.

 

The main objective of ICFOSS is:

Select one:
a. developing and customising Open Source applications
b. to become a leading research organisation in Free and Open Source model of knowledge development
c. contributing to sustainable development of society and to stimulate economic development in the region
d. all of the above

--
Roshan Agarwal
Chief Executive officer
Siddhast Ip innovation (P) ltd
907 chandra vihar colony
Jhansi-284002
M:+917376314900

Sunday, April 15, 2018

Starting services in safe mode

Recently I was trying to figure out how to start additional services in Windows safe mode. I had a user whose laptop kept crashing at login; I had a quick look and several theories came to mind, but uptime was important, so as a temporary workaround I set it up in safe mode with networking.

A few days later the user called and wanted to be able to print in safe mode. I looked into it and did some searching, but the prevailing wisdom seemed to be that it wasn't doable. This sounded like an MCP party line to me, so I decided to explore the registry. Eventually I found the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot key, which has sub-keys of Minimal and Network – Minimal being safe mode, Network being safe mode with networking. Each seems to be a whitelist of services, drivers and driver groups that are allowed to start or load.

Therefore it is possible to start additional services and load additional drivers in safe mode – just add a key named for the service or driver short name, with its default value set to the string "Service" or "Driver". The entry below (if in a .reg file) would allow the Print Spooler to start in safe mode with networking.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network\Spooler]
@="Service"

If you want a list of all drivers, driver groups and services starting in normal mode, along with their corresponding short names, check HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.

I'd caution against whitelisting too much, as it somewhat defeats the purpose of safe mode, though in certain situations it can be useful as a quick hack. It may also be something worth checking the next time you're dealing with a particularly nasty malware infection. I haven't seen anything that exploits it yet, but I imagine something does.



http://www.krisdavidson.org/2010/09/11/starting-services-in-safe-mode/

Thursday, April 12, 2018

FW: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: 09 April 2018 13:49
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

I'm bumping this thread. Thanks





-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: Thursday, 5 April 2018 10:46
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

I have used this process to create the DIH:

1. Create the BLOB collection:
* curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=.system"

2. Send the definition and files for the DIH:
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @solr-dataimporthandler-6.6.1.jar http://localhost:8983/solr/.system/blob/DataImportHandler
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mysql-connector-java-5.1.46.jar http://localhost:8983/solr/.system/blob/MySQLConnector
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": {"name":"DataImportHandler", "version":1}}'
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": {"name":"MySQLConnector", "version":1}}'

3. I have added the requestHandler to the config file with the Config API. Result:
###
"/full-advertisements": {
"runtimeLib": true,
"version": 1,
"class": "org.apache.solr.handler.dataimport.DataImportHandler",
"defaults": {
"config": "DIH/advertisements.xml"
},
"name": "/full-advertisements"
},
###

4. I have added the .xml definition file with the zkcli.sh script under /configs/advertisements2/DIH/advertisements.xml:
###
<dataConfig>
  <dataSource name="Gesloc" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION"
              user="ics" password="******" />
  <document>
    <entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id,origin FROM view_indexation_advertisements">
      <field column="id" name="id"/>
      <field column="origin" name="origin"/>
    </entity>
  </document>
</dataConfig>
###

Thanks for your help.


-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: Wednesday, 4 April 2018 09:57
To: solr-user@lucene.apache.org
Cc: fharrang@citya.com
Subject: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

Hello,
I use SolrCloud and I am testing the DIH system in the cloud, but I have this error:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
    ... 4 more

My DIH definition on the cloud:

<dataConfig>
  <dataSource name="Gesloc" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION"
              user="ics" password="IcsPerms"
              runtimeLib="true" version="1"/>
  <document>
    <entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id,origin FROM view_indexation_advertisements">
      <field column="id" name="id"/>
      <field column="origin" name="origin"/>
    </entity>
  </document>
</dataConfig>

Call response:

http://localhost:8983/solr/advertisements2/full-advertisements?command=full-import&clean=false&commit=true


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <lst name="initArgs">
    <bool name="runtimeLib">true</bool>
    <long name="version">1</long>
    <lst name="defaults">
      <str name="config">DIH/advertisements.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages"/>
</response>

I don't understand why I have this error. Can you help me?
Thank you.

Monday, April 9, 2018

FW: Default Index config

-----Original Message-----
From: mganeshs [mailto:mganeshs@live.in]
Sent: 09 April 2018 15:34
To: solr-user@lucene.apache.org
Subject: Re: Default Index config

Hi Shawn,

Regarding the high CPU: when troubleshooting, we found that the merge threads
keep running and take most of the CPU time (as per VisualVM). GC is not
causing any issue, as we use the default GC and also tried G1 as you
suggested over here
<https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr>

Though merging is only a background process, we suspect it is causing the
CPU to go high.

Since we are using Solr for real-time indexing of data and depend on its
results showing up immediately in the UI, we keep adding around 100
to 200 documents per second in parallel, in batches of 20 Solr
documents per add call.

*Note*: the following is the code snippet we use for indexing / adding Solr
documents in batch, per collection:

for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
    CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
    List<SolrInputDocument> solrInputDocuments = collectionBucket.getSolrInputDocumentList();
    String collectionName = collectionBucket.getCollectionName();
    try {
        if (solrInputDocuments.size() > 0) {
            CloudSolrClient solrClient = PlatformIndexManager.getInstance().getCloudSolrClient(collectionName);
            solrClient.add(collectionName, solrInputDocuments);
        }
    } catch (Exception e) {
        // exception handling elided in the original snippet
    }
}

where solrClient is created as below:

this.cloudSolrClient = new CloudSolrClient.Builder()
        .withZkHost(zooKeeperHost)
        .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
        .build();
this.cloudSolrClient.setZkClientTimeout(30000);

Hard commit is kept automatic and set to 15000 ms.
In this process we also see that when a merge is happening and the default
maxMergeCount is already reached, commits get delayed and the SolrJ client
(where we add documents) blocks until one of the merge threads processes
the merge, after which the SolrJ client returns the result.
How do we avoid this blocking of the SolrJ client? Do I need to go beyond the
default config for this scenario? I mean, change the merge factor
configuration?

Can you suggest what the merge config should be for such a scenario? Based on
the forums, I tried changing the merge settings to the following:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">30</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">30</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">512</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>

But I couldn't see much change in the behaviour.

On the same Solr node we have multiple indexes / collections. In that case,
is TieredMergePolicyFactory the right option, or should we go for another
merge policy (like LogByteSize, etc.) for multiple collections on the same
node?


Can you throw some light on these aspects?
Regards,

> Regarding auto commit, we discussed a lot with our product owners and in the
> end we are forced to keep it at 1 sec and couldn't increase it further. Even
> as it is, sometimes our customers say that they have to refresh
> their pages a couple of times to get the update from Solr. So we
> can't increase it further.

I understand pressure from nontechnical departments for very low response
times. Executives, sales, and marketing are usually the ones making those
kinds of demands. I think you should push back on that particular
requirement on technical grounds.

A soft commit interval that low *can* contribute to performance issues. It
doesn't always cause them, I'm just saying that it *can*.  Maybe increasing
it to five or ten seconds could help performance, or maybe it will make no
real difference at all.

> Yes. As of now only Solr is running on that machine. But initially we
> were running it along with HBase region servers and it was working fine. But
> due to CPU spikes and OS disk cache, we were forced to move Solr to a
> separate machine.
> But I just checked, our Solr data folder size comes only to 17GB.
> 2 collections have around 5GB and the others have 2 to 3 GB each. If
> you say that only 2/3 of the total size goes to the OS disk cache, the top
> command VIRT property is always 28G, which is more than what we
> have. Why is that...
> Please check the top command & GC we used in this doc
> <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit?usp=sharing>

The VIRT memory should be about equivalent to the RES size plus the size of
all the index data on the system: roughly 17GB of index data plus a resident
size of around 11GB would put VIRT near the reported 28G, so that looks about
right. The actual amount of memory allocated by Java for the heap and other
memory structures is approximately equal to RES minus SHR.

I am not sure whether the SHR size gets counted in VIRT; it probably does.
On some Linux systems, SHR grows to a very high number, but when that
happens, it typically doesn't reflect actual memory usage. I do not know
why this sometimes happens. That is a question for Oracle, since they are
the current owners of Java.

Only 5GB is in the buff/cache area.  The system has 13GB of free memory. 
That system is NOT low on memory.

With 4 CPUs, a load average in the 3-4 range is an indication that the
server is busy.  I can't say for sure whether it means the server is
overloaded.  Sometimes the load average on a system that's working well can
go higher than the CPU count, sometimes a load average well below the CPU
count is shown on a system with major performance issues.  It's difficult to
say.  The instantaneous CPU usage on the Solr process in that screenshot is
384 percent, which means it is exercising the CPUs hard, but this
might be perfectly OK.  96.3 percent of the CPU is being used by user
processes, a VERY small amount is being used by system, and the iowait
percentage is zero.  Typically servers that are struggling will have a
higher percentage in system and/or iowait, and I don't see that here.

> Queries are quite fast, most of the time simple queries with fq. Regarding
> indexing, during peak hours we index around 100 documents per second on
> average.

That's good.  And not surprising, given how little memory pressure and how
much free memory there is.  An indexing rate of 100 per second doesn't seem
like a lot of indexing to me, but for some indexes, it might be very heavy. 
If your general performance is good, I wouldn't be too concerned about it.

> Regarding the release, initially we tried 6.4.1, and since many
> discussions over here mentioned that moving to 6.5.x would solve a lot
> of performance issues, we moved to 6.5.1. We will move to 6.6.3
> in the near future.

The 6.4.1 version had a really bad bug in it that killed performance for
most users.  Some might not have even noticed a problem, though.  It's
difficult to say for sure whether it would be something you would notice, or
whether you would see an increase in performance by upgrading.

> Hope I have given enough information. One strange thing is that CPU
> and memory spikes are not seen when we move from r4.xlarge to r4.2xlarge
> (which is 8 cores with 60 GB RAM). But this would not be cost effective. What's
> making CPU and memory go high in this new version (due to doc
> values)? If I switch off docValues, will the CPU & memory spikes get
> reduced?

Overall memory usage (outside of the Java heap) looks great to me.  CPU
usage is high, but I can't tell if it's TOO high. As a proof of concept, I
think you should try raising autoSoftCommit to five seconds.  If maxDocs is
configured on either autoCommit or autoSoftCommit, remove it so that only
maxTime is there, regardless of whether you actually change maxTime.  If
raising autoSoftCommit makes no real difference, then the 1 second
autoSoftCommit probably isn't a worry.  I bet if you raised it to five
seconds, most users would never notice anything different.
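
In solrconfig.xml, that experiment would look something like the minimal sketch below (the 15-second hard commit matches what you described earlier; keeping openSearcher=false on autoCommit is an assumption here, so that visibility is controlled only by the soft commit):

<autoCommit>
  <!-- hard commit: flushes to disk without opening a new searcher -->
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: when new documents become searchable; raised from 1s to 5s -->
  <maxTime>5000</maxTime>
</autoSoftCommit>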

If you want to provide a GC log to us that covers a relatively long
timeframe, we can analyze that and let you know whether your heap is sized
appropriately, or whether it might be too big or too small, and whether
garbage collection pauses are keeping your CPU usage high.  The standard
Solr startup in most current versions always logs GC activity. It will
usually be in the same directory as solr.log.

Do you know what typical and peak queries per second are on your Solr
servers?  If your query rate is high, handling that will probably require
more servers and a higher replica count.

Thanks,
Shawn





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

-----Original Message-----
From: Alessandro Benedetti [mailto:a.benedetti@sease.io]
Sent: 09 April 2018 15:43
To: solr-user@lucene.apache.org
Subject: Re: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

Hi Sami,
I agree with Mikhail: if you have relatively complex data you could curate
your own knowledge base for products and use it for Named Entity Recognition.
You can then search a compatible_with field for the extracted entity.

If the scenario is simpler, the analysis chain you mentioned should
work (if the product names are always complete and well curated).

Cheers





--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director www.sease.io

On Mon, Apr 9, 2018 at 10:40 AM, Adhyan Arizki <a.arizki@gmail.com> wrote:

> You can just use synonyms for that... rather hackish, but it works.
>
> On Mon, 9 Apr 2018, 05:06 Sami al Subhi, <sami@alsubhi.me> wrote:
>
> > I think this filter will output the desired result:
> >
> > <analyzer type="query">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.ShingleFilterFactory"/>
> > </analyzer>
> > <analyzer type="index">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.FingerprintFilterFactory" separator=" " />
> > </analyzer>
> >
> > indexing:
> > "iPhone 6" will be indexed as "iphone 6" (always a single token)
> >
> > querying:
> > so this will analyze "Apple iPhone 6 32GB white" into "apple", "apple
> > iphone", "iphone", "iphone 6" and so on...
> > then a match will be achieved via the 4th token.
> >
> >
> > I don't see how this would result in false positive matches.
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>

FW: Solr join With must clause in fq

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhl@apache.org]
Sent: 09 April 2018 15:49
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr join With must clause in fq

It might make sense to test on a recent version of Solr.

On Sun, Apr 8, 2018 at 8:21 PM, manuj singh <s.manuj545@gmail.com> wrote:

> Hi all,
> I am trying to debug a problem which I am facing and need some help.
>
> I have a Solr query which does a join on 2 different cores. So let's say
> my first core has the following 3 docs:
>
> { "id":"1", "m_id":"lebron", "some_info":"29" }
>
> { "id":"2", "m_id":"Wade", "matches_win":"29" }
>
> { "id":"3", "m_id":"lebron", "some_info":"1234" }
>
> my second core has the following docs
>
> { "m_id": "lebron", "team": "miami" }
>
> { "m_id": "Wade", "team": "miami" }
>
> So now we made an update to the doc with lebron and changed the team to
> "cleveland". So the new docs in core 2 look like this:
>
> { "m_id": "lebron", "team": "clevelend" }
>
> { "m_id": "Wade", "team": "miami" }
>
> Now I am trying to join these 2 and find the docs from core1 for
> team miami.
>
> My query looks like this:
>
> fq=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am expecting it to return the doc with id=2, but what I am getting is
> documents 1 and 2.
>
> I am not able to figure out what the problem is. Is the query incorrect?
> Or is there some issue in the join?
>
> Couple of observations:
>
> 1. If I remove the + from the filter query it works as expected, so the
> following query works:
>
> fq={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am not sure how the MUST clause is affecting the query.
>
> 2. Also, if you look, the original query is not returning document 3
> (however, it's returning document 1, which has the same m_id). Now the
> only difference between doc 1 and doc 3 is that doc 1 was created when
> "lebron" was part of team miami, and doc 3 was created when the team got
> updated to "cleveland". So the join is working fine for the new docs
> in core1 but not for the old docs.
>
> 3. If I use q instead of fq, the query returns results as expected:
>
> q=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> and
>
> q={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> Both of the above works.
>
> I am sure I am missing something about how the join works internally. I am
> trying to understand why fq has a different behavior than q with the
> MUST (+) clause.
>
> I am using solr 4.10.
>
>
>
> Thanks
>
> Manuj
>



--
Sincerely yours
Mikhail Khludnev