Siddhast lab

SSH – Putty Clear Cache – Compunet Blog

2018-06-05T18:46:00.001-07:00

https://blog.compunet.co.za/ssh-putty-clear-cache/

some IPR question on software licenses

2018-05-27T21:40:00.000-07:00

A research project in the area of intellectual property requires software with a number of functionalities specific to niche studies that the project owners envisage, such as an atypical examination of how courts deal with copyright cases, including the staff training, lobbying and influence activities they wish to study. The project team would need these functionalities for limited periods and not necessarily for long-term use. They find it challenging to fund such time-bound and specific projects, useful as they are. Because the project is time-bound and would need further study, both time and financial resources are limited.

The key legal characteristics of open source software may make it a model most suitable in this context. Which of the characteristics below are NOT suitable for such a project:

Select one:

a. The opportunity to freely modify and improve the software;

b. The use of the software for any purpose, subject to the authorization of its creator;

c. The lack of royalties;

d. The opportunity to redistribute the software and its modified version

OSS is mainly a technical development model supported by standard licenses, with projects hosted online. Anyone can usually join and participate by contributing code, documenting, or offering graphics or financial support. However, it is a "development" model, not a "commercialization" model. As a result, FOSS has often been seen as having purely technical advantages but also drawbacks associated with immaturity, security breaches, and technical and legal complexity.

Which among the following technical characteristics may NOT be associated with free and open source software:

Select one:

a. Reliability, auditability, interoperability

b. Openness, accessibility, customizable

c. Enterprise-grade support, local sales channel, warranties

d. Open standards compatible, technology independence, security

Kerala International Centre for Free and Open Source Software

"Background: Following the State Government approval by law on the setting up of The International Centre for Free and Open Source Software (ICFOSS), the institution will be set up at Thiruvananthapuram. In a Press Meet in December 2009, the Hon'ble Chief Minister Shri.V.S.Achuthanandan, who also holds the charge for IT Department said that the Centre has been planned as part of the Government's programme to promote free software in the State.

In this present era that has witnessed explosion of knowledge thanks to the Internet, it is important to democratise access to knowledge. The Nobel Prize winning economist, Joseph Stiglitz theorizes that disparity over access to information and knowledge is humanity's single most potent cause of poverty and discord. This challenge to democratise knowledge has in recent years, given birth to a radical paradigm called Free and Open Source Software (FOSS), as a powerful alternative to monopolistic approaches to knowledge creations. The Kerala Government has time and again affirmed its intention to foster the State as a global destination for FOSS based software and IT enabled services.

ICFOSS is expected to go a long way in making Kerala a global FOSS destination. Some of the areas that this institution proposes to take up include developing and customising Open Source applications, FOSS localization to Indian languages and speech interfaces on FOSS for the illiterate.

Vision and Mission: The vision of ICFOSS is to become a leading research organisation in Free and Open Source model of knowledge development thereby contributing towards sustainable development of society and to stimulate economic development in the region. The mission of ICFOSS is to promote research and development in the area of Free and Open Source Software and the knowledge development model it puts forward.

The main objective of the ICFOSS is:

Select one:

a. developing and customising Open Source applications

b. to become a leading research organisation in Free and Open Source model of knowledge development

c. contributing to sustainable development of society and to stimulate economic development in the region

d. all of the above

Roshan Agarwal

Chief Executive officer

Siddhast Ip innovation (P) ltd

907 chandra vihar colony

Jhansi-284002
M:+917376314900

Starting services in safe mode

2018-04-15T15:32:00.001-07:00

Recently I was trying to figure out how to start additional services in Windows safe mode. I had a user whose laptop kept crashing at login, I had a quick look and several theories came to mind but uptime was important, so as a temporary workaround I set it up in safe mode with networking.

A few days later the user called and wanted to be able to print in safe mode. I looked into it and did some searching, but the prevailing wisdom seemed to be that it wasn't doable. This sounded like an MCP party line to me, so I decided to explore the registry. Eventually I found the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot key, which has sub-keys named Minimal and Network: Minimal is safe mode, and Network is safe mode with networking. Each appears to be a whitelist of the services, drivers and driver groups that are allowed to start or load.

Therefore it is possible to start additional services and load additional drivers in safe mode – just add a key for the service or driver short name, then a string for type. The below entry (if in a .reg file) would allow the Print Spooler to start in safe mode with networking.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network\Spooler]
@="Service"
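The same entry can be produced for any service short name. What follows is a minimal sketch in Python, assuming the default value is a plain string as in the entry above. The helper name `safeboot_reg_entry` and its parameters are illustrative and not part of any Windows tool. The `Windows Registry Editor Version 5.00` line is the header regedit expects at the top of a .reg file.

```python
def safeboot_reg_entry(short_name, mode="Network", kind="Service"):
    """Return .reg file text that whitelists one service or driver in safe mode.

    mode: "Minimal" (safe mode) or "Network" (safe mode with networking).
    kind: the default string value, e.g. "Service" as in the entry above.
    """
    if mode not in ("Minimal", "Network"):
        raise ValueError("mode must be 'Minimal' or 'Network'")
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\SafeBoot\\"
        + mode + "\\" + short_name
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '@="' + kind + '"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Only prints the text; importing it with regedit stays a manual step.
    print(safeboot_reg_entry("Spooler"))
```

The sketch only builds text, so it can be reviewed before anything touches the registry.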

If you want a list of all drivers, driver groups and services starting in normal mode and their corresponding short names, check HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services

I'd caution against whitelisting too much as it kind of defeats the purpose of safe mode, though in certain situations it can be useful as a quick hack. It may also be something worth checking the next time you're dealing with a particularly nasty malware infection. I haven't seen anything which exploits it yet, but I imagine something does.

http://www.krisdavidson.org/2010/09/11/starting-services-in-safe-mode/

FW: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

2018-04-12T23:41:00.001-07:00

-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: 09 April 2018 13:49
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

Bumping this thread. Thanks

-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: 05 April 2018 10:46
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

I used this process to create the DIH:

1. Create the BLOB collection:
* curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=.system'

2. Send the definition and files for DIH:
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @solr-dataimporthandler-6.6.1.jar http://localhost:8983/solr/.system/blob/DataImportHandler
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mysql-connector-java-5.1.46.jar http://localhost:8983/solr/.system/blob/MySQLConnector
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": { "name":"DataImportHandler", "version":1 }}'
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": { "name":"MySQLConnector", "version":1 }}'

3. I added the requestHandler to the config through the API. Result:
###
"/full-advertisements": {
"runtimeLib": true,
"version": 1,
"class": "org.apache.solr.handler.dataimport.DataImportHandler",
"defaults": {
"config": "DIH/advertisements.xml"
},
"name": "/full-advertisements"
},
###

4. I uploaded the .xml definition file with the zkcli.sh script to
/configs/advertisements2/DIH/advertisements.xml
###
<dataConfig>

<dataSource name="Gesloc" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION" user="ics"
password="******" />

<document>

<entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
transformer="TemplateTransformer" query="SELECT id,origin FROM
view_indexation_advertisements" >

<field column="id" name="id"/>
<field column="origin" name="origin"/>

</entity>

</document>

</dataConfig>
###
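The two `add-runtimelib` calls in step 2 differ only in the blob name, so they can be built from one helper. This is a minimal sketch in Python that constructs the requests without sending them. The base URL, collection name and payload shape are taken from the commands above, while the helper name `add_runtimelib_request` is made up for illustration.

```python
import json
import urllib.request

SOLR = "http://localhost:8983/solr"  # base URL taken from the commands above


def add_runtimelib_request(collection, blob_name, version=1):
    """Build (but do not send) the Config API request that registers a blob."""
    payload = {"add-runtimelib": {"name": blob_name, "version": version}}
    return urllib.request.Request(
        url=f"{SOLR}/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    for blob in ("DataImportHandler", "MySQLConnector"):
        req = add_runtimelib_request("advertisements2", blob)
        # urllib.request.urlopen(req) would send it; here we only show the request.
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Printing the request first makes it easy to compare the JSON body with what the curl commands send.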

Thanks for your help.

-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: 04 April 2018 09:57
To: solr-user@lucene.apache.org
Cc: fharrang@citya.com
Subject: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

Hello,
I use SolrCloud and I am testing the DIH in cloud mode, but I get this error:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
... 4 more

My DIH definition on the cloud

<dataConfig>

<dataSource name="Gesloc" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION" user="ics"
password="IcsPerms"
runtimeLib="true" version="1"/>

<document>

<entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
transformer="TemplateTransformer"
query="SELECT id,origin FROM view_indexation_advertisements" >

<field column="id" name="id"/>
<field column="origin" name="origin"/>

</entity>

</document>

</dataConfig>

Call response :

http://localhost:8983/solr/advertisements2/full-advertisements?command=full-import&clean=false&commit=true

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
</lst>
<lst name="initArgs">
<bool name="runtimeLib">true</bool>
<long name="version">1</long>
<lst name="defaults">
<str name="config">DIH/advertisements.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages"/>
</response>

I don't understand why I get this error. Can you help me?
Thank you.

FW: Default Index config

2018-04-09T03:46:00.005-07:00

-----Original Message-----
From: mganeshs [mailto:mganeshs@live.in]
Sent: 09 April 2018 15:34
To: solr-user@lucene.apache.org
Subject: Re: Default Index config

Hi Shawn,

Regarding the high CPU: while troubleshooting, we found that merge threads
keep running and take most of the CPU time (as per VisualVM). GC is
not causing any issue: we use the default GC and also tried G1 as you
suggested here:
<https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr>

Though merging is only a background process, we suspect it is what drives
the CPU high.

We use Solr for real-time indexing and depend on its results immediately
to show them in the UI. So we keep adding around 100 to 200 documents per
second in parallel, in batches of 20 Solr documents per add...

*Note*: the following is the code snippet we use for indexing / adding Solr
documents in batches, per collection

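for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
    CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
    List<SolrInputDocument> solrInputDocuments =
        collectionBucket.getSolrInputDocumentList();
    String collectionName = collectionBucket.getCollectionName();
    try {
        // Send one add request per collection, and only for a non-empty batch.
        if (!solrInputDocuments.isEmpty()) {
            CloudSolrClient solrClient =
                PlatformIndexManager.getInstance().getCloudSolrClient(collectionName);
            solrClient.add(collectionName, solrInputDocuments);
        }
    } catch (SolrServerException | IOException e) {
        // The posted snippet omitted its catch block; this handling is a placeholder.
        throw new RuntimeException("Indexing failed for " + collectionName, e);
    }
}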

where solrClient is created as below:

this.cloudSolrClient = new CloudSolrClient.Builder()
    .withZkHost(zooKeeperHost)
    .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
    .build();
this.cloudSolrClient.setZkClientTimeout(30000);

Hard commit is automatic and set to 15000 ms.
During this process we also see that when a merge is in progress and
maxMergeCount (the default) has already been reached, commits are delayed
and the SolrJ client (where we add documents) is blocked; once one of the
merge threads finishes its merge, the SolrJ client returns the result.
How do we avoid this blocking of the SolrJ client? Do I need to move away
from the default config for this scenario, i.e. change the merge factor
configuration?

Can you suggest what would be merge config for such a scenario ? Based on
forums, I tried to change the merge settings to the following,

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">30</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">30</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">512</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>

But I couldn't see much change in the behaviour.

On the same Solr node, we have multiple indexes / collections. In that case,
is TieredMergePolicyFactory the right option, or should we go for another
merge policy (like LogByte etc.) for multiple collections on the same node?

Can you shed some light on these aspects?
Regards,

> Regarding auto commit, we discussed a lot with our product owners and at last
> we are forced to keep it to 1sec and we couldn't increase further. As
> this itself, sometimes our customers says that they have to refresh
> their pages for couple of times to get the update from solr. So we
> can't increase further.

I understand pressure from nontechnical departments for very low response
times. Executives, sales, and marketing are usually the ones making those
kinds of demands. I think you should push back on that particular
requirement on technical grounds.

A soft commit interval that low *can* contribute to performance issues. It
doesn't always cause them, I'm just saying that it *can*. Maybe increasing
it to five or ten seconds could help performance, or maybe it will make no
real difference at all.

> Yes. As of now only solr is running in that machine. But intially we
> were running along with hbase region servers and was working fine. But
> due to CPU spikes and OS disk cache, we are forced to move solr to
> separate machine.
> But just I checked, our solr data folder size is coming only to 17GB.
> 2 collection has around 5GB and other are have 2 to 3 GB of size. If
> you say that only 2/3 of total size comes to OS disk cache, in top
> command VIRT property it's always 28G, which means more than what we
> have. Why is that...
> Pls check that top command & GC we used in this doc
> <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQf
> DqfmV802hWIiQ/edit?usp=sharing>

The VIRT memory should be about equivalent to the RES size plus the size of
all the index data on the system. So that looks about right. The actual
amount of memory allocated by Java for the heap and other memory structures
is approximately equal to RES minus SHR.

I am not sure whether the SHR size gets counted in VIRT. It probably does.
On some Linux systems, SHR grows to a very high number, but when that
happens, it typically doesn't reflect actual memory usage. I do not know
why this sometimes happens. That is a question for Oracle, since they are the
current owners of Java.

Only 5GB is in the buff/cache area. The system has 13GB of free memory.
That system is NOT low on memory.

With 4 CPUs, a load average in the 3-4 range is an indication that the
server is busy. I can't say for sure whether it means the server is
overloaded. Sometimes the load average on a system that's working well can
go higher than the CPU count, sometimes a load average well below the CPU
count is shown on a system with major performance issues. It's difficult to
say. The instantaneous CPU usage on the Solr process in that screenshot is
384 percent. Which means that it is exercising the CPUs hard. But this
might be perfectly OK. 96.3 percent of the CPU is being used by user
processes, a VERY small amount is being used by system, and the iowait
percentage is zero. Typically servers that are struggling will have a
higher percentage in system and/or iowait, and I don't see that here.

> Queries are quite fast, most of the time simple queries with fq. Regarding
> indexing, during peak hours, we index around 100 documents per second on
> average.

That's good. And not surprising, given how little memory pressure and how
much free memory there is. An indexing rate of 100 per second doesn't seem
like a lot of indexing to me, but for some indexes, it might be very heavy.
If your general performance is good, I wouldn't be too concerned about it.

> Regarding release, initially we tried with 6.4.1 and since many
> discussions over here, mentioned like moving to 6.5.x will solve lot
> of performance issues etc, so we moved to 6.5.1. We will move to 6.6.3
> in near future.

The 6.4.1 version had a really bad bug in it that killed performance for
most users. Some might not have even noticed a problem, though. It's
difficult to say for sure whether it would be something you would notice, or
whether you would see an increase in performance by upgrading.

> Hope I have given enough information. One strange thing is that, CPU
> and memory spike are not seen when we move to r4.xlarge to r4.2xlarge
> ( which is
> 8 core with 60 GB RAM ). But this would not be cost effective. What's
> making CPU and memory to go high in this new version ( due to doc
> values )? If I switch off docvalues will CPU & Memory spikes will get
> reduced ?

Overall memory usage (outside of the Java heap) looks great to me. CPU
usage is high, but I can't tell if it's TOO high. As a proof of concept, I
think you should try raising autoSoftCommit to five seconds. If maxDocs is
configured on either autoCommit or autoSoftCommit, remove it so that only
maxTime is there, regardless of whether you actually change maxTime. If
raising autoSoftCommit makes no real difference, then the 1 second
autoSoftCommit probably isn't a worry. I bet if you raised it to five
seconds, most users would never notice anything different.

If you want to provide a GC log to us that covers a relatively long
timeframe, we can analyze that and let you know whether your heap is sized
appropriately, or whether it might be too big or too small, and whether
garbage collection pauses are keeping your CPU usage high. The standard
Solr startup in most current versions always logs GC activity. It will
usually be in the same directory as solr.log.

Do you know what typical and peak queries per second are on your Solr
servers? If your query rate is high, handling that will probably require
more servers and a higher replica count.

Thanks,
Shawn

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

2018-04-09T03:46:00.003-07:00

-----Original Message-----
From: Alessandro Benedetti [mailto:a.benedetti@sease.io]
Sent: 09 April 2018 15:43
To: solr-user@lucene.apache.org
Subject: Re: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

Hi Sami,
I agree with Mikhail: if you have relatively complex data you could curate
your own knowledge base for products and use it for Named Entity Recognition.
You can then search a field compatible_with the extracted entity.

If the scenario is simpler using the analysis chain you mentioned should
work (if the product names are always complete and well curated).

Cheers

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director www.sease.io

On Mon, Apr 9, 2018 at 10:40 AM, Adhyan Arizki <a.arizki@gmail.com> wrote:

> You can just use synonyms for that.. rather hackish but it works
>
> On Mon, 9 Apr 2018, 05:06 Sami al Subhi, <sami@alsubhi.me> wrote:
>
> > I think this filter will output the desired result:
> >
> > <analyzer type="query">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.ShingleFilterFactory"/>
> > </analyzer>
> > <analyzer type="index">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.FingerprintFilterFactory" separator=" " />
> > </analyzer>
> >
> > indexing:
> > "iPhone 6" will be indexed as "iphone 6" (always a single token)
> >
> > querying:
> > so this will analyze "Apple iPhone 6 32GB white" to "apple", "apple
> > iphone", "iphone", "iphone 6" and so on...
> > then here a match will be achieved using the 4th token.
> >
> >
> > I dont see how this will result in false positive matching.
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
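The query-side behaviour described in the quoted message can be checked with a few lines of Python. This is a rough sketch, not Solr code: a lowercase whitespace split stands in for StandardTokenizer plus LowerCaseFilter, and the shingle function assumes 2-word shingles with unigrams kept, which matches the token list in the message. One caveat worth verifying on a real index: Lucene's FingerprintFilter is documented as emitting the sorted, de-duplicated set of tokens, so the single indexed token for "iPhone 6" may come out as "6 iphone" rather than "iphone 6".

```python
def shingles(tokens, size=2, output_unigrams=True):
    """Emit unigrams plus word n-grams of the given size, in position order."""
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        if i + size <= len(tokens):
            out.append(" ".join(tokens[i:i + size]))
    return out


def analyze_query(text):
    # Whitespace split stands in for StandardTokenizer; adequate for this input only.
    return shingles(text.lower().split())


def fingerprint(text, separator=" "):
    # Approximates FingerprintFilter: sort and de-duplicate tokens, then join them.
    return separator.join(sorted(set(text.lower().split())))


if __name__ == "__main__":
    print(analyze_query("Apple iPhone 6 32GB white"))
    print(fingerprint("iPhone 6"))
```

If the sorted fingerprint is what actually gets indexed, the "iphone 6" shingle would not match it, so the Analysis screen is the place to confirm what each chain really emits.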

FW: Solr join With must clause in fq

2018-04-09T03:46:00.001-07:00

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhl@apache.org]
Sent: 09 April 2018 15:49
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr join With must clause in fq

it might make sense to test on the recent versions of Solr.

On Sun, Apr 8, 2018 at 8:21 PM, manuj singh <s.manuj545@gmail.com> wrote:

> Hi all,
> I am trying to debug a problem which i am facing and need some help.
>
> I have a solr query which does join on 2 different cores. so lets say
> my first core has following 3 docs
>
> { "id":"1", "m_id":"lebron", "some_info":"29" }
>
> { "id":"2", "m_id":"Wade", "matches_win":"29" }
>
> { "id":"3", "m_id":"lebron", "some_info":"1234" }
>
> my second core has the following docs
>
> { "m_id": "lebron", "team": "miami" }
>
> { "m_id": "Wade", "team": "miami" }
>
> so now we made an update to doc with lebron and changed the team to
> "cleveland". So the new docs in core 2 look like this.
>
> { "m_id": "lebron", "team": "cleveland" }
>
> { "m_id": "Wade", "team": "miami" }
>
> now i am trying to join these 2 and find the docs from core1 for
> team miami.
>
> my query looks like this
>
> fq=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am expecting it to return doc with id=2 but what i am getting is
> document
> 1 and 2.
>
> I am not able to figure out what is the problem. Is the query incorrect ?
> or is there some issue in join.
>
> *Couple of observations.*
>
> 1.if i remove the + from the filter query it works as expected. so the
> following query works
>
> fq={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am not sure how the Must clause affecting the query.
>
> *2.* Also, if you look, the original query is not returning document
> 3 (however, it's returning document 1, which has the same m_id). Now the
> only difference between doc 1 and doc 3 is that doc 1 was created when
> "lebron" was part of team: miami, and doc 3 was created when the team got
> updated to "cleveland". So the join is working fine for the new docs
> in core1 but not for the old docs.
>
> 3.If i use q instead of fq the query returns results as expected.
>
> q=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> and
>
> q={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> Both of the above works.
>
> I am sure I am missing something about how join works internally. I am
> trying to understand why fq behaves differently than q with the
> Must (+) clause.
>
> I am using solr 4.10.
>
>
>
> Thanks
>
> Manuj
>

--
Sincerely yours
Mikhail Khludnev

FW: custom filter class on schema.xml on solrcloud

2018-04-02T02:38:00.001-07:00

-----Original Message-----
From: void [mailto:sauravsust71@gmail.com]
Sent: 02 April 2018 14:32
To: solr-user@lucene.apache.org
Subject: custom filter class on schema.xml on solrcloud

I have used a custom filter provided by a jar in schema.xml in standalone
Solr like below

<filter class="com.x.yFilterFactory"
stopWordDictionary="resources/yStopWords"/>

And for this,

I have loaded the jar in solrconfig.xml like below

<lib dir="./../plugins/" regex=".*\.jar" />

It works fine, but when I tried to use it in SolrCloud with external
ZooKeeper I got an 'IO exception' error, maybe from uploading a large
jar file to ZooKeeper.

I also tried putting this jar in the lib folder of Solr home but got a
'Plugin init failure' error.

After that, I tried the blob store API, but the documentation says "Blob store
can only be used to dynamically load components configured in
solrconfig.xml. Components specified in schema.xml cannot be loaded from
blob store"

So, how can I use a custom filter class in schema.xml in SolrCloud mode with
an external ZooKeeper configuration?


FW: Learning to Rank (LTR) with grouping

2018-04-02T00:07:00.001-07:00

-----Original Message-----
From: ilayaraja [mailto:ilay.msp@gmail.com]
Sent: 02 April 2018 12:27
To: solr-user@lucene.apache.org
Subject: Re: Learning to Rank (LTR) with grouping

Hi Roopa & Deigo,

I am facing the same issue with grouping. Currently I am on Solr 7.2.1 but
still see that grouping with LTR is not working. Did you apply it as a patch,
or does the latest Solr version have the fix already?

Ilay

-----
--Ilay

FW: Upgrading a Plugin from 6.6 to 7.x

2018-04-01T22:50:00.020-07:00

-----Original Message-----
From: Peter Alexander Kopciak [mailto:peter@kopciak.at]
Sent: 21 March 2018 16:17
To: solr-user@lucene.apache.org
Subject: Upgrading a Plugin from 6.6 to 7.x

Hi!

I'm still pretty new to Solr and I want to use the vector Scoring plugin (
https://github.com/saaay71/solr-vector-scoring/network) but unfortunately,
it does not seem to work for newer Solr versions.

I tested it with 6.6 to verify its functionality, so it seems to be broken
because of the upgrade to 7.x.

When following the installation procedure and executing the examples, I ran
into the following error with Query 1:

java.lang.UnsupportedOperationException: Query {! type=vp f=vector
vector=0.1,4.75,0.3,1.2,0.7,4.0 v=} does not implement createWeight

Does anyone have a lead for me on how to fix/upgrade the plugin? The
createWeight method seems to exist, so I'm not sure where to start or what
the problem is.

FW: Get terms in solr not working

2018-04-01T22:50:00.019-07:00

-----Original Message-----
From: adam rag [mailto:adamrag16@gmail.com]
Sent: 21 March 2018 11:10
To: solr-user@lucene.apache.org
Subject: Get terms in solr not working

To get the top words in my Apache Solr instance, I am using a "terms" query.
When I use it to get 10 terms from 100 million records, the results come back
after a few minutes, but with 300 million records Solr does not respond. My
server has 100 GB of memory.

FW: solrj question

2018-04-01T22:50:00.018-07:00

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: 26 March 2018 23:06
To: solr-user@lucene.apache.org
Subject: Re: solrj question

On 3/26/2018 11:19 AM, Webster Homer wrote:
> You may say that the String in the constructor is "meant to be query
> syntax", nothing in the Javadoc says anything about the expected syntax.
> Since there is also a method to set the query, it seemed reasonable to
> expect that it would take the output of the toString method. (or some
> other serialization method)

You're right that the javadoc is not very specific. It says this:

Parameters:
q - query string

In general in Solr, "query string" is understood to be something you would
put in the "q" parameter when you send a query. Or maybe the "fq"
parameter. The javadoc could definitely be improved.

The javadoc for the toString specifically used here is a little more
specific. (SolrQuery inherits from SolrParams, and that's where the
toString method is defined):

https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/common/params/SolrParams.html#toString--

It says "so that the URL may be unambiguously pasted back into a browser."

> So how would a user play back logged queries? This seems like an
> important use case. I can parse the toString output, It seems like the
> constructor should be able to take it.
> If not a constructor and toString, methods, I don't see methods to
> serialize and deserialize the query Being able to write the complete
> query to a log is important, but we also want to be able to read the
> log and submit the query to solr. Being able to playback the logs
> allows us to trouble shoot search issues on our site. It also
> provides a way to create load tests.
>
> Yes I can and am going to create this functionality, it's not that
> complicated, but I don't think it's unreasonable to think that the
> existing API should handle it.

Yes, that would be great capability to have. But it hasn't been written
yet. A method like "parseUrlString" on SolrQuery would be a good thing to
have.

Thanks,
Shawn
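For the log-replay use case discussed above, the parsing half is small. Here is a minimal sketch in Python, assuming the logged text is a URL-encoded `name=value&name=value` string as the toString javadoc describes. The helper name `parse_logged_query` and the sample log line are hypothetical, and feeding the result back into a SolrJ query object is left out.

```python
from urllib.parse import parse_qs


def parse_logged_query(logged):
    """Turn a logged 'q=...&fq=...' string into {name: [values]}.

    Repeated parameters such as fq keep all their values, in order.
    """
    return parse_qs(logged.lstrip("?"), keep_blank_values=True)


if __name__ == "__main__":
    # Hypothetical log line, not taken from the thread.
    line = "q=name%3Aiphone&fq=type%3Aphone&fq=inStock%3Atrue&rows=10"
    for name, values in parse_logged_query(line).items():
        print(name, values)
```

Keeping every parameter as a list preserves multi-valued parameters like fq, which is the part a naive split into a plain dictionary would lose.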

FW: Default Index config

2018-04-01T22:50:00.017-07:00

-----Original Message-----
From: mganeshs [mailto:mganeshs@live.in]
Sent: 26 March 2018 22:15
To: solr-user@lucene.apache.org
Subject: Default Index config

Hi,

I haven't changed the solr config wrt index config, which means it's all
commented in the solrconfig.xml.

It's something like what I pasted before. But I would like to know what the
default value of each of these settings is.

Since moving to 6.5.1, our data size has also crossed 5GB in each of our
collections, and document updates are now taking time. So I would like to
know whether we need to change any default configurations.

<indexConfig>

<lockType>${solr.lock.type:native}</lockType>

</indexConfig>

Advice...


FW: querying vs. highlighting: complete freedom?

2018-04-01T22:50:00.016-07:00

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 26 March 2018 22:05
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: querying vs. highlighting: complete freedom?

Arturas:

Thanks for the "atta boy's", but I have to confess I poked a developer's
list and the person (David Smiley) who, you know, like understands the
highlighting code replied, and I passed it on ;

I have great respect for the SO forum, but don't post to it since there's
only so much time in a day, so please feel free to put that explanation over
there.

As for the rest, I'll have to pass today, the aforementioned time
constraints are calling....

Best,
Erick

On Mon, Mar 26, 2018 at 12:12 AM, Arturas Mazeika <mazeika@gmail.com> wrote:
> Hi Erick,
>
> Adding a field qualifier to the hl.q parameter solved the issue. My
> excitement is going through the roof! What a thorough answer: the
> explanation of Solr's behavior, how it tries to interpret what
> I mean when I supply a keyword without the field qualifier. Very impressive.
> Would you care to (re)post this answer to Stack Overflow? If that is
> too much of a hassle, I'll do this in a couple of days myself on your behalf.
>
> I am impressed by how well, thoroughly, quickly and fully the question was
> answered.
>
> Steven's hint pushed me further in this direction: he suggested using
> the query part of Solr to filter and sort out the relevant answers in
> the 1st step, and in the 2nd step he'd highlight all the keywords using
> CTRL+F (in the browser or some alternative viewer). This brought me to
> the next question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding of how to
> achieve this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the
>    query
> 2. Use http://localhost:8983/solr/#/trans/analysis to re-analyze the
>    document and the query
> 3. Use the matching of the substrings from the original text to the
>    last filter/tokenizer/analyzer in the analyze-chain to map the
>    terms of the query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal.
> If one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955)
> was a German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside
> quantum mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above URL, I need to
> specify a specific core. Is it possible to generalize it, so the core
> that receives the request is not necessarily the one that processes
> it? Or is this already distributed, in the sense that the receiving
> core and the processing core are never the same?
>
> 2. The document was already analyze-chained. Is it possible to store
> this information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson
> <erickerickson@gmail.com>
> wrote:
>
>> Arturas:
>>
>> Try to field-qualify your hl.q parameter. That looks like:
>>
>> hl.q=trans:Kundigung
>> or
>> hl.q=trans:Kündigung
>>
>> I saw the exact behavior you describe when I did _not_ specify the
>> field in the hl.q parameter, i.e.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> didn't show all highlights.
>>
>> But when I did specify the field, it worked.
>>
>> Here's what I think is happening: Solr uses the default search field
>> when parsing an un-field-qualified query. I.e.
>>
>> q=something
>>
>> is parsed as
>>
>> q=default_search_field:something.
>>
>> The default field is controlled in solrconfig.xml with the "df"
>> parameter, you'll see entries like:
>> <str name="df">my_field</str>
>>
>> Also when I changed the "df" parameter to the field I was
>> highlighting on, I didn't need to specify the field on the hl.q
>> parameter.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> The default field is usually "text", which knows nothing about the
>> German-specific filters you've applied unless you changed it.
>>
>> So in the absence of a field-qualification for the hl.q parameter
>> Solr was parsing the query according to the analysis chain specified
>> in your default field, and probably passed ü through without
>> transforming it. Since your indexing analysis chain for that field
>> folded ü to just plain u, it wasn't found or highlighted.
>>
>> On the surface, this does seem like something that should be changed,
>> I'll go ahead and ping the dev list.
>>
>> NOTE: I was trying this on Solr 7.1
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika <mazeika@gmail.com>
>> wrote:
>> > Hi Erick,
>> >
>> > Thanks for the update and the info. Your post brought quite a bit
>> > of light into the picture and now I understand quite a bit more
>> > about what you are saying. Your explanation makes sense and can be
>> > quite useful in certain scenarios.
>> >
>> > What struck me in your description is that you are saying that
>> > the analyzer chain needs to be applied to the highlighting queries
>> > as well. The tragedy is that I am not able to get this for a German
>> > collection: if the query is set (no explicit highlighting query),
>> > the highlighting is correct. It is also correct if I replace the
>> > umlauts with the corresponding Latin chars. Getting the analyzer
>> > chain applied to the highlighting terms remains the challenge.
>> >
>> > Could you have a look at the following Stack Overflow link?
>> > Maybe something comes to your mind...
>> >
>> > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >
>> > Cheers,
>> > Arturas
>> > On Fri, Mar 23, 2018, 17:43 Erick Erickson
>> > <erickerickson@gmail.com>
>> wrote:
>> >
>> >> bq: this is not a typical case that one searches for a keyword but
>> >> highlights something else
>> >>
>> >> This isn't really an unusual case, apparently I misled you.
>> >>
>> >> What I was trying to convey is that the analysis chain used is
>> >> firmly attached to a particular _field_. There's no way to say
>> >> "use one analysis chain for the query and another for highlighting
>> >> on the _same_ field".
>> >>
>> >> You can use two different fields with different analysis chains,
>> >> one for each purpose. So something like
>> >>
>> >> q=f1:something&hl.fl=f2,f3&hl.q=other
>> >>
>> >> is certainly reasonable. It'll search for "something" in f1, and
>> >> highlight "other" in f2 and f3
>> >>
>> >> Each field processes its input with the analysis chain defined in
>> >> the schema.
>> >>
>> >> The rest about stored="true" can be ignored, it's just me
>> >> wandering off into the weeds about an optimization that only
>> >> stores the data once rather than redundantly in multiple fields.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Mar 23, 2018 at 4:37 AM, Arturas Mazeika
>> >> <mazeika@gmail.com>
>> >> wrote:
>> >> > Hi Mathesis (Stefan),
>> >> >
>> >> > Thanks for the questions. This made me look at the problem from
>> >> > a distance and re-frame the situation. Good questions indeed.
>> >> >
>> >> > Trying to go around: consider a user who describes herself as
>> >> > being a BMW fan, being convinced that all BMWs need to be the
>> >> > blackest color possible (for the sake of argument), who would
>> >> > like to search and later browse the entries in the discussion
>> >> > forum (of course not everything, but BMWs of the blackest
>> >> > color), and what interests her are the snippets that have
>> >> > understood, craziest as keywords or the like (because she is
>> >> > looking for a dozen discussions that she saw before).
>> >> >
>> >> > What I was not able to achieve so far is: (i) combine query terms
>> >> > for filtering and highlighting, (ii) use the analyzer chain
>> >> > from the attribute to rewrite the highlight query (or define one
>> >> > in the search).
>> >> >
>> >> > The CTRL+F technique is a very powerful one, indeed. It works
>> >> > most of the time. The difficulties with it are query rewriting,
>> >> > enriching, etc.
>> >> >
>> >> > Cheers,
>> >> > Arturas
>> >> >
>> >> > On Fri, Mar 23, 2018 at 11:29 AM, Stefan Matheis <
>> >> matheis.stefan@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Perhaps we try it the other way round .. what's your use case
>> >> >> for this? I'm trying to think of a situation where I'd need
>> >> >> this as a user?
>> >> >>
>> >> >> The only reason I see myself doing this is CTRL+F in a page
>> >> >> when the search result is not immediately visible for me ;)
>> >> >>
>> >> >> On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <mazeika@gmail.com>
>> wrote:
>> >> >>
>> >> >> > Hi Erick et al,
>> >> >> >
>> >> >> > From your answer I understand that this is not a typical case
>> >> >> > that one searches for a keyword but highlights something else.
>> >> >> > Since we have two parameters (q vs hl.q) I thought they are
>> >> >> > freely combinable. From your answer I understand that this is
>> >> >> > not really the case. My current understanding came from [1]
>> >> >> > that says:
>> >> >> >
>> >> >> > hl.q
>> >> >> >
>> >> >> > A query to use for highlighting. This parameter allows you to
>> >> >> > highlight different terms than those being used to retrieve
>> >> >> > documents.
>> >> >> >
>> >> >> > What I hear from you is something different: i.e., that it
>> >> >> > is not enough just to combine the q with hl.q, that there are
>> >> >> > caveats to achieve the task (multiple fields,
>> >> >> > FastVectorHighlighter).
>> >> >> >
>> >> >> > Your info is very helpful.
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Arturas
>> >> >> >
>> >> >> > [1]
>> >> >> > https://lucene.apache.org/solr/guide/7_2/highlighting.html
>> >> >> >
>> >> >> > On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <
>> >> erickerickson@gmail.com
>> >> >> >
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Basically you need to use a copyField, but in several variants:
>> >> >> > >
>> >> >> > > If you use the field _exclusively_ for highlighting then
>> >> >> > > store the raw content there and have the field use whatever
>> >> >> > > analyzer you want. You do _not_ need to have indexed="true"
>> >> >> > > set for the field if you're highlighting on the fly. So
>> >> >> > > you're searching against field1 (which has indexed="true"
>> >> >> > > stored="false" set) but highlighting against field2 (which
>> >> >> > > has indexed="false" stored="true" set). Of course any time
>> >> >> > > you want to return the contents in a doc your fl needs to
>> >> >> > > specify field2...
>> >> >> > >
>> >> >> > > The above does not bloat your index at all since the cost
>> >> >> > > of stored="true" indexed="true" is the same as if you use
>> >> >> > > two fields, each with only one option turned on.
>> >> >> > >
>> >> >> > > The second approach if you want to use
>> >> >> > > FastVectorHighlighter or the like is simply to index both
>> >> >> > > fields.
>> >> >> > >
>> >> >> > > Best,
>> >> >> > > Erick
>> >> >> > >
>> >> >> > > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <
>> mazeika@gmail.com
>> >> >
>> >> >> > > wrote:
>> >> >> > > > Hi Solr-Users,
>> >> >> > > >
>> >> >> > > > I've been playing with a german collection of documents,
>> >> >> > > > where I tried to search for one word (q=Tag) and
>> >> >> > > > highlighted another: (hl.q=Kundigung). Is this a "legal"
>> >> >> > > > use case? My key question is how can I tell solr which
>> >> >> > > > query analyzer to use for highlighting? Strictly speaking,
>> >> >> > > > I should use hl.q=Kündigung to conceptually look for
>> >> >> > > > relevant information, but in this case, no highlighting is
>> >> >> > > > returned (as all umlauts are left out in the index).
>> >> >> > > >
>> >> >> > > > Additional infos:
>> >> >> > > >
>> >> >> > > > solr version: 7.2
>> >> >> > > > urls to query:
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1
>> >> >> > > >
>> >> >> > > > Managed-schema:
>> >> >> > > >
>> >> >> > > > <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>> >> >> > > >   <analyzer>
>> >> >> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >> >> > > >     <filter class="solr.LowerCaseFilterFactory"/>
>> >> >> > > >     <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
>> >> >> > > >     <filter class="solr.GermanNormalizationFilterFactory"/>
>> >> >> > > >     <filter class="solr.GermanLightStemFilterFactory"/>
>> >> >> > > >   </analyzer>
>> >> >> > > > </fieldType>
>> >> >> > > >
>> >> >> > > > Other additional infos:
>> >> >> > > > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >> >> > > >
>> >> >> > > > Cheers,
>> >> >> > > > Arturas
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
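
The fix that resolved this thread (field-qualifying hl.q) can be sketched as a small URL builder. This is only an illustration in Python using the standard library; the core name `trans` and field name `trans` come from the thread, while the `highlight_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumed values taken from the thread: core "trans", field "trans".
SOLR_SELECT = "http://localhost:8983/solr/trans/select"


def highlight_url(query, hl_term, hl_field):
    """Build a select URL whose hl.q is qualified with the highlight field."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": hl_field,
        # Field-qualifying hl.q lets Solr analyze the term with that
        # field's own chain instead of the default field's chain.
        "hl.q": f"{hl_field}:{hl_term}",
        "hl.snippets": 3,
        "wt": "xml",
        "rows": 1,
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = highlight_url("trans:Zeit", "Kündigung", "trans")
print(url)
```

`urlencode` percent-encodes the umlaut as UTF-8 (`%C3%BC`), which matches the encoded form used in the original question.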

FW: Boosting Fields Based On The Query Provided

2018-04-01T22:50:00.015-07:00

-----Original Message-----
From: Mukhopadhyay, Aratrika [mailto:Aratrika.Mukhopadhyay@mail.house.gov]
Sent: 22 March 2018 18:48
To: solr-user@lucene.apache.org
Subject: RE: Boosting Fields Based On The Query Provided

Thanks for your reply, Shawn. The query elevation worked for us. I have
another question though. Right now I have ways to handle specific queries in
the elevate.xml. My concern is that I may have hundreds of queries that need
to return different pages first. Is elevate.xml the only way to do this, or
is there a better approach, for instance boosting fields? When I boost
fields in this fashion, it does not work for me:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf"> url^50 host^30 content^20 title^10</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>

Thanks for your help .

Aratrika Mukhopadhyay
-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: Tuesday, March 20, 2018 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting Fields Based On The Query Provided

On 3/20/2018 2:25 PM, Mukhopadhyay, Aratrika wrote:
> I have a solr query which I am having a hard time configuring as I
> would want it configured. Suppose I have a situation where I have two fields
> field1(host field) and field2 (url field). I want a specific host to be
> bubbled to the top for all terms except for when I am searching for specific
> people in which case I want the URL to their landing page returned first. I
> have configured the dismax query parser in my solrconfig but it seems that
> the boost being applied is arbitrary .

<snip>

> <requestHandler name="/select" class="solr.SearchHandler">
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="q">*:*</str>
> <str name="bq">host:(www.starwars.com)^10</str>
> <str name="q">Carrie Fisher</str>
> <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000402/)^8</str>
> <str name="q">Mark Hamill</str>
> <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000434/)^8</str>
> </lst>
> </requestHandler>

I think there's a fundamental misunderstanding of how "defaults" works.

I have no idea what happens with multiple "q" parameters, which you have
configured in defaults. I do know that if your request includes a "q"
parameter, then what you've put in defaults for "q" is going to be
overridden and ignored.
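
The override behavior described above can be modeled in a few lines. The following Python sketch is a simplified model of how the documented `defaults`, `appends`, and `invariants` sections combine with request parameters; the `effective_params` function is hypothetical and is not Solr code.

```python
def effective_params(request, defaults=None, appends=None, invariants=None):
    """Toy model of how a SearchHandler combines configured and request params.

    Each argument maps a parameter name to a list of values.
    """
    defaults = defaults or {}
    appends = appends or {}
    invariants = invariants or {}
    merged = {name: list(values) for name, values in defaults.items()}
    # Request values replace defaults of the same name.
    for name, values in request.items():
        merged[name] = list(values)
    # Appends are added to whatever is already there.
    for name, values in appends.items():
        merged.setdefault(name, []).extend(values)
    # Invariants always win, regardless of the request.
    for name, values in invariants.items():
        merged[name] = list(values)
    return merged


configured_defaults = {"defType": ["edismax"], "q": ["*:*"]}
result = effective_params({"q": ["Carrie Fisher"]}, defaults=configured_defaults)
print(result)
```

In this model a "q" placed in defaults only matters when the request sends no "q" at all, which is why it cannot be used to switch boosts per query.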

This section of the documentation covers defaults, appends, and invariants:

https://lucene.apache.org/solr/guide/6_6/requesthandlers-and-searchcomponents-in-solrconfig.html#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers

I think the Query Elevation Component might be the kind of functionality
you're after. What you're trying to do with defaults is NOT going to work.

https://lucene.apache.org/solr/guide/6_6/the-query-elevation-component.html

Thanks,
Shawn
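
For the case of many elevated queries raised in this thread, one option is to generate elevate.xml from a mapping rather than maintain it by hand. The Python sketch below emits the `<elevate>/<query text>/<doc id>` structure used by the Query Elevation Component; the `build_elevate_xml` helper is hypothetical, and the document ids assume that the collection's uniqueKey field holds the page URL, which the thread does not confirm.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(elevations):
    """Return elevate.xml content for a {query text: [doc ids]} mapping."""
    root = ET.Element("elevate")
    for text, doc_ids in elevations.items():
        query = ET.SubElement(root, "query", {"text": text})
        for doc_id in doc_ids:
            ET.SubElement(query, "doc", {"id": doc_id})
    return ET.tostring(root, encoding="unicode")


# Hypothetical mapping; the ids assume the uniqueKey field holds the URL.
elevations = {
    "Carrie Fisher": ["http://www.imdb.com/name/nm0000402/"],
    "Mark Hamill": ["http://www.imdb.com/name/nm0000434/"],
}
print(build_elevate_xml(elevations))
```

The mapping could come from a spreadsheet or database, so hundreds of entries stay manageable even though the component still reads a single file.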

FW: Solr or Elasticsearch

2018-04-01T22:50:00.014-07:00

-----Original Message-----
From: Steven White [mailto:swhite4141@gmail.com]
Sent: 22 March 2018 18:44
To: solr-user@lucene.apache.org
Subject: Solr or Elasticsearch

Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is
its analytic capability. However, I cannot find what those analytic
capabilities are and why they cannot be done using Solr. Can someone help
me with this question?

Personally, I'm a Solr user and the thing that concerns me about
Elasticsearch is the fact that it is owned by a company that can any day
decide to stop making Elasticsearch available under the Apache license and
even completely close free access to it.

So, this is a 2 part question:

1) What are the analytic capabilities of Elasticsearch that cannot be done
using Solr? I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may end its
open-source policy at any time or that outsiders have no say about its road
map?

Thanks,

Steve

FW: querying vs. highlighting: complete freedom?

2018-04-01T22:50:00.013-07:00

-----Original Message-----
From: Arturas Mazeika [mailto:mazeika@gmail.com]
Sent: 22 March 2018 14:48
To: solr-user@lucene.apache.org
Subject: querying vs. highlighting: complete freedom?

Hi Solr-Users,

I've been playing with a german collection of documents, where I tried to
search for one word (q=Tag) and highlighted another: (hl.q=Kundigung). Is
this a "legal" use case? My key question is how can I tell solr which query
analyzer to use for highlighting? Strictly speaking, I should use
hl.q=Kündigung to conceptually look for relevant information, but in this
case, no highlighting is returned (as all umlauts are left out in the
index) .

Additional infos:

solr version: 7.2
urls to query:

http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1

http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1

Managed-schema:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.GermanLightStemFilterFactory"/>
  </analyzer>
</fieldType>

Other additional infos:
https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted

Cheers,
Arturas
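
Why a term with an umlaut can miss while its folded form hits may be easier to see with a toy model of the text_de chain from the schema above. The Python sketch below only approximates the lowercase and German normalization steps (no stop words, no stemming, no "ae"/"ue" handling), and the idea that the unmatched term went through a chain without the German filters is an assumption for illustration.

```python
# Toy stand-in for the lowercase + German normalization steps of the
# text_de chain; the real filters do more (stemming, "ae"/"ue" handling).
_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def analyze(term):
    """Lowercase a term and fold German umlauts, as a rough approximation."""
    return term.lower().translate(_FOLD)


indexed_terms = {analyze(word) for word in ["Kündigung", "Zeit", "Tag"]}

# A highlight term analyzed with the same chain matches the index...
print(analyze("Kündigung") in indexed_terms)
# ...while one that only gets lowercased (a chain without the German
# filters) keeps its umlaut and misses.
print("Kündigung".lower() in indexed_terms)
```

The point of the sketch is only that the index and the highlight query must pass through the same analysis for the terms to line up.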

FW: Get terms in solr not working

2018-04-01T22:50:00.012-07:00

-----Original Message-----
From: Joel Bernstein [mailto:joelsolr@gmail.com]
Sent: 21 March 2018 20:51
To: solr-user@lucene.apache.org
Subject: Re: Get terms in solr not working

Also what is the use case? What do you plan to do with terms? There may be
other approaches that will work better than the terms query.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 21, 2018 at 9:28 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> We need a lot more information. What is the exact query you're using?
> Is 100M the number of docs? How many terms are in the field?
>
> On Tue, Mar 20, 2018 at 10:39 PM, adam rag <adamrag16@gmail.com> wrote:
> > To get top words in my Apache Solr instance, I am using the "terms"
> > query. When I try it to get 10 terms in 100 million of data, the data
> > are fetched after a few minutes, but if the data is 300 million, Solr
> > does not respond. My server memory is 100 GB.
>
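
The exact query was never posted in this thread. For reference, a request for the top terms of a field through the TermsComponent usually looks like the URL built below. This is a Python sketch under assumptions: the core name `mycore` and field name `content` are hypothetical, and it presumes a `/terms` request handler is available on the core.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, limit=10):
    """Build a TermsComponent request for the most frequent terms in a field."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.limit": limit,
        "terms.sort": "count",  # most frequent terms first
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical core and field names.
print(terms_url("http://localhost:8983/solr/mycore", "content"))
```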

FW: Upgrading a Plugin from 6.6 to 7.x

2018-04-01T22:50:00.011-07:00

-----Original Message-----
From: Atita Arora [mailto:atitaarora@gmail.com]
Sent: 21 March 2018 19:01
To: solr-user@lucene.apache.org
Subject: Re: Upgrading a Plugin from 6.6 to 7.x

Hi Peter,

*(Sorry for the earlier incomplete email - I hit send by mistake)*

I haven't really been able to look into it completely, but my first glance
says it should be because the method signature has changed.

I am looking here:
https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html

createWeight(IndexSearcher searcher, boolean needsScores, float boost)
<https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean-float->
Expert: Constructs an appropriate Weight implementation for this query.

While at:

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/Query.html

createWeight(IndexSearcher searcher, boolean needsScores)
<https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean->
Expert: Constructs an appropriate Weight implementation for this query.

You would need a code change for this to make it work in version 7.

Thanks,
Atita

On Wed, Mar 21, 2018 at 6:59 PM, Atita Arora <atitaarora@gmail.com> wrote:

> Hi Peter,
>
> I haven't really been able to look into it completely, but my first
> glance says it should be because the method signature has changed.
>
> I am looking here:
> https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html
>
> createWeight(IndexSearcher searcher, boolean needsScores, float boost)
> <https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean-float->
> Expert: Constructs an appropriate Weight implementation for this query.
>
> While at:
>
>
> On Wed, Mar 21, 2018 at 4:16 PM, Peter Alexander Kopciak
> <peter@kopciak.at> wrote:
>
>> Hi!
>>
>> I'm still pretty new to Solr and I want to use the vector Scoring
>> plugin (
>> https://github.com/saaay71/solr-vector-scoring/network) but
>> unfortunately, it does not seem to work for newer Solr versions.
>>
>> I tested it with 6.6 to verify its functionality, so it seems to be
>> broken because of the upgrade to 7.x.
>>
>> When following the installation procedure and executing the examples,
>> I ran into the following error with Query 1:
>>
>> java.lang.UnsupportedOperationException: Query {! type=vp f=vector
>> vector=0.1,4.75,0.3,1.2,0.7,4.0 v=} does not implement createWeight
>>
>> Does anyone have a lead for me on how to fix/upgrade the plugin? The
>> createWeight method seems to exist, so I'm not sure where to start
>> and what the problem seems to be.
>>
>
>

FW: Solr main replica down, another replica taking over

2018-04-01T22:50:00.010-07:00

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: 21 March 2018 18:37
To: solr-user@lucene.apache.org
Subject: Re: Solr main replica down, another replica taking over

On 3/21/2018 12:04 AM, Midas A wrote:
> We want to send less traffic over virtual machines and more on
> physical servers . How can we achieve this

At the moment, I do not know of any functionality in SolrCloud to accomplish
this goal. As I mentioned before, there is work underway to make it
possible, but it's not available yet.

One thing you could do is include preferLocalShards=true as a URL parameter
and only send requests to the physical servers (unless they are down), but
to do that, you'll have to handle load balancing yourself.

Thanks,
Shawn
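
The workaround described above (send requests only to the physical servers unless they are down, with preferLocalShards=true, and do the load balancing yourself) could look roughly like the following client-side sketch. It is written in Python with hypothetical host names, and the health check is a caller-supplied function rather than a real Solr ping.

```python
import itertools
from urllib.parse import urlencode


class PreferPhysicalBalancer:
    """Round-robin over physical hosts; fall back to VMs if none is up."""

    def __init__(self, physical, virtual, is_up):
        self.physical = list(physical)
        self.virtual = list(virtual)
        self.is_up = is_up  # callable(host) -> bool, supplied by the caller
        self._counter = itertools.count()

    def pick(self):
        # Try the physical pool first, then the virtual one.
        for pool in (self.physical, self.virtual):
            live = [host for host in pool if self.is_up(host)]
            if live:
                return live[next(self._counter) % len(live)]
        raise RuntimeError("no Solr host is reachable")

    def select_url(self, collection, query):
        params = urlencode({"q": query, "preferLocalShards": "true"})
        return f"{self.pick()}/solr/{collection}/select?{params}"


# Hypothetical hosts; the stub health check treats every host as up.
balancer = PreferPhysicalBalancer(
    ["http://phys1:8983", "http://phys2:8983"],
    ["http://vm1:8983"],
    is_up=lambda host: True,
)
print(balancer.select_url("mycollection", "*:*"))
```

Virtual machines only receive queries here when every physical host fails the health check, which is stricter than "less traffic" but matches the suggestion in the reply.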

FW: [PHP Classes] Notable PHP package: PHP DNS Check Tool

2018-04-01T22:50:00.009-07:00

From: PHP Classes Notable [mailto:list-notable@phpclasses.org]
Sent: 21 March 2018 12:22
To: ROSHAN <roshan@siddhast.com>
Subject: [PHP Classes] Notable PHP package: PHP DNS Check Tool

A DNS server is a server on the Internet that returns the IP addresses of other computers on the Internet. Computers often need to query different DNS servers for the IP addresses of the same computers, and since the information may not be synchronized, the record values may differ. This package can determine whether there are differences between the values of given records stored in different DNS servers.

Notable PHP package: PHP DNS Check Tool

ROSHAN, a PHP package is considered Notable when it does something different that is worth noting.

If you have also written Notable packages, contribute them to the PHP Classes site to get your work more exposure.

If your notable package is innovative, you may also earn prizes and recognition in the PHP Innovation Award.

Now you can also win a Big elePHPant as one of the possible prizes you can win every month. Check the complete list of prizes here: List of prizes

Package

PHP DNS Check Tool

Check DNS records and compare record sets

Moderator comment

A DNS is a server hosted in the Internet that can return IP addresses of other computers also on the Internet.

Often computers need to query different DNS servers to obtain the IP addresses of same computers, but since the information may not be synchronized, there may be differences between the record values.

This package can determine if there are differences between the values of given records stored in different DNS servers.

Author

Matous Nemec

Groups

Networking, PHP 7

Description

This class can check DNS records and compare record sets.

It can perform lookups to DNS servers to obtain the values of record for certain domains and of certain record types.

The class can also compare sets of records obtained from different providers like DNS servers or arrays to determine the differences and see what changed.
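
The comparison step described above can be illustrated independently of the package. The sketch below is Python rather than the package's PHP, does no real DNS lookups, and uses a hypothetical `diff_records` helper with made-up records from the documentation address range.

```python
def diff_records(expected, actual):
    """Return records missing from and unexpected in `actual`."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),
        "extra": sorted(actual_set - expected_set),
    }


# Hypothetical A records for the same name as seen by two resolvers
# (documentation-range addresses, not real lookups).
server_a = [("example.com", "A", "192.0.2.10"), ("example.com", "A", "192.0.2.11")]
server_b = [("example.com", "A", "192.0.2.10"), ("example.com", "A", "192.0.2.99")]
print(diff_records(server_a, server_b))
```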

FW: Solrj Analytics component

2018-04-01T22:50:00.008-07:00

-----Original Message-----
From: Jason Gerlowski [mailto:gerlowskija@gmail.com]
Sent: 21 March 2018 04:07
To: solr-user@lucene.apache.org
Subject: Re: Solrj Analytics component

Hi Asmaa,

As far as I know, there aren't any SolrJ classes built expressly for
Analytics component requests like what exists for the Collection Admin APIs,
etc.
(https://lucene.apache.org/solr/7_2_0/solr-solrj/org/apache/solr/client/solrj/request/CollectionAdminRequest.html).
But it should still be possible to package your request into a SolrRequest
via some of the setters on that object, and parse the response out of the
returned NamedList<Object>.

It isn't pretty, but it _should_ be possible. Was there a more specific
aspect of building the request that you were getting hung up on?

Best of luck,

Jason

On Fri, Mar 16, 2018 at 4:38 PM, Asmaa Shoala <asmaa.shoala@nm-eg.com>
wrote:
> Hello,
>
> I want to use the analytics component
> (https://lucene.apache.org/solr/guide/7_2/analytics.html#analytic-pivot-facets)
> in Java code but I didn't find any guide on the internet.
>
> Can you please help me?
>
> Thanks,
>
> Asmaa Ramzy Shoala
>
> novomind Egypt LLC
> _____________________________
>
> 7 Abou Rafea Street, Moustafa Kamel, Alexandria, Egypt
>
> Mobile +20 1227281143
> email asmaa.shoala@nm-eg.com<mailto:asmaa.shoala@nm-eg.com> . Skype
> asmaa.shoala_nmeg
>

FW: Boosting Fields Based On The Query Provided

2018-04-01T22:50:00.007-07:00

-----Original Message-----
From: Mukhopadhyay, Aratrika [mailto:Aratrika.Mukhopadhyay@mail.house.gov]
Sent: 21 March 2018 01:56
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: Boosting Fields Based On The Query Provided

All ,
I have a solr query which I am having a hard time configuring as I
would want it configured. Suppose I have a situation where I have two fields
field1(host field) and field2 (url field). I want a specific host to be
bubbled to the top for all terms except for when I am searching for specific
people in which case I want the URL to their landing page returned first. I
have configured the dismax query parser in my solrconfig but it seems that
the boost being applied is arbitrary .

To be more specific if I search for terms related to star wars I want to
boost the starwars.com domain but if I search for Carrie Fisher or Mark
Hamill I want to boost the url http://www.imdb.com/name/nm0000402/ (Carrie
Fisher's imdb page) to the top for Carrie fisher and the url
http://www.imdb.com/name/nm0000434/ (Mark Hamill's imdb page) to the top for
Mark Hamill . Here would be my current configuration which is not working .

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q">*:*</str>
    <str name="bq">host:(www.starwars.com)^10</str>
    <str name="q">Carrie Fisher</str>
    <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000402/)^8</str>
    <str name="q">Mark Hamill</str>
    <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000434/)^8</str>
  </lst>
</requestHandler>

Do any of you know how to best handle a case like this ?

Regards,
Aratrika Mukhopadhyay

FW: CDCR Invalid Number on deletes

2018-04-01T22:50:00.006-07:00

-----Original Message-----
From: Amrit Sarkar [mailto:sarkaramrit2@gmail.com]
Sent: 21 March 2018 01:20
To: solr-user@lucene.apache.org
Subject: Re: CDCR Invalid Number on deletes

Hi Chris,

Sorry, I was off work for a few days and didn't follow the conversation. The
link is directing me to
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we
have fixed the issue you described in that jira, though the symptoms were
different from yours.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis <cptroullis@gmail.com>
wrote:

> Never mind, I found it... the link you posted links me to SOLR-12036
> instead of SOLR-12063 for some reason.
>
> On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis <cptroullis@gmail.com>
> wrote:
>
> > Hey Amrit,
> >
> > Did you happen to see my last reply? Is SOLR-12036 the correct JIRA?
> >
> > Thanks,
> >
> > Chris
> >
> > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis
> > <cptroullis@gmail.com>
> > wrote:
> >
> >> Hey Amrit, thanks for the reply!
> >>
> >> I checked out SOLR-12036, but it doesn't look like it has to do
> >> with CDCR, and the patch that is attached doesn't look CDCR
> >> related. Are you sure that's the correct JIRA number?
> >>
> >> Thanks,
> >>
> >> Chris
> >>
> >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar
> >> <sarkaramrit2@gmail.com>
> >> wrote:
> >>
> >>> Hey Chris,
> >>>
> >>> I found a separate issue while working on CDCR which may relate to
> >>> your problem. Please see JIRA: *SOLR-12063*
> >>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>.
> >>> This is a bug that got introduced when we added support for the
> >>> bidirectional approach, where an extra flag is added to the tlog
> >>> entry for CDCR.
> >>>
> >>> This part of the code is messing up:
> >>> *UpdateLog.java.RecentUpdates::update()::*
> >>>
> >>> switch (oper) {
> >>>   case UpdateLog.ADD:
> >>>   case UpdateLog.UPDATE_INPLACE:
> >>>   case UpdateLog.DELETE:
> >>>   case UpdateLog.DELETE_BY_QUERY:
> >>>     Update update = new Update();
> >>>     update.log = oldLog;
> >>>     update.pointer = reader.position();
> >>>     update.version = version;
> >>>
> >>>     if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
> >>>       update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
> >>>     }
> >>>     updatesForLog.add(update);
> >>>     updates.put(version, update);
> >>>
> >>>     if (oper == UpdateLog.DELETE_BY_QUERY) {
> >>>       deleteByQueryList.add(update);
> >>>     } else if (oper == UpdateLog.DELETE) {
> >>>       deleteList.add(new DeleteUpdate(version, (byte[]) entry.get(entry.size() - 1)));
> >>>     }
> >>>
> >>>     break;
> >>>
> >>>   case UpdateLog.COMMIT:
> >>>     break;
> >>>   default:
> >>>     throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> >>>         "Unknown Operation! " + oper);
> >>> }
> >>>
> >>> deleteList.add(new DeleteUpdate(version,
> >>> (byte[])entry.get(entry.size() -1)));
> >>>
> >>> is expecting the last entry to be the payload, but everywhere else in
> >>> the project, *pos:[2]* is the index for the payload, while the last
> >>> entry in the source code is a *boolean* in and after Solr 7.2,
> >>> denoting whether the update is CDCR-forwarded or typical.
> >>> UpdateLog.java.RecentUpdates is used in CDCR sync and checkpoint
> >>> operations, and hence it is a legit bug that slipped past the tests I
> >>> wrote.
> >>>
> >>> The immediate fix patch is uploaded and I am awaiting feedback on
> >>> that. Meanwhile, if it is possible for you to apply the patch, build
> >>> the jar and try it out, please do and let us know.
> >>>
> >>> For *SOLR-9394*
> >>> <https://issues.apache.org/jira/browse/SOLR-9394>, if you can
> >>> comment on the JIRA and post the sample docs, Solr logs, and other
> >>> relevant information, I can give it a thorough look.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>> Medium: https://medium.com/@sarkaramrit2
> >>>
> >>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis
> >>> <cptroullis@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > We recently upgraded to Solr 7.2.0 as we saw that there were some
> >>> > CDCR bug fixes and features added that would finally let us make
> >>> > use of it (bi-directional syncing was the big one). The first time
> >>> > we tried to implement it we ran into all kinds of errors, but this
> >>> > time we were able to get it mostly working.
> >>> >
> >>> > The issue we seem to be having now is that any time a document is
> >>> > deleted via deleteById from a collection on the primary node, we
> >>> > are flooded with "Invalid Number" errors followed by a random
> >>> > sequence of characters when CDCR tries to sync the update to the
> >>> > backup site. This happens on all of our collections where our id
> >>> > fields are defined as longs (in some of them the ids are compound
> >>> > keys and are strings).
> >>> >
> >>> > Here's a sample exception:
> >>> >
> >>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://ip/solr/collection_shard1_replica_n1: Invalid Number: ]
> >>> > -s
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:549)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
> >>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> >>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> >>> >   at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140)
> >>> >   at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
> >>> >   at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
> >>> >   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
> >>> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>> >   at java.lang.Thread.run(Thread.java:748)
> >>> >
> >>> >
> >>> > I'm scratching my head as to the cause of this. It's like it is
> >>> > trying to deleteById for the value "]", even though that is not the
> >>> > ID for the document that was deleted from the primary. So I don't
> >>> > know if it is pulling this from the wrong field somehow or where
> >>> > that value is coming from.
> >>> >
> >>> > I found this issue: https://issues.apache.org/jira/browse/SOLR-9394
> >>> > which looks related, but doesn't look like it has any traction.
> >>> >
> >>> > Has anyone else experienced this issue with CDCR, or have any ideas
> >>> > as to what could be causing this issue?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Chris
> >>> >
> >>>
> >>
> >>
> >
>
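
The index mismatch Amrit describes in the thread above can be illustrated with a small, self-contained sketch. It is not the actual patch and does not use Solr's classes: the entry is modelled as a plain list, and the class name `TlogEntryDemo`, the constant `PAYLOAD_IDX`, and the entry contents are assumptions based only on his description (payload at index 2, with a trailing boolean flag appended in newer entries).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Simplified model of a delete entry in the transaction log.
// Layout assumed from the description: [operation, version, payload, (optional flag)].
class TlogEntryDemo {
    static final int PAYLOAD_IDX = 2; // hypothetical constant name

    // Fragile lookup: assumes the payload is always the last element.
    static Object payloadFromLast(List<Object> entry) {
        return entry.get(entry.size() - 1);
    }

    // Position-based lookup: unaffected by fields appended after the payload.
    static Object payloadFromIndex(List<Object> entry) {
        return entry.get(PAYLOAD_IDX);
    }

    public static void main(String[] args) {
        byte[] id = "12345".getBytes(StandardCharsets.UTF_8);
        List<Object> plain = Arrays.asList("DELETE", 100L, id);
        List<Object> withFlag = Arrays.asList("DELETE", 101L, id, Boolean.TRUE);

        // Without the flag, both lookups find the id bytes.
        System.out.println(payloadFromLast(plain) instanceof byte[]);     // true
        // With the trailing flag, the "last element" lookup returns the boolean.
        System.out.println(payloadFromLast(withFlag) instanceof byte[]);  // false
        System.out.println(payloadFromIndex(withFlag) instanceof byte[]); // true
    }
}
```

In this model, casting the result of `payloadFromLast` to `byte[]` would fail outright on the flagged entry. How the real code ends up producing an "Invalid Number" error instead is not something this sketch tries to reproduce.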

FW: Question liste solr

2018-04-01T22:50:00.005-07:00

-----Original Message-----
From: Rahul Singh [mailto:rahul.xavier.singh@gmail.com]
Sent: 20 March 2018 20:10
To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
Subject: RE: Question liste solr

Parallel processing in any way will help, including Spark w/ a DFS like S3
or HDFS. Your three machines could end up being a bottleneck and you may
need more nodes.
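
As a rough illustration of the chunked, parallel loading discussed in this thread, here is a minimal sketch using only the Java standard library. It assumes the rows have already been read into memory, and the `BatchSink` interface is a hypothetical stand-in for whatever actually sends documents to Solr (the thread mentions ConcurrentUpdateSolrClient); the worker count and batch size are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sink; a real implementation would wrap a Solr client call.
interface BatchSink {
    void send(List<String> rows);
}

class ParallelCsvIndexer {
    // Splits rows into up to `workers` contiguous chunks and sends each chunk
    // in batches of `batchSize` on its own thread. Returns the number of rows sent.
    static int index(List<String> rows, int workers, int batchSize, BatchSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger sent = new AtomicInteger();
        try {
            int chunkSize = (rows.size() + workers - 1) / workers; // ceiling division
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<String> chunk =
                        rows.subList(start, Math.min(start + chunkSize, rows.size()));
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < chunk.size(); i += batchSize) {
                        List<String> batch =
                                chunk.subList(i, Math.min(i + batchSize, chunk.size()));
                        sink.send(batch);
                        sent.addAndGet(batch.size());
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait for the worker and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Worker failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("id" + i + ",value" + i);
        }
        int sent = index(rows, 4, 100, batch -> { /* send the batch to Solr here */ });
        System.out.println("rows sent: " + sent);
    }
}
```

Whether more threads help depends on where the bottleneck is. If it sits on the Oracle side or on the three Solr nodes, as suggested elsewhere in this thread, adding client threads alone may not shorten the run.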

On Mar 20, 2018, 2:36 AM -0500, LOPEZ-CORTES Mariano-ext
<mariano.lopez-cortes-ext@pole-emploi.fr>, wrote:
> The CSV file is approx. 5 GB for 29 million rows.
>
> As you say, Christopher, at the beginning we thought that reading chunk
> by chunk from Oracle and writing to Solr was the best strategy.
>
> But from our tests we've observed:
>
> CSV creation via PL/SQL is really, really fast: 40 minutes for the full
> dataset (with bulk collect).
> Multiple SELECT calls from Java slow down the process. I think Oracle is
> the bottleneck here.
>
> Any other ideas/alternatives?
>
> Some other points to note:
>
> We are going to enable autoCommit every 10 minutes / 10000 rows. No
> commit from the client.
> During indexing, we always call a front-end load balancer that
> redirects calls to the 3-node cluster.
>
> Thanks in advance!!
>
> ==>Great mailing list and really awesome tool!!
>
> -----Original Message-----
> From: Christopher Schultz [mailto:chris@christopherschultz.net]
> Sent: Monday, 19 March 2018 18:05
> To: solr-user@lucene.apache.org
> Subject: Re: Question liste solr
>
> Mariano,
>
> On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> > Hello
> >
> > We have a Solr index with 3 nodes, 1 shard and 2 replicas.
> >
> > Our goal is to index 42 million rows. Indexing time is important.
> > The data source is an Oracle database.
> >
> > Our indexing strategy is:
> >
> > * Reading from Oracle to a big CSV file.
> >
> > * Reading from 4 files (big file chunked) and injection via
> > ConcurrentUpdateSolrClient
> >
> > Is it the optimal way of injecting such a mass of data into Solr?
> >
> > For information, the estimated time for our solution is 6h.
>
> How big are the CSV files? If most of the time is taken performing the
> various SELECT operations, then it's probably a good strategy.
>
> However, you may find that using the disk as a buffer slows everything
> down because disk writes can be very slow.
>
> Why not perform your SELECT(s) and write directly to Solr using one of the
> APIs (either a language-specific API, or through the HTTP API)?
>
> Hope that helps,
> -chris