Sunday, April 15, 2018

Starting services in safe mode

Recently I was trying to figure out how to start additional services in Windows safe mode. A user's laptop kept crashing at login; I had a quick look and several theories came to mind, but uptime was important, so as a temporary workaround I set the machine up in safe mode with networking.

A few days later the user called and wanted to be able to print in safe mode. I looked into it and did some searching, but the prevailing wisdom seemed to be that it wasn't doable. This sounded like an MCP party line to me, so I decided to explore the registry. Eventually I found the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot key, which has sub-keys of Minimal and Network: Minimal being safe mode, Network being safe mode with networking. Each appears to be a whitelist of the services, drivers and driver groups that are allowed to start or load.

It is therefore possible to start additional services and load additional drivers in safe mode – just add a sub-key named for the service or driver short name, with a default string value giving the type. The entry below (if in a .reg file) allows the Print Spooler to start in safe mode with networking.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network\Spooler]
@="Service"

For a list of all drivers, driver groups and services that start in normal mode, along with their short names, check HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services

I'd caution against whitelisting too much, as it rather defeats the purpose of safe mode, though in certain situations it can be useful as a quick hack. It may also be worth checking the next time you're dealing with a particularly nasty malware infection. I haven't seen anything that exploits it yet, but I imagine something does.
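If you need to whitelist several services at once, the .reg file can be generated rather than typed by hand. The sketch below is illustrative only; the service short names you pass in are examples, and each should first be verified under the Services key mentioned above.

```python
# Generate a .reg file that whitelists services for "Safe Mode with Networking".
# Service short names are illustrative; check them under
# HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services before using.

SAFEBOOT_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network"

def safeboot_reg(service_names):
    """Return the text of a .reg file allowing each service to start in safe mode."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name in service_names:
        lines.append(f"[{SAFEBOOT_KEY}\\{name}]")
        lines.append('@="Service"')
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    print(safeboot_reg(["Spooler"]))
```

Import the resulting file with regedit (or `reg import`) while in normal mode, then reboot into safe mode with networking.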



http://www.krisdavidson.org/2010/09/11/starting-services-in-safe-mode/

Thursday, April 12, 2018

FW: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: 09 April 2018 13:49
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

I'm bumping my thread. Thanks





-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: Thursday 5 April 2018 10:46
To: solr-user@lucene.apache.org
Subject: RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

I used this process to create the DIH:

1. Create the BLOB collection:
* curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=.system'

2. Send the definition and files for the DIH:
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @solr-dataimporthandler-6.6.1.jar http://localhost:8983/solr/.system/blob/DataImportHandler
* curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mysql-connector-java-5.1.46.jar http://localhost:8983/solr/.system/blob/MySQLConnector
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": { "name":"DataImportHandler", "version":1 }}'
* curl http://localhost:8983/solr/advertisements2/config -H 'Content-type:application/json' -d '{"add-runtimelib": { "name":"MySQLConnector", "version":1 }}'

3. I added the requestHandler to the config file via the Config API. Result:
###
"/full-advertisements": {
  "runtimeLib": true,
  "version": 1,
  "class": "org.apache.solr.handler.dataimport.DataImportHandler",
  "defaults": {
    "config": "DIH/advertisements.xml"
  },
  "name": "/full-advertisements"
},
###

4. I added the .xml definition file with the zkcli.sh script at
/configs/advertisements2/DIH/advertisements.xml
###
<dataConfig>
  <dataSource name="Gesloc" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION"
              user="ics" password="******" />
  <document>
    <entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id,origin FROM view_indexation_advertisements">
      <field column="id" name="id"/>
      <field column="origin" name="origin"/>
    </entity>
  </document>
</dataConfig>
###
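Before pushing a dataConfig like the one above with zkcli.sh, it can be sanity-checked locally. This is an illustrative stdlib check (the embedded XML mirrors the advertisements.xml shown above, password redacted), not part of the DIH workflow itself:

```python
# Minimal structural check of a DIH dataConfig file before pushing it to ZooKeeper.
import xml.etree.ElementTree as ET

DATA_CONFIG = """
<dataConfig>
  <dataSource name="Gesloc" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION"
              user="ics" password="******"/>
  <document>
    <entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id,origin FROM view_indexation_advertisements">
      <field column="id" name="id"/>
      <field column="origin" name="origin"/>
    </entity>
  </document>
</dataConfig>
"""

def check_data_config(xml_text):
    """Return (dataSource names, entity names); raise if the XML is malformed
    or an entity references an undeclared dataSource."""
    root = ET.fromstring(xml_text)
    sources = [ds.get("name") for ds in root.findall("dataSource")]
    entities = [e.get("name") for e in root.findall("./document/entity")]
    for e in root.findall("./document/entity"):
        assert e.get("dataSource") in sources, f"unknown dataSource on {e.get('name')}"
    return sources, entities

if __name__ == "__main__":
    print(check_data_config(DATA_CONFIG))
```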

Thanks for your help.


-----Original Message-----
From: msaunier [mailto:msaunier@citya.com]
Sent: Wednesday 4 April 2018 09:57
To: solr-user@lucene.apache.org
Cc: fharrang@citya.com
Subject: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

Hello,
I am using SolrCloud and testing the DIH system in cloud mode, but I get this error:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
    ... 4 more

My DIH definition in the cloud:

<dataConfig>
  <dataSource name="Gesloc" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://srv-gesloc-sql/TRANSACTIONCITYANEWLOCATION"
              user="ics" password="IcsPerms"
              runtimeLib="true" version="1"/>
  <document>
    <entity name="Advertisements_Gesloc" dataSource="Gesloc" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id,origin FROM view_indexation_advertisements">
      <field column="id" name="id"/>
      <field column="origin" name="origin"/>
    </entity>
  </document>
</dataConfig>

The call and its response:

http://localhost:8983/solr/advertisements2/full-advertisements?command=full-import&clean=false&commit=true


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
</lst>
<lst name="initArgs">
<bool name="runtimeLib">true</bool>
<long name="version">1</long>
<lst name="defaults">
<str name="config">DIH/advertisements.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages"/>
</response>

I don't understand why I get this error. Can you help me?
Thank you.

Monday, April 9, 2018

FW: Default Index config

-----Original Message-----
From: mganeshs [mailto:mganeshs@live.in]
Sent: 09 April 2018 15:34
To: solr-user@lucene.apache.org
Subject: Re: Default Index config

Hi Shawn,

Regarding the high CPU: while troubleshooting, we found that merge threads
keep running and take most of the CPU time (as per VisualVM). GC is not
causing any issue; we use the default GC and also tried G1 as you suggested
over here <https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr>

Though merging is only a background process, we suspect it is what is
driving the CPU high.

Since we use Solr for real-time indexing and depend on its results
immediately in the UI, we keep adding around 100 to 200 documents per
second in parallel, in batches of 20 Solr documents per add call.

*Note*: the following is the code snippet we use for indexing / adding Solr
documents in batches, per collection:

for (SolrCollectionList solrCollection : SolrCollectionList.values()) {
    CollectionBucket collectionBucket = getCollectionBucket(solrCollection);
    List<SolrInputDocument> solrInputDocuments = collectionBucket.getSolrInputDocumentList();
    String collectionName = collectionBucket.getCollectionName();
    try {
        if (solrInputDocuments.size() > 0) {
            CloudSolrClient solrClient =
                PlatformIndexManager.getInstance().getCloudSolrClient(collectionName);
            solrClient.add(collectionName, solrInputDocuments);
        }
    } // catch block elided in the original snippet
}

where solrClient is created as below:
this.cloudSolrClient = new CloudSolrClient.Builder()
        .withZkHost(zooKeeperHost)
        .withHttpClient(HttpClientUtil.HttpClientFactory.createHttpClient())
        .build();
this.cloudSolrClient.setZkClientTimeout(30000);

Hard commit is kept automatic, set to 15000 ms.
We also see that when a merge is happening and the (default) maxMergeCount
is already reached, commits get delayed and the SolrJ client (where we add
documents) blocks; once one of the merge threads completes its merge, the
SolrJ call returns.
How do we avoid this blocking of the SolrJ client? Do I need to move away
from the default config for this scenario, i.e. change the merge
configuration?

Can you suggest what the merge config should be for such a scenario? Based
on the forums, I tried changing the merge settings to the following:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">30</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">30</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">512</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>

But I couldn't see much change in the behaviour.

On the same Solr node we have multiple indexes / collections. In that case,
is TieredMergePolicyFactory still the right option, or should we use a
different merge policy (LogByteSize etc.) for multiple collections on one
node?


Can you throw some light on these aspects?
Regards,

> Regarding auto commit, we discussed a lot with our product owners and at
> last we are forced to keep it at 1 sec and we couldn't increase it further.
> Even as it is, our customers sometimes say they have to refresh their
> pages a couple of times to get the update from Solr. So we can't increase
> it further.

I understand pressure from nontechnical departments for very low response
times. Executives, sales, and marketing are usually the ones making those
kinds of demands. I think you should push back on that particular
requirement on technical grounds.

A soft commit interval that low *can* contribute to performance issues. It
doesn't always cause them, I'm just saying that it *can*.  Maybe increasing
it to five or ten seconds could help performance, or maybe it will make no
real difference at all.

> Yes. As of now only solr is running in that machine. But intially we
> were running along with hbase region servers and was working fine. But
> due to CPU spikes and OS disk cache, we are forced to move solr to
> separate machine.
> But just I checked, our solr data folder size is coming only to 17GB.
> 2 collection has around 5GB and other are have 2 to 3 GB of size. If
> you say that only 2/3 of total size comes to OS disk cache, in top
> command VIRT property it's always 28G, which means more than what we
> have. Why is that...
> Pls check the top command & GC we used in this doc:
> <https://docs.google.com/document/d/1SaKPbGAKEPP8bSbdvfX52gaLsYWnQfDqfmV802hWIiQ/edit?usp=sharing>

The VIRT memory should be about equivalent to the RES size plus the size of
all the index data on the system.  So that looks about right.  The actual
amount of memory allocated by Java for the heap and other memory structures
is approximately equal to RES minus SHR.
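These rules of thumb are plain arithmetic; as an illustrative sketch (the figures below are examples in the spirit of the 17GB index / 28G VIRT numbers discussed, not exact readings from the user's system):

```python
# Rough memory accounting for a Solr process, per the rules of thumb above:
#   VIRT                ≈ RES + total index size on disk
#   Java-allocated mem  ≈ RES - SHR   (heap plus other memory structures)
# All figures in GB; the inputs are illustrative, not measured values.

def solr_memory_estimates(res_gb, shr_gb, index_gb):
    return {
        "expected_virt_gb": res_gb + index_gb,
        "java_allocated_gb": res_gb - shr_gb,
    }

if __name__ == "__main__":
    # With ~11GB resident and a 17GB index, VIRT near 28G is expected.
    print(solr_memory_estimates(res_gb=11, shr_gb=1, index_gb=17))
```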

I am not sure whether the SHR size gets counted in VIRT; it probably does.
On some Linux systems SHR grows to a very high number, but when that
happens, it typically doesn't reflect actual memory usage. I do not know
why this sometimes happens. That is a question for Oracle, since they are
the current owners of Java.

Only 5GB is in the buff/cache area.  The system has 13GB of free memory. 
That system is NOT low on memory.

With 4 CPUs, a load average in the 3-4 range is an indication that the
server is busy.  I can't say for sure whether it means the server is
overloaded.  Sometimes the load average on a system that's working well can
go higher than the CPU count, sometimes a load average well below the CPU
count is shown on a system with major performance issues.  It's difficult to
say.  The instantaneous CPU usage on the Solr process in that screenshot is
384 percent.  Which means that it is exercising the CPUs hard. But this
might be perfectly OK.  96.3 percent of the CPU is being used by user
processes, a VERY small amount is being used by system, and the iowait
percentage is zero.  Typically servers that are struggling will have a
higher percentage in system and/or iowait, and I don't see that here.

> Queries are quiet fast, most of time simple queries with fq. Regarding
> index, during peak hours, we index around 100 documents in a second in
> a average.

That's good.  And not surprising, given how little memory pressure and how
much free memory there is.  An indexing rate of 100 per second doesn't seem
like a lot of indexing to me, but for some indexes, it might be very heavy. 
If your general performance is good, I wouldn't be too concerned about it.

> Regarding release, initially we tried with 6.4.1 and since many
> discussions over here, mentioned like moving to 6.5.x will solve lot
> of performance issues etc, so we moved to 6.5.1. We will move to 6.6.3
> in near future.

The 6.4.1 version had a really bad bug in it that killed performance for
most users.  Some might not have even noticed a problem, though.  It's
difficult to say for sure whether it would be something you would notice, or
whether you would see an increase in performance by upgrading.

> Hope I have given enough information. One strange thing is that, CPU
> and memory spike are not seen when we move to r4.xlarge to r4.2xlarge
> ( which is
> 8 core with 60 GB RAM ). But this would not be cost effective. What's
> making CPU and memory to go high in this new version ( due to doc
> values )? If I switch off docvalues will CPU & Memory spikes will get
> reduced ?

Overall memory usage (outside of the Java heap) looks great to me.  CPU
usage is high, but I can't tell if it's TOO high. As a proof of concept, I
think you should try raising autoSoftCommit to five seconds.  If maxDocs is
configured on either autoCommit or autoSoftCommit, remove it so that only
maxTime is there, regardless of whether you actually change maxTime.  If
raising autoSoftCommit makes no real difference, then the 1 second
autoSoftCommit probably isn't a worry.  I bet if you raised it to five
seconds, most users would never notice anything different.
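Concretely, that suggestion amounts to an update-handler block in solrconfig.xml of roughly this shape. The values shown are the ones discussed above (15-second hard commit, 5-second soft commit for the experiment), not a general recommendation:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to stable storage; openSearcher=false keeps it cheap. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: controls visibility. Raised from 1000 to 5000 for the
       proof of concept; note there is no maxDocs element, per the advice above. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```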

If you want to provide a GC log to us that covers a relatively long
timeframe, we can analyze that and let you know whether your heap is sized
appropriately, or whether it might be too big or too small, and whether
garbage collection pauses are keeping your CPU usage high.  The standard
Solr startup in most current versions always logs GC activity. It will
usually be in the same directory as solr.log.

Do you know what typical and peak queries per second are on your Solr
servers?  If your query rate is high, handling that will probably require
more servers and a higher replica count.

Thanks,
Shawn





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

-----Original Message-----
From: Alessandro Benedetti [mailto:a.benedetti@sease.io]
Sent: 09 April 2018 15:43
To: solr-user@lucene.apache.org
Subject: Re: Match a phrase like "Apple iPhone 6 32GB white" with "iphone 6"

Hi Sami,
I agree with Mikhail: if you have relatively complex data, you could curate
your own knowledge base for products and use it for Named Entity Recognition.
You can then search a field compatible_with the extracted entity.

If the scenario is simpler, the analysis chain you mentioned should work
(provided the product names are always complete and well curated).

Cheers





--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director www.sease.io

On Mon, Apr 9, 2018 at 10:40 AM, Adhyan Arizki <a.arizki@gmail.com> wrote:

> You can just use synonyms for that.. rather hackish but it works
>
> On Mon, 9 Apr 2018, 05:06 Sami al Subhi, <sami@alsubhi.me> wrote:
>
> > I think this filter will output the desired result:
> >
> > <analyzer type="query">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.ShingleFilterFactory"/>
> > </analyzer>
> > <analyzer type="index">
> > <tokenizer class="solr.StandardTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.FingerprintFilterFactory" separator=" " />
> > </analyzer>
> >
> > indexing:
> > "iPhone 6" will be indexed as "iphone 6" (always a single token)
> >
> > querying:
> > so this will analyze "Apple iPhone 6 32GB white" to "apple", "apple
> > iphone", "iphone", "iphone 6" and so on...
> > then here a match will be achieved using the 4th token.
> >
> >
> > I don't see how this would result in false-positive matching.
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
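Outside Solr, the matching Sami describes can be imitated to see why it works. This Python sketch approximates the two analysis chains (StandardTokenizer is reduced to a whitespace split, and FingerprintFilterFactory's sorting/dedup of tokens is deliberately ignored for this illustration):

```python
# Imitates the analysis chains discussed above, outside Solr:
#  - index side: lowercase the product phrase into one space-joined token
#    (roughly what FingerprintFilterFactory with separator=" " produces)
#  - query side: lowercase + shingles of size 1-2

def index_token(phrase):
    return " ".join(phrase.lower().split())

def query_shingles(phrase, max_size=2):
    tokens = phrase.lower().split()
    shingles = []
    for size in range(1, max_size + 1):
        for i in range(len(tokens) - size + 1):
            shingles.append(" ".join(tokens[i:i + size]))
    return shingles

def matches(indexed_phrase, query_phrase):
    return index_token(indexed_phrase) in query_shingles(query_phrase)

if __name__ == "__main__":
    # "iphone 6" appears among the query shingles, so the match succeeds.
    print(matches("iPhone 6", "Apple iPhone 6 32GB white"))
```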

FW: Solr join With must clause in fq

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhl@apache.org]
Sent: 09 April 2018 15:49
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr join With must clause in fq

It might make sense to test on a recent version of Solr.

On Sun, Apr 8, 2018 at 8:21 PM, manuj singh <s.manuj545@gmail.com> wrote:

> Hi all,
> I am trying to debug a problem which i am facing and need some help.
>
> I have a solr query which does join on 2 different cores. so lets say
> my first core has following 3 docs
>
> { "id":"1", "m_id":"lebron", "some_info":"29" }
>
> { "id":"2", "m_id":"Wade", "matches_win":"29" }
>
> { "id":"3", "m_id":"lebron", "some_info":"1234" }
>
> my second core has the following docs
>
> { "m_id": "lebron", "team": "miami" }
>
> { "m_id": "Wade", "team": "miami" }
>
> so now we made an update to doc with lebron and changed the team to
> "clevelend". So the new docs in core 2 looks like this.
>
> { "m_id": "lebron", "team": "clevelend" }
>
> { "m_id": "Wade", "team": "miami" }
>
> now i am trying to join these 2 and finding the docs form core1 for
> team miami.
>
> my query looks like this
>
> fq=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am expecting it to return doc with id=2 but what i am getting is
> document
> 1 and 2.
>
> I am not able to figure out what is the problem. Is the query incorrect ?
> or is there some issue in join.
>
> *Couple of observations.*
>
> 1.if i remove the + from the filter query it works as expected. so the
> following query works
>
> fq={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> I am not sure how the Must clause affecting the query.
>
> *2.* Also, if you look, the original query is not returning document 3
> (however it is returning document 1, which has the same m_id). The only
> difference between doc 1 and doc 3 is that doc 1 was created when "lebron"
> was part of team:miami, and doc 3 was created when the team got updated
> to "cleveland". So the join works fine for the new docs in core1 but not
> for the old docs.
>
> 3.If i use q instead of fq the query returns results as expected.
>
> q=+{!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> and
>
> q={!join from=m_id to=m_id fromIndex=core2 force=true}team:miami
>
> Both of the above works.
>
> I am sure I am missing something about how the join works internally. I am
> trying to understand why fq behaves differently from q with the Must (+)
> clause.
>
> I am using solr 4.10.
>
>
>
> Thanks
>
> Manuj
>



--
Sincerely yours
Mikhail Khludnev

Monday, April 2, 2018

FW: custom filter class on schema.xml on solrcloud

-----Original Message-----
From: void [mailto:sauravsust71@gmail.com]
Sent: 02 April 2018 14:32
To: solr-user@lucene.apache.org
Subject: custom filter class on schema.xml on solrcloud

I have used a custom filter provided by a jar in schema.xml in standalone
Solr like below

<filter class="com.x.yFilterFactory"
stopWordDictionary="resources/yStopWords"/>

And for this,

I have loaded the jar in solrconfig.xml like below

<lib dir="./../plugins/" regex=".*\.jar" />

It's working fine, but when I tried to use it in SolrCloud with an external
ZooKeeper I got an 'IO exception' error, maybe from uploading a large jar
file to ZooKeeper.

I've also tried putting this jar in the lib folder of the Solr home, but
got a 'Plugin init failure' error.

After that, I've tried blob store api but the documentation says "Blob store
can only be used to dynamically load components configured in
solrconfig.xml. Components specified in schema.xml cannot be loaded from
blob store"

So, how can I use a custom filter class in schema.xml in SolrCloud mode
with an external ZooKeeper configuration?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: Learning to Rank (LTR) with grouping

-----Original Message-----
From: ilayaraja [mailto:ilay.msp@gmail.com]
Sent: 02 April 2018 12:27
To: solr-user@lucene.apache.org
Subject: Re: Learning to Rank (LTR) with grouping

Hi Roopa & Deigo,

I am facing the same issue with grouping. I am currently on Solr 7.2.1 but
still see that grouping with LTR is not working. Did you apply it as a
patch, or does the latest Solr version already have the fix?

Ilay



-----
--Ilay
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Sunday, April 1, 2018

FW: Upgrading a Plugin from 6.6 to 7.x

-----Original Message-----
From: Peter Alexander Kopciak [mailto:peter@kopciak.at]
Sent: 21 March 2018 16:17
To: solr-user@lucene.apache.org
Subject: Upgrading a Plugin from 6.6 to 7.x

Hi!

I'm still pretty new to Solr and I want to use the vector scoring plugin
(https://github.com/saaay71/solr-vector-scoring/network), but unfortunately
it does not seem to work on newer Solr versions.

I tested it with 6.6 to verify its functionality, so it seems to have been
broken by the upgrade to 7.x.

When following the installation procedure and executing the examples, I ran
into the following error with Query 1:

java.lang.UnsupportedOperationException: Query {! type=vp f=vector
vector=0.1,4.75,0.3,1.2,0.7,4.0 v=} does not implement createWeight

Does anyone have a lead on how I could fix/upgrade the plugin? The
createWeight method seems to exist, so I'm not sure where to start or what
the problem is.

FW: Get terms in solr not working

-----Original Message-----
From: adam rag [mailto:adamrag16@gmail.com]
Sent: 21 March 2018 11:10
To: solr-user@lucene.apache.org
Subject: Get terms in solr not working

To get the top words in my Apache Solr instance I am using a "terms" query.
When I try to get 10 terms from 100 million documents, the data comes back
after a few minutes, but with 300 million documents Solr stops responding.
My server memory is 100 GB.

FW: solrj question

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: 26 March 2018 23:06
To: solr-user@lucene.apache.org
Subject: Re: solrj question

On 3/26/2018 11:19 AM, Webster Homer wrote:
> You may say that the String in the constructor is "meant to be query
> syntax", nothing in the Javadoc says anything about the expected syntax.
> Since there is also a method to set the query, it seemed reasonable to
> expect that it would take the output of the toString method. (or some
> other serialization method)

You're right that the javadoc is not very specific.  It says this:

Parameters:
    q - query string

In general in Solr, "query string" is understood to be something you would
put in the "q" parameter when you send a query.  Or maybe the "fq"
parameter.  The javadoc could definitely be improved.

The javadoc for the toString specifically used here is a little more
specific.  (SolrQuery inherits from SolrParams, and that's where the
toString method is defined):

https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/common/params/SolrParams.html#toString--


It says "so that the URL may be unambiguously pasted back into a browser."

> So how would a user play back logged queries? This seems like an
> important use case. I can parse the toString output, It seems like the
> constructor should be able to take it.
> If not a constructor and toString, methods, I don't see methods to
> serialize and deserialize the query Being able to write the complete
> query to a log is important, but we also want to be able to read the
> log and submit the query to solr. Being able to playback the logs
> allows us to trouble shoot search issues on our site. It also
> provides a way to create load tests.
>
> Yes I can and am going to create this functionality, it's not that
> complicated, but I don't think it's unreasonable to think that the
> existing API should handle it.

Yes, that would be great capability to have.  But it hasn't been written
yet.  A method like "parseUrlString" on SolrQuery would be a good thing to
have.

Thanks,
Shawn
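Until a parseUrlString-style method exists in SolrJ, the toString() output (an URL-encoded query string) can be parsed back with ordinary URL tooling. A minimal sketch using generic query-string parsing, not a SolrJ API:

```python
# Parse SolrParams.toString() output (e.g. "q=foo&fq=bar&rows=10") back into
# a parameter map, for replaying logged queries against Solr.
from urllib.parse import parse_qs

def params_from_logged_query(qs):
    """Return {name: [values]} from a SolrParams-style query string."""
    return parse_qs(qs, keep_blank_values=True)

if __name__ == "__main__":
    print(params_from_logged_query("q=*%3A*&fq=category:books&rows=10"))
```

Multi-valued parameters (repeated fq, facet.field, etc.) come back naturally as lists, which matches how Solr treats them.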

FW: Default Index config

-----Original Message-----
From: mganeshs [mailto:mganeshs@live.in]
Sent: 26 March 2018 22:15
To: solr-user@lucene.apache.org
Subject: Default Index config

Hi,

I haven't changed the Solr config with respect to indexConfig, which means
it's all commented out in solrconfig.xml.

It's something like what I pasted before, but I would like to know the
default value of each of these settings.

Because after moving to 6.5.1 our index size crossed 5GB in each of our
collections, and updating documents is now taking time. So I would like to
know whether we need to change any of the default configurations.

<indexConfig>
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>




Advice...



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

FW: querying vs. highlighting: complete freedom?

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 26 March 2018 22:05
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: querying vs. highlighting: complete freedom?

Arturas:

Thanks for the "atta boy's", but I have to confess I poked a developers'
list, and the person (David Smiley) who actually understands the
highlighting code replied; I just passed it on.

I have great respect for the SO forum, but don't post to it since there's
only so much time in a day, so please feel free to put that explanation over
there.

As for the rest, I'll have to pass today, the aforementioned time
constraints are calling....

Best,
Erick

On Mon, Mar 26, 2018 at 12:12 AM, Arturas Mazeika <mazeika@gmail.com> wrote:
> Hi Erick,
>
> Adding a field-qualify to the hl.q parameter solved the issue. My
> excitement is steaming over the roof! What a thorough answer: the
> explanation about the behavior of solr, how it tries to interpret what
> I mean when I supply a keyword without the field-qualifier. Very
impressive.
> Would you care (re)posting this answer to stackoverflow? If that is
> too much of a hassle, I'll do this in a couple of days myself on your
behalf.
>
> I am impressed how well, thorough, fast and fully the question was
answered.
>
> Steven's hint pushed me further in this direction: he suggested using the
> query part of Solr to filter and sort out the relevant answers in the 1st
> step, and in the 2nd step highlighting all the keywords using CTRL+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to
> achieve this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the
> query 2. Use the http://localhost:8983/solr/#/trans/analysis to
> re-analyze the document and the query 3. Use the matching of the
> substrings from the original text to last filter/tokenizer/analyzer in
> the analyze-chain to map the terms of the query 4. Emulate CTRL+F
> highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal.
> If one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955)
> was a German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside
> quantum mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core
> that receives the request is not necessarily the one that processes
> it? Or this already is distributed in a sense that receiving core and
> processing cores are never the same?
>
> 2. The document was already analyze-chained. Is it possible to store
> this information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson
> <erickerickson@gmail.com>
> wrote:
>
>> Arturas:
>>
>> Try to field-qualify your hl.q parameter. That looks like:
>>
>> hl.q=trans:Kundigung
>> or
>> hl.q=trans:Kündigung
>>
>> I saw the exact behavior you describe when I did _not_ specify the
>> field in the hl.q parameter, i.e.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> didn't show all highlights.
>>
>> But when I did specify the field, it worked.
>>
>> Here's what I think is happening: Solr uses the default search field
>> when parsing an un-field-qualified query. I.e.
>>
>> q=something
>>
>> is parsed as
>>
>> q=default_search_field:something.
>>
>> The default field is controlled in solrconfig.xml with the "df"
>> parameter, you'll see entries like:
>> <str name="df">my_field</str>
>>
>> Also when I changed the "df" parameter to the field I was
>> highlighting on, I didn't need to specify the field on the hl.q
parameter.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> The default field is usually "text", which knows nothing about the
>> German-specific filters you've applied unless you changed it.
>>
>> So in the absence of a field-qualification for the hl.q parameter
>> Solr was parsing the query according to the analysis chain specifed
>> in your default field, and probably passed ü through without
>> transforming it. Since your indexing analysis chain for that field
>> folded ü to just plain u, it wasn't found or highlighted.
>>
>> On the surface, this does seem like something that should be changed,
>> I'll go ahead and ping the dev list.
>>
>> NOTE: I was trying this on Solr 7.1
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika <mazeika@gmail.com>
>> wrote:
>> > Hi Erick,
>> >
>> > Thanks for the update and the infos. Your post brought quite a bit
>> > of
>> light
>> > into the picture and now I understand quite a bit more about what
>> > you are saying. Your explanation makes sense and can be quite
>> > useful in certain scenarious.
>> >
>> > What struck me in your description is that you said the analyzer
>> > chain needs to be applied to the highlighting queries as well.
>> > The tragedy is that I am not able to get this working for a German
>> > collection: if only the query is set (no explicit highlighting query),
>> > the highlighting is correct. It is also correct if I replace the
>> > umlauts with the corresponding Latin characters. Getting the analyzer
>> > chain applied to the highlighting terms remains the challenge.
>> >
>> > Could you have a look at the following Stack Overflow link?
>> > Maybe something comes to mind...
>> >
>> > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >
>> > Cheers,
>> >
>> > Arturas
>> > On Fri, Mar 23, 2018, 17:43 Erick Erickson
>> > <erickerickson@gmail.com>
>> wrote:
>> >
>> >> bq: this is not a typical case that one searches for a keyword but
>> >> highlights something else
>> >>
>> >> This isn't really an unusual case; apparently I misled you.
>> >>
>> >> What I was trying to convey is that the analysis chain used is
>> >> firmly attached to a particular _field_. There's no way to say
>> >> "use one analysis chain for the query and another for highlighting
>> >> on the _same_ field".
>> >>
>> >> You can use two different fields with different analysis chains,
>> >> one for each purpose. So something like
>> >>
>> >> q=f1:something&hl.fl=f2,f3&hl.q=other
>> >>
>> >> is certainly reasonable. It'll search for "something" in f1, and
>> >> highlight "other" in f2 and f3.
>> >>
>> >> Each field processes its input with the analysis chain defined in
>> >> the schema.
>> >>
>> >> The rest about stored="true" can be ignored, it's just me
>> >> wandering off into the weeds about an optimization that only
>> >> stores the data once rather than redundantly in multiple fields.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Mar 23, 2018 at 4:37 AM, Arturas Mazeika
>> >> <mazeika@gmail.com>
>> >> wrote:
>> >> > Hi Mathesis (Stefan),
>> >> >
>> >> > Thanks for the questions. These made me look at the problem from
>> >> > a distance and re-frame the situation. Good questions indeed.
>> >> >
>> >> > Trying to come at it from another angle: consider a user who
>> >> > describes herself as a BMW fan, convinced that all BMWs need to
>> >> > be the blackest color possible (for the sake of argument), who
>> >> > would like to search and later browse the entries in the
>> >> > discussion forum (of course not everything, only BMWs of the
>> >> > blackest color), and what interests her are the snippets that
>> >> > have "understood", "craziest" or the like as keywords (because
>> >> > she is looking for a dozen discussions that she saw before).
>> >> >
>> >> > What I was not able to achieve so far is: (i) combining the query
>> >> > term for filtering and for highlighting, and (ii) using the
>> >> > analyzer chain from the field to rewrite the highlight query (or
>> >> > defining one in the search).
>> >> >
>> >> > The CTRL+F technique is a very powerful one, indeed. It works
>> >> > most of the time. The difficulties with it are query rewriting,
>> >> > enriching, etc.
>> >> >
>> >> > Cheers,
>> >> > Arturas
>> >> >
>> >> > On Fri, Mar 23, 2018 at 11:29 AM, Stefan Matheis <
>> >> matheis.stefan@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Perhaps we try it the other way round... what's your use case
>> >> >> for this? I'm trying to think of a situation where I'd need
>> >> >> this as a user.
>> >> >>
>> >> >> The only reason I see myself doing this is CTRL+F in a page
>> >> >> when the
>> >> search
>> >> >> result is not immediately visible for me ;)
>> >> >>
>> >> >> On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <mazeika@gmail.com>
>> wrote:
>> >> >>
>> >> >> > Hi Erick et al,
>> >> >> >
>> >> >> > From your answer I understand that this is not a typical case,
>> >> >> > that one searches for a keyword but highlights something else.
>> >> >> > Since we have two parameters (q vs hl.q) I thought they were
>> >> >> > freely combinable. From your answer I understand that this is
>> >> >> > not really the case. My current understanding came from [1],
>> >> >> > which says:
>> >> >> >
>> >> >> > hl.q
>> >> >> >
>> >> >> > A query to use for highlighting. This parameter allows you to
>> >> highlight
>> >> >> > different terms than those being used to retrieve documents.
>> >> >> > What I hear from you is something different: i.e., that it is
>> >> >> > not enough just to combine q with hl.q, and that there are
>> >> >> > caveats to achieving the task (multiple fields,
>> >> >> > FastVectorHighlighter).
>> >> >> >
>> >> >> > Your infos are very helpful.
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Arturas
>> >> >> >
>> >> >> > [1]
>> >> >> > https://lucene.apache.org/solr/guide/7_2/highlighting.html
>> >> >> >
>> >> >> > On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <
>> >> erickerickson@gmail.com
>> >> >> >
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Basically you need to use a copyField, but in several variants:
>> >> >> > >
>> >> >> > > If you use the field _exclusively_ for highlighting then
>> >> >> > > store
>> the
>> >> raw
>> >> >> > > content there and have the field use whatever analyzer you
want.
>> You
>> >> >> > > do _not_ need to have indexed="true" set for the field if
>> >> >> > > you're highlighting on the fly. So you're searching against
>> >> >> > > field1
>> (which
>> >> has
>> >> >> > > indexed="true" stored="false" set) but highlighting against
>> field2
>> >> >> > > (which has indexed="false" stored="true" set). Of course
>> >> >> > > any time
>> >> you
>> >> >> > > want to return the contents in a doc your fl needs to
>> >> >> > > specify field2...
>> >> >> > >
>> >> >> > > The above does not bloat your index at all since the cost
>> >> >> > > of stored="true" indexed="true" is the same as if you use
>> >> >> > > two
>> fields,
>> >> >> > > each with only one option turned on.
>> >> >> > >
>> >> >> > > The second approach if you want to use
>> >> >> > > FastVectorHighlighter or
>> the
>> >> >> > > like is simply to index both fields.
>> >> >> > >
>> >> >> > > Best,
>> >> >> > > Erick
>> >> >> > >
>> >> >> > > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <
>> mazeika@gmail.com
>> >> >
>> >> >> > > wrote:
>> >> >> > > > Hi Solr-Users,
>> >> >> > > >
>> >> >> > > > I've been playing with a german collection of documents,
>> >> >> > > > where
>> I
>> >> >> tried
>> >> >> > to
>> >> >> > > > search for one word (q=Tag) and highlighted another:
>> >> >> (hl.q=Kundigung).
>> >> >> > Is
>> >> >> > > > this a "legal" use case? My key question is how can I
>> >> >> > > > tell solr
>> >> which
>> >> >> > > query
>> >> >> > > > analyzer to use for highlighting? Strictly speaking, I
>> >> >> > > > should
>> use
>> >> >> > > > hl.q=Kündigung to conceptually look for relevant
>> >> >> > > > information,
>> but
>> >> in
>> >> >> > this
>> >> >> > > > case, no highlighting is returned (as all umlauts are
>> >> >> > > > left out
>> in
>> >> the
>> >> >> > > > index) .
>> >> >> > > >
>> >> >> > > > Additional infos:
>> >> >> > > >
>> >> >> > > > solr version: 7.2
>> >> >> > > > urls to query:
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=
>> >> >> > > true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=
>> >> >> > > true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&r
>> >> >> > > ows=1
>> >> >> > > >
>> >> >> > > > Managed-schema:
>> >> >> > > >
>> >> >> > > > <fieldType name="text_de" class="solr.TextField"
>> >> >> > > positionIncrementGap="100">
>> >> >> > > > <analyzer>
>> >> >> > > > <tokenizer class="solr.StandardTokenizerFactory"/>
>> >> >> > > > <filter class="solr.LowerCaseFilterFactory"/>
>> >> >> > > > <filter class="solr.StopFilterFactory"
format="snowball"
>> >> >> > > > words="lang/stopwords_de.txt" ignoreCase="true"/>
>> >> >> > > > <filter class="solr.GermanNormalizationFilterFactory"/>
>> >> >> > > > <filter class="solr.GermanLightStemFilterFactory"/>
>> >> >> > > > </analyzer>
>> >> >> > > > </fieldType>
>> >> >> > > >
>> >> >> > > >
>> >> >> > > > Other additional infos:
>> >> >> > > > https://stackoverflow.com/questions/49276093/solr-
>> >> >> > > highlighting-terms-with-umlaut-not-found-not-highlighted
>> >> >> > > >
>> >> >> > > > Cheers,
>> >> >> > > > Arturas
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
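The encoding detail in this thread is easy to trip over: hl.q=Kündigung has to reach Solr percent-encoded as UTF-8, exactly as in the second example URL. A minimal sketch of building that request (the host, the collection name "trans", and the field names are taken from the example URLs in the thread; adjust for your own setup):

```python
from urllib.parse import urlencode

# Build the highlighting request discussed above: retrieve on one term
# (q) while highlighting another (hl.q), umlaut intact.
params = {
    "q": "trans:Zeit",    # retrieval query
    "hl": "true",
    "hl.fl": "trans",     # field(s) to highlight
    "hl.q": "Kündigung",  # highlight a different term than the query
    "hl.snippets": 3,
    "wt": "xml",
    "rows": 1,
}
url = "http://localhost:8983/solr/trans/select?" + urlencode(params)
print(url)
```

urlencode defaults to UTF-8, so the ü comes out as %C3%BC and hl.q is sent as hl.q=K%C3%BCndigung, matching the second URL in the thread.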

FW: Boosting Fields Based On The Query Provided

-----Original Message-----
From: Mukhopadhyay, Aratrika [mailto:Aratrika.Mukhopadhyay@mail.house.gov]
Sent: 22 March 2018 18:48
To: solr-user@lucene.apache.org
Subject: RE: Boosting Fields Based On The Query Provided

Thanks for your reply, Shawn. The query elevation worked for us. I have
another question though. Right now I have ways to handle specific queries in
elevate.xml. The concern I have is that I may have hundreds of queries that
need to return different pages first. Is the only way to do this via
elevate.xml, or is there a better approach, for instance boosting fields?
When I boost fields in this fashion it is not working for me:

<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="qf"> url^50 host^30 content^20 title^10</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>


Thanks for your help .

Aratrika Mukhopadhyay
-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: Tuesday, March 20, 2018 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting Fields Based On The Query Provided

On 3/20/2018 2:25 PM, Mukhopadhyay, Aratrika wrote:
> I have a solr query which I am having a hard time configuring as I
> would want it configured. Suppose I have a situation where I have two
> fields, field1 (host field) and field2 (url field). I want a specific
> host to be bubbled to the top for all terms, except when I am searching
> for specific people, in which case I want the URL to their landing page
> returned first. I have configured the dismax query parser in my
> solrconfig but it seems that the boost being applied is arbitrary.

<snip>

> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="q">*:*</str>
>     <str name="bq">host:(www.starwars.com)^10</str>
>     <str name="q">Carrie Fisher</str>
>     <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000402/)^8</str>
>     <str name="q">Mark Hamill</str>
>     <str name="bq">url:(http\:\/\/www.imdb.com\/name\/nm0000434/)^8</str>
>   </lst>
> </requestHandler>

I think there's a fundamental misunderstanding of how "defaults" works.

I have no idea what happens with multiple "q" parameters, which you have
configured in defaults.  I do know that if your request includes a "q"
parameter, then what you've put in defaults for "q" is going to be
overridden and ignored.

This section of the documentation covers defaults, appends, and invariants:

https://lucene.apache.org/solr/guide/6_6/requesthandlers-and-searchcomponents-in-solrconfig.html#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers


I think the Query Elevation Component might be the kind of functionality
you're after.  What you're trying to do with defaults is NOT going to work.

https://lucene.apache.org/solr/guide/6_6/the-query-elevation-component.html


Thanks,
Shawn
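The per-query "return this page first" mapping Aratrika is after is exactly what elevate.xml expresses. A sketch of what such entries could look like (hedged: the doc id values are placeholders and must match the collection's uniqueKey; the query texts and IMDb URLs are just the ones from the example config above):

```xml
<elevate>
  <!-- Return the landing page first for this exact query text -->
  <query text="Carrie Fisher">
    <doc id="http://www.imdb.com/name/nm0000402/"/>
  </query>
  <query text="Mark Hamill">
    <doc id="http://www.imdb.com/name/nm0000434/"/>
  </query>
</elevate>
```

With hundreds of such queries the file grows, but each entry is applied only when the query text matches exactly, unlike a blanket qf/bq boost in defaults.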

FW: Solr or Elasticsearch

-----Original Message-----
From: Steven White [mailto:swhite4141@gmail.com]
Sent: 22 March 2018 18:44
To: solr-user@lucene.apache.org
Subject: Solr or Elasticsearch

Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is
its analytics capability. However, I cannot find what those analytics
capabilities are and why they cannot be achieved using Solr. Can someone
help me with this question?

Personally, I'm a Solr user, and the thing that concerns me about
Elasticsearch is the fact that it is owned by a company that could decide
any day to stop making Elasticsearch available under the Apache license, or
even close off free access to it completely.

So, this is a 2 part question:

1) What are the analytics capabilities of Elasticsearch that cannot be
matched using Solr? I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may close its
open-source policy at any time, or that outsiders have no say about its
roadmap?

Thanks,

Steve

FW: querying vs. highlighting: complete freedom?

-----Original Message-----
From: Arturas Mazeika [mailto:mazeika@gmail.com]
Sent: 22 March 2018 14:48
To: solr-user@lucene.apache.org
Subject: querying vs. highlighting: complete freedom?

Hi Solr-Users,

I've been playing with a German collection of documents, where I tried to
search for one word (q=Tag) and highlight another (hl.q=Kundigung). Is
this a "legal" use case? My key question is: how can I tell Solr which
query analyzer to use for highlighting? Strictly speaking, I should use
hl.q=Kündigung to conceptually look for relevant information, but in this
case no highlighting is returned (as all umlauts are left out in the
index).

Additional infos:

solr version: 7.2
urls to query:

http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.
q=Kundigung&hl.snippets=3&wt=xml&rows=1


http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.
q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1


Managed-schema:

<fieldType name="text_de" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" format="snowball"
words="lang/stopwords_de.txt" ignoreCase="true"/>
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.GermanLightStemFilterFactory"/>
</analyzer>
</fieldType>


Other additional infos:
https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-um
laut-not-found-not-highlighted


Cheers,
Arturas

FW: Get terms in solr not working

-----Original Message-----
From: Joel Bernstein [mailto:joelsolr@gmail.com]
Sent: 21 March 2018 20:51
To: solr-user@lucene.apache.org
Subject: Re: Get terms in solr not working

Also, what is the use case? What do you plan to do with the terms? There
may be other approaches that will work better than the terms query.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 21, 2018 at 9:28 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> We need a lot more information. What is the exact query you're using?
> Is 100M the number of docs? How many terms are in the field?
>
> On Tue, Mar 20, 2018 at 10:39 PM, adam rag <adamrag16@gmail.com> wrote:
> > To get the top words in my Apache Solr instance, I am using the
> > "terms" query. When I try to get 10 terms across 100 million
> > documents, the data comes back after a few minutes, but if there are
> > 300 million documents, Solr does not respond. My server memory is
> > 100 GB.
>

FW: Upgrading a Plugin from 6.6 to 7.x

-----Original Message-----
From: Atita Arora [mailto:atitaarora@gmail.com]
Sent: 21 March 2018 19:01
To: solr-user@lucene.apache.org
Subject: Re: Upgrading a Plugin from 6.6 to 7.x

Hi Peter,


(Sorry for the earlier incomplete email; I hit send by mistake.)

I haven't really been able to look into it completely, but my first glance
says it should be because the method signature has changed.

I am looking here:
https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html


In 7.0.0, createWeight(IndexSearcher searcher, boolean needsScores, float boost):
https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean-float-
Expert: Constructs an appropriate Weight implementation for this query.

While at 6.6.0, createWeight(IndexSearcher searcher, boolean needsScores):
https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean-
Expert: Constructs an appropriate Weight implementation for this query.

You would need a code change for this to make it work in Version 7.

Thanks,
Atita


On Wed, Mar 21, 2018 at 6:59 PM, Atita Arora <atitaarora@gmail.com> wrote:

> Hi Peter,
>
> I haven't really been able to look into it completely, but my first
> glance says it should be because the method signature has changed.
>
> I am looking here:
> https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html
>
> createWeight(IndexSearcher searcher, boolean needsScores, float boost)
> https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/Query.html#createWeight-org.apache.lucene.search.IndexSearcher-boolean-float-
> Expert: Constructs an appropriate Weight implementation for this query.
>
> While at :
>
>
> On Wed, Mar 21, 2018 at 4:16 PM, Peter Alexander Kopciak
> <peter@kopciak.at
> > wrote:
>
>> Hi!
>>
>> I'm still pretty new to Solr and I want to use the vector Scoring
>> plugin (
>> https://github.com/saaay71/solr-vector-scoring/network) but
>> unfortunately, it does not seem to work for newer Solr versions.
>>
>> I tested it with 6.6 to verify its functionality, so it seems to be
>> broken because of the upgrade to 7.x.
>>
>> When following the installation procedure and executing the examples,
>> I ran into the following error with Query 1:
>>
>> java.lang.UnsupportedOperationException: Query {! type=vp f=vector
>> vector=0.1,4.75,0.3,1.2,0.7,4.0 v=} does not implement createWeight
>>
>> Does anyone have a lead for me on how to fix/upgrade the plugin? The
>> createWeight method seems to exist, so I'm not sure where to start
>> and what the problem is.
>>
>
>

FW: Solr main replica down, another replica taking over

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: 21 March 2018 18:37
To: solr-user@lucene.apache.org
Subject: Re: Solr main replica down, another replica taking over

On 3/21/2018 12:04 AM, Midas A wrote:
> We want to send less traffic to the virtual machines and more to the
> physical servers. How can we achieve this?

At the moment, I do not know of any functionality in SolrCloud to accomplish
this goal.  As I mentioned before, there is work underway to make it
possible, but it's not available yet.

One thing you could do is include preferLocalShards=true as a URL parameter
and only send requests to the physical servers (unless they are down), but
to do that, you'll have to handle load balancing yourself.

Thanks,
Shawn

FW: [PHP Classes] Notable PHP package: PHP DNS Check Tool

 

 

From: PHP Classes Notable [mailto:list-notable@phpclasses.org]
Sent: 21 March 2018 12:22
To: ROSHAN <roshan@siddhast.com>
Subject: [PHP Classes] Notable PHP package: PHP DNS Check Tool

 


Notable PHP package: PHP DNS Check Tool


ROSHAN, a PHP package is considered Notable when it does something different that is worth noting.


 

Package

PHP DNS Check Tool

Check DNS records and compare record sets

Moderator comment

A DNS server is a server hosted on the Internet that can return IP addresses of other computers also on the Internet.

Often computers need to query different DNS servers to obtain the IP addresses of the same computers, but since the information may not be synchronized, there may be differences between the record values.

This package can determine if there are differences between the values of given records stored in different DNS servers.

Author

Matous Nemec

Groups

Networking, PHP 7

Description

This class can check DNS records and compare record sets.

It can perform lookups against DNS servers to obtain the values of records for certain domains and record types.

The class can also compare sets of records obtained from different providers, like DNS servers or arrays, to determine the differences and see what changed.

 
