Sunday, April 1, 2018

FW: Solr on HDInsight to write to Active Data Lake

-----Original Message-----
From: Abhi Basu [mailto:9000revs@gmail.com]
Sent: 23 March 2018 20:42
To: solr-user@lucene.apache.org
Subject: Solr on HDInsight to write to Active Data Lake

MS Azure does not support Solr 4.9 on HDI, so I am posting here. I would
like to write index collection data to HDFS (hosted on ADL).

Note: I can reach ADL from the hadoop fs command line, so Hadoop is
configured correctly for ADL access:
hadoop fs -ls adl://

This is what I have done so far:
1. Copied all required jars to the Solr ext lib folder:
sudo cp -f /usr/hdp/current/hadoop-client/*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/hadoop-client/lib/*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/hadoop-hdfs-client/*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/hadoop-hdfs-client/lib/*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/storm-client/contrib/storm-hbase/storm-hbase*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/phoenix-client/lib/phoenix*.jar /usr/hdp/current/solr/example/lib/ext
sudo cp -f /usr/hdp/current/hbase-client/lib/hbase*.jar /usr/hdp/current/solr/example/lib/ext

This also includes the Azure Data Lake jars.

2. Edited the solrconfig.xml file for my collection:

<dataDir>${solr.core.name}/data/</dataDir>

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">adl://esodevdleus2.azuredatalakestore.net/clusters/esohadoopdeveus2/solr/</str>
  <str name="solr.hdfs.confdir">/usr/hdp/2.6.2.25-1/hadoop/conf</str>
  <str name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
</directoryFactory>
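
A possible check at this point (a sketch, not part of the original setup): Hadoop commonly maps a URI scheme such as adl:// to a FileSystem class through a property named fs.<scheme>.impl in the configuration directory given by solr.hdfs.confdir. The helper below, conf_value, is a hypothetical name; it assumes the property's <name> and <value> elements sit next to each other, as in a typical core-site.xml, and it is a plain text match rather than a real XML parser.

```shell
# Hypothetical helper: print the <value> of one property from a Hadoop *-site.xml.
# $1 = property name, $2 = path to the XML file
conf_value() {
  # Join all lines so a <name>/<value> pair split across lines still matches,
  # then keep only the text between <value> and </value>.
  tr -d '\n' < "$2" \
    | grep -o "<name>$1</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>//'
}

# Example call (path taken from solr.hdfs.confdir above; core-site.xml is an assumption):
# conf_value fs.adl.impl /usr/hdp/2.6.2.25-1/hadoop/conf/core-site.xml
```

Whatever class name this prints is the one Solr's classloader has to be able to load from its own lib folders.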


When this collection is deployed to Solr, I get this error response:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2189</int></lst>
<lst name="failure">
<str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'ems-collection_shard2_replica2': Unable to create core: ems-collection_shard2_replica2 Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found</str>
<str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'ems-collection_shard2_replica1': Unable to create core: ems-collection_shard2_replica1 Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found</str>
<str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'ems-collection_shard1_replica1': Unable to create core: ems-collection_shard1_replica1 Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found</str>
<str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'ems-collection_shard1_replica2': Unable to create core: ems-collection_shard1_replica2 Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found</str>
</lst>
</response>
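
One way to narrow this down (a rough sketch under stated assumptions, not a confirmed fix): check whether any jar in the Solr ext lib folder actually contains the class named in the error. The helper name find_class_jar is hypothetical. It relies on the fact that jar (zip) archives store entry names as plain text, so a fixed-string grep over the jar bytes is a quick heuristic; `unzip -l` or `jar tf` on a reported jar would confirm the result.

```shell
# Hypothetical helper: list jars in a directory whose bytes mention a class entry.
# $1 = directory containing *.jar files, $2 = fully qualified class name
find_class_jar() {
  # Turn a.b.C into the archive entry name a/b/C.class
  entry="$(printf '%s' "$2" | tr '.' '/').class"
  # -l prints only the names of matching files; -F treats the entry as a fixed string.
  grep -lF -- "$entry" "$1"/*.jar 2>/dev/null
}

# Example call, using the folder and class name from this message:
# find_class_jar /usr/hdp/current/solr/example/lib/ext org.apache.hadoop.fs.adl.HdiAdlFileSystem
```

If the call prints nothing, none of the copied jars appears to carry that class, which would match the "not found" message; the same function can be pointed at other directories under /usr/hdp to look for a jar that does.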


Has anyone done this and can help me out?

Thanks,

Abhi


--
Abhi Basu
