
Notes from using LucidWorks for Solr Distro

I've been playing with the LucidWorks for Solr distribution of Solr 1.4, and wanted to share some of the things I've noticed about it. The LucidWorks distro is Solr 1.4 with patches and enhancements from Lucid added in.

Installer

The first thing you'll notice is that an installer (and uninstaller) is provided that walks you through the basic steps of installing Solr. Now, Solr itself is pretty darn simple to work with already, but you do need to compile the code, which means you need Ant installed. The Lucid installer avoids that need, and adds support for running Solr in Tomcat as well as Jetty. And, assuming you have a support agreement with Lucid, it supports downloading plugins from Lucid to extend your Solr platform. Right now the only free plugin is the Reference Guide PDF. Having an installer available definitely checks a box for the systems-type folks who may be installing Solr, but it doesn't really do anything crazy special. Also, one nit: if you install into /opt/dirA and then want to install into /opt/dirB, you have to delete the ~/.LucidWorks/ directory first, as the install dir is cached! But it does demonstrate what might be coming from Lucid in future updates!
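For the cached install dir nit, a small helper along these lines can clear the cache before re-running the installer. It is only a sketch: the function name is made up for illustration and is not part of the LucidWorks installer, and it moves ~/.LucidWorks aside rather than deleting it, in case you want the old state back.

```shell
#!/bin/sh
# Hypothetical helper (not part of the LucidWorks installer): move the cached
# installer state out of the way so a second install can target a new directory.
clear_lucid_install_cache() {
  cache_dir="${1:-$HOME/.LucidWorks}"
  if [ -d "$cache_dir" ]; then
    # Rename instead of rm -rf so the old state can be restored if needed.
    mv "$cache_dir" "$cache_dir.bak.$$"
    echo "moved $cache_dir to $cache_dir.bak.$$"
  else
    echo "nothing cached at $cache_dir"
  fi
}
```

Calling clear_lucid_install_cache with no arguments targets ~/.LucidWorks; pass a path (without a trailing slash) to try it on a scratch directory first.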

Installer Targets Screen


Another enhancement from Lucid is a Tray Application for managing your Solr instances. However, this turns out to be just a basic (on OSX at least!) menubar application that allows you to start and stop a local Solr server. There don't seem to be any options to stop and start remote servers, or to monitor the health of running Solr instances, so I think this is something you use once and never again! Hey Lucid, it would be great, though, if the Tray App integrated stoplight monitoring of Solr instances and popped open the admin web pages so you could perform various tasks on your collection of Solr servers!

Directory Layout

The directory that you've installed Solr into should look very familiar. In fact, too familiar to me! I've gone back and forth on the way that Solr is distributed with source code as well as compiled jars. While Solr used to be a tool that only Java-centric shops would look at, it's now gone mainstream, to the point where many, if not most, organizations that use Solr are not traditional Java shops! I really wish I could download a version of Solr that didn't have the src directory and was just a stripped-down, ready-to-go application. Admittedly, the example application that is part of the source functions as a template, but it has been bemoaned by myself and others that folks just use and abuse the configuration of what was meant as an example app, to their detriment!

So I was hoping that the LucidWorks distro's installer would function as that smart template by walking me through including or excluding various extensions like DIH, Clustering, and Extraction. But at least in this first version, no such luck. The support for picking either Tomcat or Jetty as a container shows what could be in the offing, though!

While the LucidWorks distro still ships with the hoary old example directory, there is now a lucidworks directory as well. When you run the new top-level start.sh shell script, it starts Solr with solr.solr.home set to the lucidworks/solr directory. Something to note is that start.sh has complete paths written into it by the installer:

[sourcecode language="text"]cd /Users/epugh/solr/solr2/LucidWorks/lucidworks/jetty/../[/sourcecode]

It really should at least have a single variable at the top that you can change depending on what environment you are in.
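A minimal sketch of that idea follows. It is not what the installer generates: LUCIDWORKS_HOME and work_dir_for are names invented here for illustration, and the only path taken from the real script is the cd target quoted above.

```shell
#!/bin/sh
# Sketch only: LUCIDWORKS_HOME is an assumed variable name, not one the
# installer defines. Override it per environment; the default is the path
# the installer wrote into start.sh on the machine used for this post.
LUCIDWORKS_HOME="${LUCIDWORKS_HOME:-/Users/epugh/solr/solr2/LucidWorks}"

# Derive the working directory from the install root (hypothetical helper).
work_dir_for() {
  printf '%s\n' "$1/lucidworks/jetty/.."
}

WORK_DIR=$(work_dir_for "$LUCIDWORKS_HOME")
if [ -d "$WORK_DIR" ]; then
  cd "$WORK_DIR" || echo "could not cd to $WORK_DIR" >&2
else
  echo "no LucidWorks install found under $LUCIDWORKS_HOME" >&2
fi
```

With that in place, something like `LUCIDWORKS_HOME=/opt/dirB ./start.sh` would point the same script at a different install, and the rest of the script could build its paths from the one variable.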

The lucidworks project is also set up as a single-index project. Since the future is multicore configurations, I'd like to see that as the default in more examples. (The example app needs a bit of work as well to better show off multicore as a first-class feature!)

solrconfig.xml

Doing a diff on the example and lucidworks versions of solrconfig.xml shows that the lucidworks version is pretty much the same as the one from the example app, but with the correct configurations for DataImportHandler and the Velocity-based search UI called Solritas. Solritas is a nice tool for helping you “wedge” Solr into places by providing a simple Velocity template-based translation layer, and it even lets you build a GUI within your Solr environment. Solritas hasn't received a lot of buzz, so it's nice seeing it turned on by default! The clustering functionality is also specified, but I'm not sure whether the solr.cluster.enabled=true startup parameter is actually required.
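To reproduce the comparison, a helper like this can be run from the top of the install directory. The conf paths in the usage comment are assumptions about the layout (example/solr/conf matches stock Solr 1.4; the lucidworks one is a guess), so adjust them to what you see on disk.

```shell
#!/bin/sh
# Print a unified diff of the same config file from two Solr conf directories.
# diff exits nonzero when the files differ, which is the expected case here,
# so any nonzero status is swallowed to keep `set -e` scripts running.
diff_solr_conf() {
  file="$1"; left_conf="$2"; right_conf="$3"
  diff -u "$left_conf/$file" "$right_conf/$file" || true
}

# Assumed paths, run from the install root:
#   diff_solr_conf solrconfig.xml example/solr/conf lucidworks/solr/conf
#   diff_solr_conf schema.xml     example/solr/conf lucidworks/solr/conf
```

Taking the file name as the first argument means the same helper covers both solrconfig.xml and schema.xml.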

The other oddity is that the Lucid monitoring product for Solr, SolrGaze, isn't enabled by default! That doesn't seem like the most ringing endorsement for the software. I'm excited by the prospect of better visibility into the internals of Solr, so I enabled it.

schema.xml

Diffing the two schema.xml files reveals the addition of the Lucid KStemmer, com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory, for fast non-aggressive text stemming. According to Lucid:

Large field performance shows a 220% performance increase, while small fields show a 1140% increase compared to the original UMASS code.

SolrGaze

SolrGaze promises to make it easier to see what is going on inside of Solr. Anything that makes it simpler for operations folks, rather than developers, to manage Solr is good in my book. I ran into one nit: when I opened SolrGaze using the URL http://localhost:8983/gaze/index.html, it barfed connecting to Solr to display the gathered metrics, but when I used http://127.0.0.1:8983/gaze/index.html everything was fine.

I haven't had the chance to really play with Gaze yet, so I'll post a more in-depth review soon.

Summary

All in all, the Lucid distro is what I would recommend for a first-timer to download, or for someone doing a spike of development who needs a quick install of Solr. Not requiring Ant to be installed is a wonderful thing, and being pre-configured for Clustering, DIH, and Solritas means you get to see a working Solr install, complete with a full-featured GUI, right out of the box. For a production deploy there is less to recommend it, since you're going to want to strip down to just the bits and bobs that you require for your specific needs. I haven't delved into what SolrGaze provides, so that feature may be the tipping point for deciding to use the Lucid distribution.