I went to LuceneRevolution to test out my assertion that Search is the dominant metaphor for working with Big Data, and based on the conversations that I had, that assertion holds water.
As Grant Ingersoll pointed out in his keynote, the basic plumbing requirements for Big Data (storage, distributed processing, and a cheap price tag) have been met. What we are missing is the actual ability to make decisions based on the information contained in our Big Data sets. We are still caught up in the navel-gazing activity of “how much raw data have I collected,” and aren’t focusing on “Should I make decision X or Y based on my data?” There is a huge gap between those who write MapReduce jobs and those who need access to the results of those jobs. Processed results aren’t enough, and we shouldn’t need to file the equivalent of a FOIA request with our IT department to gain access to the raw data. Search-based applications, also known as Search, Discovery, and Analytics (SDA), fill the gap between the developers and data scientists working with the raw data and the business users attempting to make data-driven decisions.
Search engines were the original “Big Data” ten years ago. Then the rise of Google led to the search market bifurcating into efforts related to internal enterprise search and e-commerce search. The importance of Search seemed to dwindle; witness the declining attendance at conferences like Enterprise Search Summit. But with the accelerating growth in data, aka “Big Data,” search in the last few years has moved from a basic input box to the feature that can make or break your application.
- Met a number of ex-Endeca folks. I’m hoping that the Lucene community takes advantage of these people, who’ve done cool things with other search engines like Endeca, and brings some of their great ideas into Lucene and Solr. New blood is good.
- This continues to be the “Year of Big Data”. I’m looking forward to tighter integration between the search and the big data communities. Lots of folks are building custom QueryParsers to solve specific problems. It will be interesting to see how much of this becomes generalized and open sourced.
- Microsoft seems to have dropped its knee-jerk reaction against Java, and is working to make it easy to run the Big Data ecosystem of projects on its cloud platform, Azure.
- People are anxious to use Lucene 4. A strength of the Solr open source project is its incredible level of unit testing. Go ahead and use it! If your IT manager doesn’t like to use unreleased code, tell ’em to come <a href="mailto:email@example.com">talk to me</a>!
- ElasticSearch continues to have some great mind share, but suffers from a much smaller committer community: just 1! The competition is keeping Solr honest.
- Mark Miller gave a big pitch for RandomizedTesting, which is an extraction of Solr/Lucene’s awesome unit testing framework into something generic. Anything that makes testing complex systems simpler is good.
It was a great conference: very thought-provoking, with great people and conversations. LuceneRevolution 2012 continues to set the bar for hardcore technical conferences. Attendance is a no-brainer if you are working with Lucene!