I’m looking forward to seeing everyone at ApacheCon in Denver next week! I’ll be giving two talks this year, both focused heavily on search relevancy, an area we’ve been working hard to highlight. We’ve found that folks plug in search, whether Solr or Elasticsearch, and get to a point where search seems to work, at least until somebody takes a close look at the quality of the results and discovers that user searches aren’t returning what users expect. Our Duke Medicine project is an example of this: getting Solr installed can be easy, but you have to work to get relevant search. (And of course, shameless plug, that’s when you contact us to help!)
My first talk, Test Driven Relevancy, discusses the process of getting good search. We iterate on search the same way we iterate on software: we capture business requirements and develop tests to prove search is making progress. That’s why we built (another shameless plug) Quepid, our workbench for iterating on customer search requirements. Here’s the synopsis:
Getting good search results is hard; maintaining good relevancy is even harder. Fixing one problem can easily create many others. Without good tools to measure the impact of relevancy changes, there’s no way to know if the “fix” that you’ve developed will cause relevancy problems with other queries. Ideally, much like we have unit tests for code to detect when bugs are introduced, we would like to create ways to measure changes in relevancy. This is exactly what we’ve done at OpenSource Connections. We’ve developed a tool, Quepid, that allows us to work with content experts to define metrics for search quality. Once defined, we can instantly measure the impact of modifying our relevancy strategy, allowing us to iterate quickly on very difficult relevancy problems. Get an in-depth look at the tools we use to not only solve a relevancy problem, but to make sure it stays solved!
My second talk, Hacking Lucene For Custom Search Results, discusses a bit more of the how, at a very technical level! While recently working for a client, I had to pull out all the stops and completely rearchitect how Solr/Lucene scores results by creating a custom Lucene Query and Scorer. It was a daunting but fun task, and it shows how much control (and responsibility) one can take over how search works in the Lucene ecosystem. Here’s the synopsis:
Search is everywhere, and therefore so is Apache Lucene. While Lucene provides amazing out-of-the-box defaults, plenty of projects are unusual enough to require custom search scoring and ranking. In this talk, I’ll walk through how to use Lucene to implement your own custom scoring and search ranking. We’ll see the amazing power (and responsibility) you can take over your search results, explore the flexibility of Lucene’s data structures, and weigh the pros and cons of custom Lucene scoring against other methods of improving search relevancy.
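To give a taste of what “custom scoring” means in practice: the two knobs most often tuned are how term frequency and document rarity feed into a score. The sketch below is not actual Lucene code and does not use the Lucene API; it is a standalone, hypothetical illustration whose formulas mirror the classic Lucene TF-IDF defaults (square-root term frequency, logarithmic inverse document frequency), with norms and boosts left out for brevity. The class and method names are mine, not Lucene’s.

```java
// Illustrative sketch only (hypothetical names, not the Lucene API).
// Shows the shape of the two functions a custom similarity typically
// overrides: term-frequency dampening and inverse document frequency.
public class CustomScoringSketch {

    // Dampen raw term frequency so repeated terms add less and less.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Reward rare terms: terms appearing in fewer documents score higher.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // One term's score contribution, ignoring field norms and boosts.
    static float termScore(float freq, long docFreq, long numDocs) {
        float idfValue = idf(docFreq, numDocs);
        return tf(freq) * idfValue * idfValue;
    }

    public static void main(String[] args) {
        // A term appearing 4 times in a doc, found in 10 of 1000 docs.
        System.out.println(termScore(4f, 10, 1000));
    }
}
```

Swapping either function for your own logic (say, a flat tf, or an idf that ignores rarity entirely) is the essence of what a custom Similarity or Scorer lets you do, which is exactly the territory the talk explores.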
Please contact me if you’d like to touch base at ApacheCon. We’d love to talk about your search relevancy problems. We’re especially looking for freelancers and partners to help us with all this exciting work! Hope to see you there!