I'm looking forward to seeing everyone at ApacheCon in Denver next week! I'll be giving two talks this year. Both focus heavily on search relevancy, an area we've been working hard to highlight. We've found that folks plug in a search engine, Solr or Elasticsearch, and get to a point where search seems to work. That is, until somebody takes a close look at the quality of the results and finds that user searches aren't returning the expected documents. Our Duke Medicine project is an example of this: getting Solr installed can be easy, but you have to work to get relevant search. (And, shameless plug, that's when you contact us for help!)
My first talk, Test Driven Relevancy, discusses the process of getting good search. We iterate on search the same way we iterate on software: we capture business requirements and develop tests to prove search is making progress. That's why we built (another shameless plug) Quepid, our workbench for iterating on customer search requirements. Here's the synopsis:
Getting good search results is hard; maintaining good relevancy is even harder. Fixing one problem can easily create many others. Without good tools to measure the impact of relevancy changes, there's no way to know whether the "fix" you've developed will cause relevancy problems for other queries. Ideally, much as we have unit tests that detect when bugs are introduced into code, we would like ways to measure changes in relevancy. This is exactly what we've done at OpenSource Connections. We've developed a tool, Quepid, that allows us to work with content experts to define metrics for search quality. Once those metrics are defined, we can instantly measure the impact of modifying our relevancy strategy, allowing us to iterate quickly on very difficult relevancy problems. Get an in-depth look at the tools we use not only to solve a relevancy problem, but to make sure it stays solved!
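To make the "unit tests for relevancy" idea concrete, here is a minimal sketch in Python. It is not how Quepid works internally; it assumes binary judgments (a content expert marks each document relevant or not for a query) and uses precision@k as the quality metric. The queries, document ids, and function names (`precision_at_k`, `find_regressions`) are hypothetical. The test compares a baseline ranking against a candidate ranking produced after a relevancy change and flags every query whose score dropped.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical expert judgments: query -> ids of documents judged relevant.
judgments: Dict[str, Set[str]] = {
    "heart attack": {"d1", "d2", "d3"},
    "flu shot": {"d7", "d8"},
}

# Hypothetical ranked results before and after a relevancy change.
baseline: Dict[str, List[str]] = {
    "heart attack": ["d1", "d4", "d2"],
    "flu shot": ["d7", "d8", "d9"],
}
candidate: Dict[str, List[str]] = {
    "heart attack": ["d1", "d2", "d3"],
    "flu shot": ["d7", "d9", "d5"],
}


def precision_at_k(results: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents.

    Divides by k, so returning fewer than k results is penalized.
    """
    hits = sum(1 for doc in results[:k] if doc in relevant)
    return hits / k


def score_queries(run: Dict[str, List[str]], k: int) -> Dict[str, float]:
    """Score every judged query; a query missing from the run scores 0."""
    return {q: precision_at_k(run.get(q, []), rel, k) for q, rel in judgments.items()}


def find_regressions(
    before_run: Dict[str, List[str]],
    after_run: Dict[str, List[str]],
    k: int = 3,
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {query: (before, after)} for queries whose score dropped."""
    before = score_queries(before_run, k)
    after = score_queries(after_run, k)
    return {
        q: (before[q], after[q])
        for q in judgments
        if after[q] < before[q] - tolerance
    }


if __name__ == "__main__":
    for query, (old, new) in find_regressions(baseline, candidate).items():
        print(f"{query}: {old:.2f} -> {new:.2f}")
```

Here the change fixes "heart attack" (precision@3 rises from 0.67 to 1.00) but breaks "flu shot" (0.67 to 0.33), which is exactly the kind of side effect that is easy to miss without a test. A real setup would likely use graded judgments and a metric such as nDCG, but the regression check works the same way.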
My second talk, Hacking Lucene For Custom Search Results, gets into more of the how, at a very technical level! Recently, while working for a client, I had to pull out all the stops and completely re-architect how Solr/Lucene scored documents by writing a custom Lucene Query and Scorer. It was a daunting but fun task, and it shows how much control (and responsibility) you can take over how search works in the Lucene ecosystem. Here's the synopsis:
Search is everywhere, and therefore so is Apache Lucene. While Lucene provides amazing out-of-the-box defaults, plenty of projects are unusual enough to require custom search scoring and ranking. In this talk, I'll walk through how to use Lucene to implement your own scoring and search ranking. We'll see how you can gain amazing power (and responsibility) over your search results, explore the flexibility of Lucene's data structures, and weigh the pros and cons of custom Lucene scoring against other methods of improving search relevancy.
Please contact me if you'd like to touch base at ApacheCon. We'd love to talk about your search relevancy problems. We're especially looking for freelancers and partners to help us with all this exciting work! Hope to see you there!