BioSolr — Stop Reinventing The Wheel in Life Science Search

I was privileged to be hosted in Cambridge, UK, by our buddies at Flax two weeks ago. During my time in the UK, I got a chance to see some of the interesting work they’re doing. I was also very grateful for the chance to speak at the London Lucene/Solr Meetup about search relevancy, Quepid, and my book Relevant Search.

One project that caught my eye in particular was BioSolr, which Flax is developing in conjunction with the European Bioinformatics Institute (EBI). We’ve done a great deal of search work in the life sciences, and we frequently find organizations solving the same sorts of problems over and over and over. For this reason, I was really excited to compare notes with Flax and EBI about the common themes and challenges of life science search. Flax and EBI hope to implement a set of features that support many common life science search use cases and that can eventually be integrated back into the Solr mainline. And here in the US at OSC, we’re looking for ways to support their efforts!

One topic of particular interest to me is seamlessly integrating external ranking signals with normal relevancy ranking (BioSolr’s XJOIN plugin). Organizations face this problem in many forms. In the life sciences, the external signal could come from anything from specialized software that ranks chemical structures and proteins to a service that computes real-time image similarity. The point is that it can be hard to capture every important content feature in a search index. So if you can’t bend a search engine to do the ranking you want, why not get specific signals from systems that do that work more precisely, and combine them with the search engine’s other ranking abilities?
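To make the idea concrete, here is a minimal client-side sketch of blending a search engine’s relevance score with an external signal using a simple weighted sum. It is not how XJOIN works (XJOIN performs the join inside Solr); the document IDs, the assumption that external scores are already normalized to [0, 1], and the `weight` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple


def blend_scores(
    search_hits: List[Tuple[str, float]],
    external_scores: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank search hits by mixing normalized text relevance with an external signal.

    Assumes (hypothetically) that external_scores are already in [0, 1];
    documents with no external score get 0.
    """
    if not search_hits:
        return []
    # Scale raw relevance scores into [0, 1] relative to the best hit,
    # falling back to 1.0 so an all-zero result set doesn't divide by zero.
    max_rel = max(score for _, score in search_hits) or 1.0
    blended = []
    for doc_id, rel in search_hits:
        ext = external_scores.get(doc_id, 0.0)
        score = (1 - weight) * (rel / max_rel) + weight * ext
        blended.append((doc_id, score))
    # Highest blended score first.
    blended.sort(key=lambda pair: pair[1], reverse=True)
    return blended


if __name__ == "__main__":
    # Hypothetical hits from a text query, plus scores from, e.g., a structure-similarity service.
    hits = [("P1", 8.0), ("P2", 6.0), ("P3", 2.0)]
    external = {"P3": 1.0, "P2": 0.2}
    # P3 is a weak text match but a strong external match, so it moves to the top.
    print(blend_scores(hits, external, weight=0.5))
```

A linear blend is the simplest choice to reason about and tune; `weight=0.0` reproduces the original text ranking, which makes it easy to compare the two orderings side by side.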

Another place where XJOIN could be interesting is integrating recommendations and search. Everyone (inside and outside of the life sciences) is trying to figure out how to combine user signals with content-based relevance. You don’t need a big proprietary platform to do this! You can have one system that handles recommendations (a system explicitly built to recommend content to users based on their behavior) and another system explicitly built for traditional search. The two can be used in tandem, with the search engine ultimately serving up results using carefully tuned ranking that balances features of the content with signals about what users like.
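One lightweight way to run the two systems in tandem, without any plugin, is to turn a recommender’s output into query-time boosts. The sketch below builds request parameters for Solr’s edismax parser, adding one `bq` (boost query) clause per recommended document so that those documents get extra score when they also match the user’s query. The `id` and `title abstract` field names, the recommendation strengths in [0, 1], and the `max_boost` scale are hypothetical, and document IDs are assumed not to contain double quotes.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_boosted_params(
    user_query: str,
    recommendations: Dict[str, float],
    max_boost: float = 5.0,
) -> List[Tuple[str, str]]:
    """Build edismax request params that boost recommender-suggested documents.

    recommendations maps a (hypothetical) document id to a strength in [0, 1].
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", "title abstract"),  # hypothetical searchable fields
    ]
    for doc_id, strength in sorted(recommendations.items()):
        # Clamp to [0, 1], then map linearly onto a boost between 1.0 and max_boost.
        strength = min(max(strength, 0.0), 1.0)
        boost = round(1.0 + strength * (max_boost - 1.0), 2)
        # One bq clause per recommended document; bq may be repeated.
        params.append(("bq", f'id:"{doc_id}"^{boost}'))
    return params


if __name__ == "__main__":
    recs = {"P3": 1.0, "P7": 0.5}  # hypothetical recommender output
    params = build_boosted_params("kinase inhibitor", recs)
    # Query string to append to a (hypothetical) collection's /select URL.
    print(urlencode(params))
```

Keeping the recommender’s influence in boost clauses leaves the text query in charge of what matches, while the user-behavior signal only reorders results; the boost scale is the knob you would tune.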

Another topic I hear about over and over in life science search is managing ontologies and taxonomies. The life sciences are full of taxonomies, such as NIH’s MeSH, that describe large catalogs of medical concepts. BioSolr is working to find useful ways of integrating Solr with taxonomies and ontologies. I had a long conversation with the EBI and Flax teams about some tricks for computing similarities between taxonomies. There’s a lot of fascinating work to do to find the best way of incorporating these aids into search. I’m glad Flax and EBI are working to give this shared problem a common solution.
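As one illustration of taxonomy-based similarity (not necessarily one of the tricks we discussed), the sketch below scores two concepts by how much of their hierarchy paths they share, in the spirit of Wu-Palmer similarity: twice the depth of the shared ancestor divided by the sum of the two depths. It assumes concepts are identified by MeSH-style dotted tree numbers, where a shared prefix stands in for the lowest common ancestor; the tree numbers used here are made up for illustration. Because a MeSH descriptor can sit at several places in the hierarchy, a concept is treated as a set of tree numbers and the best-matching pair wins.

```python
from itertools import product
from typing import Iterable


def common_prefix_depth(a: str, b: str) -> int:
    """Number of leading dotted segments two tree numbers share."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


def path_similarity(a: str, b: str) -> float:
    """Wu-Palmer-style score: 2 * shared depth / (depth(a) + depth(b)), in [0, 1]."""
    depth_a, depth_b = len(a.split(".")), len(b.split("."))
    return 2 * common_prefix_depth(a, b) / (depth_a + depth_b)


def concept_similarity(trees_a: Iterable[str], trees_b: Iterable[str]) -> float:
    """Best path similarity over all tree-number pairs; 0.0 if either side is empty."""
    return max(
        (path_similarity(x, y) for x, y in product(trees_a, trees_b)),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical tree numbers: siblings under the same parent score 2/3.
    print(path_similarity("C04.557.337", "C04.557.470"))
    # A concept filed in two branches matches on its closest placement.
    print(concept_similarity(["A01.100", "C04.557.337"], ["C04.557.470"]))
```

A score like this could, for example, drive query expansion to closely related terms, with a threshold deciding how far up the hierarchy to reach.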

I’m eager to keep up with the BioSolr project and pick out features I think will be useful for our clients. But you should check it out too! The project needs your support and momentum to keep accumulating life-science-inspired search features for eventual inclusion in the Solr mainline. Right now it’s just a collection of plugins, but with your help it can become the open source project where we work on these problems together instead of staying siloed in our own organizations!

If BioSolr interests you and you’d like a Solr consultation to see whether it applies to your use case, feel free to contact us in the US or Flax in Europe! There’s great work to leverage here, and all of us smart search folks CAN work together to stop reinventing the life sciences search wheel!