
Apache Sentry. So close, and yet nothing.

Security: it's always been the bugaboo of Solr. There is a widespread sense that security isn't a concern of the Solr community, and that isn't quite accurate. Securing Solr is fairly simple. It's just that there isn't any one “blessed” approach wrapped into the codebase, because each organization's needs are different.

Before I continue, I want to mention that when you say security, it actually encompasses three areas:

  • Securing documents and collections using roles.
  • Securing the server from a web perspective.
  • Securing the data at rest and in transit.

Apache Sentry, currently undergoing incubation, was announced as a tool for securing Solr. I attended the nicely done session by Gregory Chanan at LuceneRevolution last week. What convinced me to go was the program description:

Sentry augments Solr with support for Kerberos authentication as well as collection and document-level access control. This session will cover the ACL models and features of Sentry’s security mechanisms, implementation details on Sentry’s integration with Solr, and performance measurements in order to characterize the impact of integrating Sentry with Solr.

I was really excited about Sentry! Finally, something that combines the early binding and late binding approaches for documents into a single packaged solution. I hoped that it would give a nice layer around the Solr admin interface to lock it down, and provide some features around hardening Solr, like maybe running a check to see if the enableRemoteStreaming option was enabled and showing it as a vulnerability in a dashboard. And lastly, how about giving me some nice options to ensure that no jokers pass rows=10000000 to Solr! Or even rows=100&start=1000000.
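To make that last wish concrete, here is a minimal sketch of the kind of guard I had in mind, assuming some hypothetical proxy layer sits in front of Solr and can rewrite the query string before forwarding it. The limits MAX_ROWS and MAX_OFFSET and the function name clamp_paging are my own illustrative choices, not anything that ships with Sentry or Solr.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical limits; pick values that suit your own deployment.
MAX_ROWS = 100
MAX_OFFSET = 10_000


def clamp_paging(query_string: str) -> str:
    """Return the query string with rows and start forced into a sane range."""
    clamped = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        if key in ("rows", "start"):
            limit = MAX_ROWS if key == "rows" else MAX_OFFSET
            try:
                number = int(value)
            except ValueError:
                # Drop malformed values so Solr's own defaults apply.
                continue
            # Clamp into the range [0, limit].
            value = str(max(0, min(number, limit)))
        clamped.append((key, value))
    return urlencode(clamped)
```

With these assumed limits, a request for rows=10000000 would be forwarded as rows=100, and start=1000000 would be cut down to start=10000. Whether to clamp silently or reject the request outright is a design choice; clamping just keeps the sketch short.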

Instead, Sentry turns out to be just another stalking horse for selling Cloudera distributions. Ostensibly it provides Kerberos integration for Solr, but during the presentation it was highlighted that the hard work of enabling Kerberos (keytabs and the rest) remains your responsibility, unless of course you use CDH. The problems that it does solve, namely document and collection filtering, are the easiest of the three areas of security that I highlighted above.
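To show why I call document filtering the easy part: the early binding approach often boils down to appending a filter query built from the user's roles. Here is a minimal sketch, assuming each document carries a hypothetical multi-valued field acl_roles listing the roles allowed to see it. The field name and the helper role_filter are illustrative only; this is not how Sentry implements it.

```python
def role_filter(roles, field="acl_roles"):
    """Build a Solr fq clause matching documents visible to any of the roles.

    The field name is an assumption; use whatever field your schema defines.
    """
    if not roles:
        # No roles: match nothing rather than everything.
        return "-*:*"
    quoted = []
    for role in roles:
        # Escape backslashes and double quotes so a role name cannot break
        # out of the quoted term.
        escaped = role.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"
```

For a user with the roles sales and admin this produces acl_roles:("sales" OR "admin"), which you would send along as an fq parameter on every query for that user.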

I think part of my disillusionment was that when I heard Apache Sentry I thought, great, a best-of-breed, general-purpose security project! Great. But instead, it's very specific to Hadoop. Maybe if it had been called Apache Elephant Enclosure I would have realized that it was specific to Hadoop, and to integrating things into the Hadoop ecosystem.

If you are looking to lock down your Solr, here are some options: