Graph Phun in Solr

Scott Stults, November 11, 2015

Recently Solr has acquired graph search capability similar to what you’d find in TitanDB or Neo4j. Essentially it acts as a collector that traverses from link_from to link_to, while optionally applying a filter query each step of the way. You can also decide to return all connected hops or just the leaves.

SOLR-7543: Create GraphQuery that allows graph traversal as a query operator
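To make the traversal idea concrete, here’s a toy model in plain Python. It is not Solr’s implementation, just a sketch of the behavior described above: collect the values in the "to" field of the current documents, find documents whose "from" field matches, and repeat. The function name graph_traverse, the keep callback (standing in for the per-hop filter query), and the shortened ids are all made up for illustration, and the sketch assumes max_depth counts hops away from the root.

```python
def _values(doc, field):
    """Return a field's values as a list, whether stored as a scalar or a list."""
    value = doc.get(field, [])
    return value if isinstance(value, list) else [value]


def graph_traverse(docs, root_ids, from_field, to_field, max_depth=-1,
                   return_root=True, return_only_leaf=False,
                   keep=lambda doc: True):
    """Toy breadth-first traversal over a list of dict documents."""
    by_id = {doc["id"]: doc for doc in docs}
    # Map each "from" value to the ids of the documents that carry it.
    by_from = {}
    for doc in docs:
        for value in _values(doc, from_field):
            by_from.setdefault(value, []).append(doc["id"])

    visited = set(root_ids)
    frontier = list(root_ids)
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        next_frontier = []
        for doc_id in frontier:
            for edge in _values(by_id[doc_id], to_field):
                for target in by_from.get(edge, []):
                    # keep() plays the role of the optional per-hop filter.
                    if target not in visited and keep(by_id[target]):
                        visited.add(target)
                        next_frontier.append(target)
        frontier = next_frontier
        depth += 1

    result = visited if return_root else visited - set(root_ids)
    if return_only_leaf:
        # A leaf here is a document with no outgoing edges.
        result = {d for d in result if not _values(by_id[d], to_field)}
    return result


# Hypothetical, heavily truncated data for illustration only.
DOCS = [
    {"id": "dog", "hypernyms": ["canine", "domestic_animal"]},
    {"id": "canine", "hypernyms": ["carnivore"]},
    {"id": "domestic_animal", "hypernyms": ["animal"]},
    {"id": "carnivore", "hypernyms": ["placental"]},
    {"id": "animal", "hypernyms": []},
    {"id": "placental", "hypernyms": []},
]

two_hops = graph_traverse(DOCS, ["dog"], "id", "hypernyms",
                          max_depth=2, return_root=False)
leaves = graph_traverse(DOCS, ["dog"], "id", "hypernyms",
                        return_root=False, return_only_leaf=True)
```

With this toy data, two_hops holds every node within two hops of "dog", and leaves holds only the nodes that have no hypernyms of their own.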

Now, general purpose graph databases have a radically different storage model than Lucene’s Inverted Index, so performing complex or deep graph queries is not an option. The trade-off is that the traversal depth is restricted to 4. At that depth you’re still able to model something like Role-Based Access Control or query augmentation with hypernyms and hyponyms.

Hyponyms and Hypernyms

Role-based Access Control

Since this is so new you’ll need to pull down the latest version of Solr and build it. If you prefer git, you can also pull down the source from the GitHub mirror.

To get a feel for how this works I decided to use WordNet [1], indexing some words along with their hypernyms using pysolr and nltk. The indexing code itself isn’t that interesting so I won’t go into its details here, but here’s the Gist.
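For readers who can’t reach the Gist, here is a rough sketch of what such an indexer might look like. It is not the original code. The field names id, hypernyms and hyponyms and the gettingstarted collection URL come from the queries later in this post; the function names, the batch size, and the choice to index only noun synsets are assumptions. The third-party imports live inside index_wordnet so the document-building helper can be tried without pysolr or nltk installed.

```python
def synset_to_doc(synset):
    """Flatten one WordNet synset into a Solr document."""
    return {
        "id": synset.name(),  # e.g. "dog.n.01"
        "hypernyms": [h.name() for h in synset.hypernyms()],
        "hyponyms": [h.name() for h in synset.hyponyms()],
    }


def index_wordnet(solr_url="http://localhost:8983/solr/gettingstarted",
                  batch_size=1000):
    """Send every noun synset to Solr in batches (assumed indexing strategy)."""
    # Imported lazily; both packages must be installed, and the WordNet
    # corpus must already be fetched with nltk.download("wordnet").
    import pysolr
    from nltk.corpus import wordnet as wn

    solr = pysolr.Solr(solr_url)
    batch = []
    for synset in wn.all_synsets("n"):
        batch.append(synset_to_doc(synset))
        if len(batch) >= batch_size:
            solr.add(batch)
            batch = []
    if batch:
        solr.add(batch)
    solr.commit()
```

Calling index_wordnet() against a running Solr instance would populate the collection; synset_to_doc on its own shows the shape of each document.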

The other thing I wanted to experiment with is Solr’s new schemaless capability. So once I built Solr I ran it with bin/solr -e schemaless to get a blank collection.

After the indexing code finished I opened up the Admin Console and checked out the schema. Sure enough, there were the fields, all indexed as string types. I also noticed that each was copied to the general _text_ field.

Okay, now let’s do some graph stuph! (last joke like that, I swear).

What are the hypernyms of “dog”?

Well, that’s easy because we indexed them directly as fields:

http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=id:dog.n.01&fl=hypernyms

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "hypernyms":["canine.n.02",
          "domestic_animal.n.01"]}]
  }}

By the way, the reason it’s “dog.n.01” and not just “dog” is that WordNet is very specific about terms, and this happens to be the first word sense of “dog” the noun (as opposed to a pejorative term for how someone looks, or the verb for following someone closely).

What are the hypernyms of the hypernyms of “dog”?

So what we want to do here is start at the id field, look at the matching hypernyms field, then for each value there try to find a doc with that id:

http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q={!graph%20from=%22id%22%20to=%22hypernyms%22%20returnRoot=%22false%22%20returnOnlyLeaf=%22false%22%20maxDepth=3}id:dog.n.01&fl=id&echoParams=none

(From now on I’ll omit the other parameters and just focus on the graph query.)
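Hand-encoding those %20 and %22 sequences gets old fast, so here’s a small sketch that builds the same graph query string and lets the standard library do the escaping. The helper names graph_query and select_url are made up for this example; the parameter names inside the {!graph ...} block are the ones used in the URL above.

```python
from urllib.parse import quote, urlencode


def graph_query(root_query, from_field, to_field, max_depth=-1,
                return_root=True, return_only_leaf=False):
    """Build the {!graph ...} local-params prefix followed by the root query."""
    local_params = (
        f'{{!graph from="{from_field}" to="{to_field}" '
        f'returnRoot="{str(return_root).lower()}" '
        f'returnOnlyLeaf="{str(return_only_leaf).lower()}" '
        f'maxDepth={max_depth}}}'
    )
    return local_params + root_query


def select_url(base_url, q, **params):
    """Percent-encode q and any extra parameters into a /select URL."""
    return base_url + "/select?" + urlencode({"q": q, **params}, quote_via=quote)


q = graph_query("id:dog.n.01", "id", "hypernyms",
                max_depth=3, return_root=False)
url = select_url("http://localhost:8983/solr/gettingstarted", q,
                 fl="id", wt="json")
```

The resulting URL encodes a few more characters than the hand-written one (braces, colons and equals signs included), which decodes to the same query on the server side.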

{
  "responseHeader":{
    "status":0,
    "QTime":4},
  "response":{"numFound":4,"start":0,"docs":[
      {
        "id":"animal.n.01"},
      {
        "id":"domestic_animal.n.01"},
      {
        "id":"carnivore.n.01"},
      {
        "id":"canine.n.02"}]
  }}

Notice we’re returning intermediate nodes of “canine.n.02” and “domestic_animal.n.01”. In the last query these values were in the hypernyms field, now they’re in the id.

What are all the hyponyms of “canine”?

{!graph%20from=%22id%22%20to=%22hyponyms%22%20returnRoot=%22false%22%20returnOnlyLeaf=%22false%22%20maxDepth=2}id:canine.n.02

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:canine.n.02",
      "indent":"true",
      "fl":"hyponyms",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "hyponyms":["bitch.n.04",
          "dog.n.01",
          "fox.n.01",
          "hyena.n.01",
          "jackal.n.01",
          "wild_dog.n.01",
          "wolf.n.01"]}]
  }}

How far up the chain of hypernyms of “canine” can I go?

{!graph%20from=%22id%22%20to=%22hypernyms%22%20returnRoot=%22false%22%20returnOnlyLeaf=%22false%22%20maxDepth=-1}id:canine.n.02

{
  "responseHeader":{
    "status":0,
    "QTime":6},
  "response":{"numFound":12,"start":0,"docs":[
      {
        "id":"chordate.n.01"},
      {
        "id":"vertebrate.n.01"},
      {
        "id":"mammal.n.01"},
      {
        "id":"placental.n.01"},
      {
        "id":"entity.n.01"},
      {
        "id":"physical_entity.n.01"},
      {
        "id":"object.n.01"},
      {
        "id":"whole.n.02"},
      {
        "id":"living_thing.n.01"},
      {
        "id":"organism.n.01"}]
  }}

Notice the response stops at 10 docs instead of returning all 12 in numFound. That’s just Solr’s default page size (rows=10), not a limit on the traversal.
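Since Solr returns 10 rows per request by default, one way to collect every id is to page through the results with the start and rows parameters until numFound is reached. Here’s a minimal sketch of that loop using only the standard library. The function names and the injectable fetch argument are assumptions for illustration; the collection URL is the one used throughout this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, q, start, rows):
    """Fetch one page of results and return the "response" part of the JSON."""
    params = urlencode({"q": q, "fl": "id", "wt": "json",
                        "start": start, "rows": rows})
    with urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)["response"]


def fetch_all_ids(q, base_url="http://localhost:8983/solr/gettingstarted",
                  rows=100, fetch=fetch_page):
    """Page through a query until every matching id has been collected."""
    ids, start = [], 0
    while True:
        page = fetch(base_url, q, start, rows)
        ids.extend(doc["id"] for doc in page["docs"])
        start += rows
        # Stop once we have walked past numFound or a page comes back empty.
        if start >= page["numFound"] or not page["docs"]:
            return ids
```

Passing a different fetch callable makes it easy to try the paging logic without a live Solr instance.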

What are all the “domestic_animal”s in the index?

In this query I added rows=1000 just to see what would happen. I’ll omit the bulk of the response…

{!graph%20from=%22id%22%20to=%22hyponyms%22%20returnRoot=%22false%22%20returnOnlyLeaf=%22false%22%20maxDepth=-1}id:domestic_animal.n.01

{
  "responseHeader":{
    "status":0,
    "QTime":14},
  "response":{"numFound":213,"start":0,"docs":[
      {
        "id":"feeder.n.01"},
      {
        "id":"stocker.n.01"},
      {
        "id":"head.n.02"},
      {
        "id":"puppy.n.01"},
      {
        "id":"dog.n.01"},
      {
        "id":"pooch.n.01"},
      {
        "id":"cur.n.01"},
      {
        "id":"feist.n.01"},
      {
        "id":"pariah_dog.n.01"}, 
    ...
      {
        "id":"burmese_cat.n.01"},
      {
        "id":"egyptian_cat.n.01"},
      {
        "id":"maltese.n.03"},
      {
        "id":"abyssinian.n.01"},
      {
        "id":"manx.n.02"}]

And that’s just the beginning

The hierarchical nature of these queries means we can change one of the intermediate nodes and have radically different results. Take, for instance, the RBAC use-case I mentioned at the beginning. By simply adding an operation to a permission, immediately all of the roles that have that permission and all of the subjects that have those roles gain access to that operation. Likewise, catalogs can be reorganized by shifting one subcategory to another without reindexing all of the products within them.
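The RBAC point can be illustrated with a few in-memory documents. This is a toy model rather than anything Solr-specific: the ids, the grants field, and the reachable helper are all hypothetical. The thing to notice is that only the permission document changes, yet the subject’s reachable operations change with it.

```python
def reachable(docs, start_id, edge_field="grants"):
    """Return every id reachable from start_id by following edge_field."""
    by_id = {doc["id"]: doc for doc in docs}
    seen, stack = set(), [start_id]
    while stack:
        current = stack.pop()
        # Ids without a document of their own (the operations) have no edges.
        for target in by_id.get(current, {}).get(edge_field, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


# Hypothetical subject -> role -> permission -> operation chain.
docs = [
    {"id": "alice", "grants": ["role:editor"]},
    {"id": "role:editor", "grants": ["perm:manage_posts"]},
    {"id": "perm:manage_posts", "grants": ["op:create", "op:update"]},
]

before = reachable(docs, "alice")

# Update only the permission document; alice and her role are untouched.
docs[2]["grants"].append("op:delete")

after = reachable(docs, "alice")
```

Here after contains "op:delete" while before does not, even though neither the subject nor the role document was modified.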

Solr 6.0 is going to be a fun release!

[1]: Princeton University “About WordNet.” WordNet. Princeton University. 2010. http://wordnet.princeton.edu



