For the fourth (and best!) time, I will be giving a talk on some experiences in making Solr part of a larger data ecosystem using Apache Spark and Apache Zeppelin, at the Washington DC, Maryland, and Virginia Hortonworks User Group Meetup. Here’s the talk synopsis:
Apache Solr powers search and navigation for many of the world’s largest websites. Solr is widely admired for its rock-solid full-text search and its ability to scale up to massive workloads. But Solr has moved beyond its roots as just a full-text search engine. Today, people use Solr for aggregating data, powering dashboards, geo-location, even building knowledge graphs! In fact, Solr is so powerful, it’s the standard engine for big data search on major data analytics platforms including Hadoop and Cassandra. Critical data is being accessed through Solr’s rich query interface and, now, big data engineers are including Solr as one more data store in the analytics processing chain. But, as we expand the data pipeline to include diverse data stores, we need consistent ways of working across different data access patterns and representations.
Enter Apache Spark. Apache Spark has seen a meteoric rise as the tool for big data processing. Spark makes distributed computing as simple as running a SQL query. Well, almost! Spark’s core abstraction, the Resilient Distributed Dataset (RDD), is capable of representing pretty much any data store, including Solr. So, let’s see how we can integrate Apache Solr into our data processing pipeline using Apache Spark.
Finally, we’ll tie it all together with a new Apache project that marries the best of iPython Notebook (the favorite tool of data scientists) and the best of distributed computing (Apache Spark and SparkSQL). Apache Zeppelin is the interactive computational environment for data analytics. Just like iPython Notebook, Zeppelin supports collaboration, data exploration and discovery, and rich graphs and visualizations. But its deep integration with Spark means Apache Zeppelin is the “interactive analytics notebook” for Big Data.
Look forward to seeing you there! Be sure to RSVP.