
OSC has 9 speaking engagements this Fall (and counting)

Just in case you’re interested, I’d like to add 6ish more speaking engagements to the list of talks we’re taking to Dublin this November.

Database History from Codd to Brewer and Beyond

Doug Turnbull, NoSQL Matters Barcelona

There are innumerable technical lessons to learn from database history. It’s easy to go with what’s new and trendy; it’s harder to appreciate the technical reasons why one approach suddenly became favored over another. History highlights both the limitations and the power behind database solutions, and if we don’t learn from it, we are doomed to repeat it:

  • What were the first databases (CODASYL, etc.) like? Why did they start out this way?
  • Why was the RDBMS the right technical response to the non-RDBMS databases of the day?
  • Why is the move away from the RDBMS to NoSQL the right technical solution for many of today’s problems?

A great introduction to the basic technical scaffolding and historical context of NoSQL, this talk will give you a deeper appreciation of the transition from vertically scaling Big Metal to horizontally scaling Big Data.

One Million Books: Adventures in Discoverability with Cassandra and Solr

Patricia Gorla, Cassandra Summit Europe

For any venture, storing your data is just the first step in making sense of it. How do you make your system discoverable? How do you tune your relevancy to accommodate real-time updates? In this session, we explore pairing Cassandra with Solr using DataStax Enterprise Search, and look at different search paradigms that help your users find patterns in your data.

An Introduction to Real-Time Analytics with Cassandra and Hadoop

Patricia Gorla, Strata+Hadoop World NYC

Cassandra is a distributed storage system for managing lots of structured data over many commodity servers, while providing a highly-available service with no single point of failure.

Put another way, Cassandra is one answer to the problem of scaling structured data out to the terabyte scale, a problem that traditional relational databases struggle with.

Cassandra’s append-only storage structure makes it a strong alternative to HDFS for performing large-scale MapReduce analytics on real-time data.

In this session, we will cover:

  • Introduction to Cassandra
  • Setting up a Cassandra Cluster with MapReduce
  • Pros and Cons of using Cassandra
  • Failure Mitigation: How to recover lost nodes
  • Performance Tuning

At the end of the tutorial, participants will have set up a multi-node Cassandra/Hadoop cluster, indexed data into the cluster at high volumes, and run analyses against the cluster.

Getting Started with Lucene and Solr

John Berryman, All Things Open

From intra-website search to full-on e-commerce applications, full-text search is ubiquitous on the web. And in the domain of search, Solr is arguably the most widely deployed search engine in the world. In this fast-paced session I will define the problem of full-text search and then introduce Lucene, the general-purpose software library upon which Solr is built. I will then demonstrate how Lucene can be used to build a basic search engine. After this, I will introduce the Solr search engine. Solr can be thought of as a best-practices implementation of a Lucene search index wrapped inside a web server. I will present a wide variety of capabilities that are available with Solr right out of the box, and I will demonstrate how easily Solr can be configured to meet various search needs. Attendees will leave with an understanding of the basic principles required to develop and deploy a full-text search engine that meets their own specific needs.
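At the heart of Lucene is an inverted index: a map from each token to the documents that contain it. The sketch below is a toy Python illustration of that idea, not Lucene’s API or implementation. It assumes a naive lowercase-and-split tokenizer, an in-memory dictionary, and boolean AND matching; the names `tokenize`, `build_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive analyzer (an assumption): lowercase, then split on non-alphanumeric runs.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    # Inverted index: map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Return ids of documents that contain every query token (boolean AND).
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is a search library",
    3: "Cassandra stores data",
}
index = build_index(docs)
print(sorted(search(index, "lucene search")))  # [2]
```

A real engine goes much further: Lucene also ranks matches by relevance, runs configurable analysis chains, and stores its index on disk in segments. The point here is only that a query becomes a cheap lookup-and-intersect instead of a scan over every document.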

Understanding How CQL3 Maps to Cassandra’s Internal Data Structure

John Berryman, Planet Cassandra

CQL3 is the newly ordained, canonical, best-practices means of interacting with Cassandra. Indeed, the Apache Cassandra documentation itself declares the Thrift API “legacy” and recommends that CQL3 be used instead. But I’ve heard several people express concern over the added layer of abstraction; there seems to be some uncertainty about what’s really happening inside Cassandra.

In this presentation we will look under the hood at exactly how Cassandra treats CQL3 queries. Our first stop will be the Cassandra data structure itself. We will briefly review the concepts of keyspaces, column families, rows, and columns, and explain where this data structure excels and where it does not. Composite row keys and column names are heavily used with CQL3, so we’ll cover their functionality as well.

We will then turn to CQL3. I will demonstrate the basic CQL syntax and show how it maps to the underlying data structure. We will see that CQL actually serves as a sort of best-practices interface to the internal Cassandra data structure. We will take this point further by demonstrating CQL3 collections (set, list, and map) and showing how they are really just a creative use of this same internal data structure.
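For a rough feel of the mapping, here is a toy Python model of how CQL3 rows with a compound primary key were laid out by the Cassandra storage engine of that era. The partition key becomes the internal row key, and each non-key CQL column becomes an internal column whose composite name pairs the clustering value with the CQL column name. The `tweets` table, its data, and the `to_storage_rows` helper are hypothetical. The model ignores timestamps, tombstones, and on-disk format, so treat it as a simplified sketch.

```python
from collections import defaultdict

# Hypothetical CQL3 table, roughly:
#   CREATE TABLE tweets (user text, tweet_id int, body text, retweets int,
#                        PRIMARY KEY (user, tweet_id));
cql_rows = [
    {"user": "alice", "tweet_id": 1, "body": "hello", "retweets": 0},
    {"user": "alice", "tweet_id": 2, "body": "cql3!", "retweets": 5},
    {"user": "bob", "tweet_id": 1, "body": "hi", "retweets": 1},
]

def to_storage_rows(rows, partition_key, clustering_key):
    # Model the storage layer as: row key -> {column name -> value}.
    storage = defaultdict(dict)
    for row in rows:
        row_key = row[partition_key]
        for name, value in row.items():
            if name in (partition_key, clustering_key):
                continue
            # Composite column name: (clustering value, CQL column name).
            storage[row_key][(row[clustering_key], name)] = value
    # Columns within an internal row are kept sorted by column name.
    return {k: dict(sorted(v.items())) for k, v in storage.items()}

storage = to_storage_rows(cql_rows, "user", "tweet_id")
for row_key, columns in storage.items():
    print(row_key, columns)
```

Under this model, all of alice’s tweets live in one wide internal row, sorted by `tweet_id`. That sorting is why a CQL3 query over a range of clustering values within a single partition maps to a cheap contiguous slice.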

Semantic Search and Recommendation with Solr Search Engine

John Berryman, Semantic Technology & Business Conference

Built upon Lucene, Solr provides fast, highly scalable, and easily maintainable full-text search capabilities. However, under the hood, Solr is really just a sophisticated token-matching engine. What’s missing? First, Solr lacks semantic search. If you’re looking for documents about “software architecture”, then it might be appropriate to retrieve documents about “programming design patterns” even if they don’t explicitly contain the terms “software” or “architecture”. Semantic search allows users to find documents by meaning rather than just by simple token matching. Second, Solr lacks the ability to make rich recommendations. If most customers who purchase cameras also tend to purchase tripods, then it is a good idea to recommend tripods when a new customer is purchasing a camera! So-called “recommenders” provide users with recommendations based upon the previous behavior of other similar users. In this fast-paced discussion I will describe the nature of token-based search and outline the need for semantic search and recommendation. Then, I will provide an audience-friendly, mathematical demonstration of how token-based search can be augmented so that both semantic search and recommendation are possible. Finally, I will demonstrate our ongoing work using Mahout to equip Solr with both of these useful capabilities.
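The camera-and-tripod example is a case of item co-occurrence: items bought by the same customers get recommended together. Below is a minimal Python sketch of a co-occurrence recommender, offered only as an illustration. It is a simplified stand-in, not the Mahout-based approach the talk covers. The purchase histories are made up, and `cooccurrence_counts` and `recommend` are hypothetical helpers.

```python
from collections import Counter
from itertools import permutations

# Made-up purchase histories: one set of items per customer.
baskets = [
    {"camera", "tripod", "memory card"},
    {"camera", "tripod"},
    {"camera", "lens"},
    {"laptop", "mouse"},
]

def cooccurrence_counts(baskets):
    # Count how often each ordered pair of distinct items appears in the same basket.
    counts = Counter()
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts

def recommend(item, counts, k=2):
    # Rank other items by co-occurrence with `item`; ties are broken alphabetically.
    scores = {b: n for (a, b), n in counts.items() if a == item}
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]

counts = cooccurrence_counts(baskets)
print(recommend("camera", counts))  # ['tripod', 'lens']
```

Raw counts favor globally popular items. Production recommenders therefore usually normalize or test these counts statistically before ranking, but the underlying signal is the same one described above.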

… and then of course there’s the Solr training

Scott Stults and Matt Overstreet