Visualizing Solr file ingestion with a punchcard graph in R

Jody White — March 26, 2015

Recently we've had to analyze the size of files being ingested into a Solr index. Performance testing had been done several times. In some runs we saw great response times with zero errors; in others we saw very high response times with hundreds of 504 server errors.

Going Cross-Origin with Solr

Christopher Bradford — March 26, 2015

It is becoming more common to connect directly to a Solr cluster from rich client-side applications. Performing a search directly against the cluster requires either JSONP or Cross-Origin Resource Sharing (CORS). Here we discuss a few methods for connecting to a search resource using CORS.

Elasticsearch Cross Field Search Is A Lie

Doug Turnbull — March 19, 2015

In Elasticsearch, searching across multiple fields can be confusing to beginners. This is a tough first step in creating a relevant search solution, so it's important to get this right. In particular, it can be hard to wrap your head around multi_match's cross_fields search type and where exactly it fits into a querying strategy.
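
For orientation, a multi_match query selects this behaviour through its type parameter, and the value Elasticsearch accepts is cross_fields. The sketch below only builds the request body; the helper name cross_fields_query and the field names first_name and last_name are assumptions made up for illustration, not taken from the post.

```python
import json


def cross_fields_query(text, fields, operator="and"):
    """Build a search body whose multi_match query uses the cross_fields type."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",   # treat the listed fields as one combined field
                "fields": list(fields),
                "operator": operator,      # "and": every term must match in some field
            }
        }
    }


# Hypothetical field names, used only to show the shape of the body.
body = cross_fields_query("will smith", ["first_name", "last_name"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a search request with whatever HTTP client is already in use.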


Mar 09-11

Daniel will be at Elastic{ON}15. He builds client-side search applications for Elasticsearch.

Strata Conf 2015

Feb 17-20

Doug will be talking about "Database History from Codd to Brewer and Beyond"

Do you really need a "NoSQL" Database?

Doug Turnbull — February 2, 2015

I’m fortunate enough to have been selected to speak at Strata 2015 in San Jose on one of my favorite topics, database history. My talk is about the tradeoffs between going with a highly denormalized NoSQL database vs a normalized relational database.

A First Look at VisualOps

Scott Stults — January 22, 2015

VisualOps looks to be a great time-saver for managing AWS architecture, and it scratches an itch I've been having for quite a while.

Ad-hoc Solr Monitoring

Jody White — January 16, 2015

Hacking together Solr monitoring using Easy Auto Refresh (Chrome Plugin) and the command line

Using SolrJ CloudSolrServer and retrieving JSON

Eric Pugh — January 8, 2015

SolrCloud gives you HA capabilities for your Solr setup, but currently only the SolrJ client supports SolrCloud natively, and it returns Java objects. Here is how to return JSON formatted results instead.

Quepid: Write Tests Against Your Search Results

Doug Turnbull — December 9, 2014

Quepid is our “Test Driven Search Relevancy” workbench product actively used by several clients. What do we mean by test-driven relevancy? We want to give you the ability to iterate quickly when creating a search solution. Sometimes the correctness of search results is fuzzy — based on how users or domain experts grade search results. Quepid has supported this since day one.

Apache Sentry. So close, and yet nothing.

Eric Pugh — December 2, 2014

Security, it’s always been the bugaboo of Solr. There is a wide sense that security isn’t a concern of the Solr community, and that isn’t quite accurate. How to secure Solr is pretty simple. It’s just that there isn’t any one “blessed” approach that is wrapped into the codebase, as each organization’s needs are different.

Stepwise Date Boosting in Solr

Doug Turnbull — November 26, 2014

When you want to boost on recency of content (i.e., rank more recently published documents ahead of older ones), the Solr function query documentation gives you a basic date boost.
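
The basic boost in that documentation is a reciprocal of document age in milliseconds, recip(ms(NOW,mydatefield),3.16e-11,1,1). A minimal sketch of wrapping a user query in it follows; the helper name date_boost_params and the date field name published are assumptions chosen for illustration, and the stepwise variant the title refers to is not shown here.

```python
from urllib.parse import urlencode


def date_boost_params(user_query, date_field="mydatefield"):
    """Wrap a query in Solr's boost parser using the reciprocal date boost."""
    # recip(x, m, a, b) computes a / (m * x + b). With m = 3.16e-11, roughly
    # 1 / (milliseconds in a year), a one-year-old document gets about half
    # the boost of a brand-new one.
    boost = f"recip(ms(NOW,{date_field}),3.16e-11,1,1)"
    return {"q": f"{{!boost b={boost}}}{user_query}", "wt": "json"}


# "published" is a hypothetical field name; substitute the schema's date field.
params = date_boost_params("solr", date_field="published")
print(urlencode(params))
```

The encoded string can be appended to a /select request URL.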

Two Search Conferences in Two Weeks Was Too Informative

Eric Pugh — November 25, 2014

This year I experienced the conference equivalent of a lunar eclipse: two search conferences in two weeks located two hours away from my home town of Charlottesville, Virginia! Enterprise Search Summit (ESS) and LuceneRevolution (LR) share many similarities. Both have changed their names in the last year, Enterprise Search Summit expanding its focus to become *Enterprise Search & Discovery Summit*, and LuceneRevolution billing itself as the *Solr/Lucene Revolution*! Ironically, both still use their original domain names. Both increasingly overlap in their focus on open source search, with Solr and Elasticsearch being frequent topics of conversation at ESS.

Playing with Thoth

Eric Pugh — November 25, 2014

At LuceneRevolution last week, one of the sessions that got me really excited was about Thoth, presented by Damiano Braga and Praneet Mhatre. It was very nicely done, especially considering a 30 minute timeslot! Thoth is a new Solr monitoring solution open sourced by Trulia.

All Things Open

October 22-23

Doug will be talking about "How I learned to stop worrying and love the SQL -- converting Quepid from Redis to MySQL".

Let’s Stop Saying “NoSQL”

Doug Turnbull — September 27, 2014

I say the word NoSQL a lot. When I say NoSQL, I tend to talk about denormalized and hierarchical document/row-based data stores like Cassandra, Mongo, Couch, or HBase. But it's a terrible way to use that term, because there are also graph databases that feel even more normalized than traditional relational databases.

Solving data “variety” with Postgres’s NoSQL Extensions

Doug Turnbull — September 26, 2014

Raise your hand if you’ve heard the three "Vs" of Big Data? Velocity — your queries/updates are exceptionally fast or large. You’re processing the entire Twitter feed. Volume — you store a massive amount of data at rest. You’ve crawled the web and are storing the entire web in a database. Variety — the structure of records varies dramatically.

The Semantic Web up and coming – impressions of SEMANTiCS 2014

René Kriegler — September 19, 2014

When you hear someone say about a technology that ‘it only works in theory’, ‘it is too labour-intensive’ and ‘it is not industry-ready’, chances are that they are talking about semantic web technologies.

DC Solr/Lucene Meetup

September 18

Doug will be talking about 'Hacking Lucene for Custom Search Results'.

Recap of Cassandra Summit 2014

Christopher Bradford — September 17, 2014

OpenSource Connections was well represented in San Francisco at this year's Cassandra Summit. We had Chris Bradford, Eric Pugh, and Matt Overstreet in attendance for the training, sessions, and networking events.

New York Solr/Lucene Meetup

September 10

Doug will be talking about 'Test Driven Relevancy-How to Work w/Content Experts to Optimize Search Relevancy'.

Cassandra Summit

September 9-12

Matt, Eric, and Chris will all be at Cassandra Summit, sharing war stories from our C* projects for Federal Government and commercial clients.

September Chock Full of Talks (Dougtember?)

Doug Turnbull — August 28, 2014

I somehow managed to line up a speaking gig for every week in September! I hope you’ll join me on this insane marathon. I’ll be talking about topics key to what we care about at OSC: search as a data structure, search relevancy, and search/big data at performance and scale.

Introducing Splainer — The Open Source Search Sandbox That Tells You Why

Doug Turnbull — August 18, 2014

One piece of feedback that has consistently come with our Quepid search testing tool is the need to understand “why” search results come back in the order they do. In plain English, what factors influence search the most? Why does my search engine think a document about “water bottles” is more relevant than “baby bottles” for a search about “milk bottles”?

Improving The Camel Solr Component

Doug Turnbull — July 15, 2014

We’ve decided to make dramatic improvements to the Apache Camel Solr component! You can find our improvements here ready for production use (specifically this pull request)! What’s been done out of our wish list above?

Reindexing Collections with Solr’s Cursor Support

Doug Turnbull — July 13, 2014

When a Solr schema changes, we Solr devs know what’s next — a large reindex of all of our data to capture any changes to index-time analysis. When we deliver solutions to our customers, we frequently need to build this in as a feature. In many cases, we can’t easily access the source system to reindex. Perhaps the original data is not easily available, having taken a circuitous route through the Sahara to get to Solr. Perhaps the sys admins don’t want us to run a nasty SQL query with 15 joins to pull in all the data.
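
Solr's cursor support lets a client walk an entire result set by passing cursorMark=* on the first request and the returned nextCursorMark on each later one, with a sort that includes the uniqueKey field. The loop below is a rough sketch of that protocol only. It assumes the uniqueKey is named id, and both iter_all_docs and the in-memory make_fake_solr stand-in are hypothetical names invented so the example runs without a server.

```python
def iter_all_docs(fetch_page, rows=100, sort="id asc"):
    """Yield every document by following Solr's cursorMark protocol.

    fetch_page is a caller-supplied function (hypothetical) that sends the
    given params to a /select handler and returns the parsed JSON response.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": "*:*", "rows": rows, "sort": sort, "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means nothing is left
            break
        cursor = next_cursor


def make_fake_solr(docs):
    """In-memory stand-in for Solr so the loop can be exercised offline."""
    def fetch(params):
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        page = docs[start:start + params["rows"]]
        # Like Solr, return the same mark once the results are exhausted.
        next_mark = str(start + len(page)) if page else mark
        return {"response": {"docs": page}, "nextCursorMark": next_mark}
    return fetch


source_docs = [{"id": str(i)} for i in range(5)]
copied = list(iter_all_docs(make_fake_solr(source_docs), rows=2))
print(len(copied))
```

In a real reindex, fetch_page would issue the HTTP request, and each yielded document would be posted to the target collection, provided the fields needed are stored.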

Quepid: Athena Release

Jonathan Thompson — July 11, 2014

As the newest full-time developer working on OpenSource Connections' search relevancy tool, Quepid, I'm happy to announce that our newest release, codenamed 'Athena', is now live. This release is the first in a series, named after figures in Greek mythology, that aims to add powerful new features to our tool.