Good news! We're proud to announce that our test-driven search toolbench, Quepid, now supports Elasticsearch. Quepid brings test-driven principles to tuning search results, an approach we call Test-Driven Relevancy. It helps you define what good search results look like by incorporating the business expertise of colleagues who know your users best.
Notes from pulling and pushing data in Solr using Spark and DataStax Enterprise
Returning for another year, John, Matt, Eric, and Chris will all be at Cassandra Summit, sharing war stories from our C* projects for federal government and commercial clients.
I love stringing together custom analyzers to solve my search problems. Analyzers control how search and document text are transformed, step by step, into individual terms for matching. This in turn gives you tremendous low-level control over your relevance. Yet one thing has always bugged me about Elasticsearch: you can't easily inspect the step-by-step behavior of an analyzer. The _analyze API helps a great deal in seeing the final output of the lengthy analysis process, but you can't pry into each step to see what's happening.
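To picture the per-step view this post is after, here is a toy pipeline in plain Python. It is not Elasticsearch's API: the whitespace tokenizer, the punctuation, lowercase, and stopword filters, and the `trace_analyzer` helper are all hypothetical stand-ins. The point is only that recording the token stream after every stage shows exactly where a token gets changed or dropped.

```python
import re
from typing import Callable, List, Tuple

# A token filter takes the current token stream and returns a new one.
TokenFilter = Callable[[List[str]], List[str]]

STOPWORDS = {"the", "a", "an", "of"}  # hypothetical stopword list


def whitespace_tokenizer(text: str) -> List[str]:
    # Split on runs of whitespace.
    return text.split()


def strip_punctuation(tokens: List[str]) -> List[str]:
    # Trim non-word characters from both ends of each token; drop empty tokens.
    stripped = [re.sub(r"^\W+|\W+$", "", t) for t in tokens]
    return [t for t in stripped if t]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop_filter(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]


def trace_analyzer(
    text: str,
    tokenizer: Callable[[str], List[str]],
    filters: List[Tuple[str, TokenFilter]],
) -> List[Tuple[str, List[str]]]:
    """Run a tokenizer plus filters, recording the token stream after each step."""
    tokens = tokenizer(text)
    trace = [("tokenizer", tokens)]
    for name, token_filter in filters:
        tokens = token_filter(tokens)
        trace.append((name, tokens))
    return trace


if __name__ == "__main__":
    steps = [
        ("strip_punctuation", strip_punctuation),
        ("lowercase", lowercase),
        ("stop", stop_filter),
    ]
    for name, tokens in trace_analyzer("The Art of Search!", whitespace_tokenizer, steps):
        print(f"{name:>17}: {tokens}")
```

Running it on "The Art of Search!" prints one line per stage, ending with `['art', 'search']` after the stop filter, so you can see that "of" and "the" disappear only at the last step.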
In Chapter 4 of Relevant Search, we talk a LOT about Elasticsearch analyzers. Without analyzers, your search engine would be a rather unintelligent string-comparison system instead of a smart, powerful search engine. Analyzers are the text-processing pipeline that feeds the search engine's core data structures, controlling whether two tokens (basically words) match during a search.
When I first got involved in search work, I noticed a fairly shocking shortcoming: improving the quality of search results is an abysmal experience. Even though search drives the user experience of many apps, tuning it feels miserable.
Joe is presenting a talk about logging with Logstash, Kafka, and Elasticsearch.
Starting in summer 2015, Quepid users can create their own scorers to accommodate any rating scale and write custom scoring algorithms that work best for their situation.
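To illustrate what a custom scorer computes, here is a generic sketch in Python. It is not Quepid's scorer API; the function name, the 0-100 output range, and treating unrated documents as 0 are assumptions made for illustration. It averages the judged ratings of the top k results and normalizes by the maximum of whatever rating scale you use, so the same function works for a 0-1, 0-4, or 0-10 scale.

```python
from typing import Dict, List, Optional


def average_rating_at_k(
    results: List[str],
    ratings: Dict[str, float],
    k: int = 10,
    max_rating: float = 10.0,
) -> Optional[float]:
    """Score the top-k results on a 0-100 scale (hypothetical scorer).

    results:    document ids in ranked order.
    ratings:    document id -> judged rating on a 0..max_rating scale.
    Unrated documents count as 0. Returns None when there are no results.
    """
    top = results[:k]
    if not top:
        return None
    total = sum(ratings.get(doc_id, 0.0) for doc_id in top)
    return 100.0 * total / (len(top) * max_rating)


if __name__ == "__main__":
    # Made-up judgments on a 0-4 scale.
    print(average_rating_at_k(["d1", "d2", "d3", "d4"], {"d1": 4, "d2": 2, "d4": 4}, k=4, max_rating=4))
```

Normalizing by `max_rating` is the design choice that lets one scorer handle different rating scales; a binary relevant/irrelevant scale is just `max_rating=1`.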
Quepid has added Organizations to make it easier to collaboratively solve search relevancy problems with your team!
We will be sharing war stories about building always-on, always-available discovery systems using Cassandra.
The AWS CLI documentation covers JMESPath result queries only briefly. Let's explore how much more you can do with them.
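As a small illustration, consider output shaped like `aws ec2 describe-instances` (reservations that each contain a list of instances). A query such as `aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId'` flattens those nested lists and projects out a single field. The plain-Python sketch below only mimics what that expression returns, so you can see what the projection does; it does not use JMESPath itself, and the sample data is made up.

```python
from typing import Any, Dict, List


def instance_ids(response: Dict[str, Any]) -> List[Any]:
    """Plain-Python equivalent of 'Reservations[].Instances[].InstanceId'.

    Flattens every reservation's instance list into one list, then projects
    InstanceId. Like a JMESPath projection, missing (null) values are skipped.
    (JMESPath would return null if Reservations were missing entirely; this
    sketch returns an empty list instead.)
    """
    ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            value = instance.get("InstanceId")
            if value is not None:
                ids.append(value)
    return ids


# Made-up sample shaped like describe-instances output.
SAMPLE = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0aaa"}, {"InstanceId": "i-0bbb"}]},
        {"Instances": [{"InstanceId": "i-0ccc"}, {"InstanceType": "t2.micro"}]},
    ]
}

if __name__ == "__main__":
    print(instance_ids(SAMPLE))
```

The nested loops are what the `[]` flatten operators save you from writing: one short `--query` expression replaces a post-processing script.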
Do you know about Splainer? It's our handy-dandy, free and open source tool for working with Solr search results. It's become my favorite go-to tool for tweaking a specific Solr query. Let's face it: nobody likes working with Solr in their browser's URL bar. It's a royal pain.
The release of Quepid v0.2.0 (July 3, 2015) added several new features and enhanced some existing ones. The Release Notes below provide a quick look to whet your appetite. Individual posts detailing how Organizations and Custom Scorers work are coming soon!
It’s time to fill out your timesheet, again. You’ve put in a full week of work but remembering everything you’ve accomplished can be difficult when you’re jumping between projects. What if you could just quickly copy your git commits for the week and be done?
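One way to do it, sketched under the assumption that you commit with a consistent author name and run it inside each project's repository: have `git log` filter by author and date and print one line per commit. The function names are hypothetical; the flags (`--all`, `--no-merges`, `--author`, `--since`, `--pretty=format:`, `--date=short`) are standard `git log` options.

```python
import subprocess
from typing import List


def weekly_log_command(author: str, since: str = "1 week ago") -> List[str]:
    """Build a `git log` command that prints one line per commit by `author`."""
    return [
        "git", "log",
        "--all",                    # include commits on every branch
        "--no-merges",              # merge commits rarely describe real work
        f"--author={author}",
        f"--since={since}",
        "--pretty=format:%ad %s",   # author date followed by the subject line
        "--date=short",             # render %ad as YYYY-MM-DD
    ]


def weekly_commits(author: str, repo: str = ".") -> List[str]:
    """Run the command in `repo` and return one string per commit."""
    result = subprocess.run(
        weekly_log_command(author),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when `repo` is not a repository
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

For example, calling `weekly_commits("Jane Doe", repo="path/to/project")` (both arguments hypothetical) returns that author's commit subjects from the last week, ready to paste into a timesheet.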
Make the most out of your location data by using OpenLayers to provide a visualization.
Trying to answer hard policy questions like the impact of Pre-K attendance on 8th grade graduation rates? VLDS is your friend.
We're pleased to announce that Chapters 4 and 5 are available for early access for Relevant Search! Please read and give us feedback. This is early access for a reason: we want to hear what you think!
I often want to intercept Solr documents in a format I can use offline. Clients have complex ingestion systems, and I shouldn't need the full ingestion apparatus to do some Solr work. With documents offline, I can script something simple and stupid that throws documents at Solr to test my search relevancy work, without needing the full system on hand to populate Solr.
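A "simple and stupid" loader might look like the sketch below. It assumes the captured documents are stored one JSON object per line and that Solr's update handler accepts a JSON array of documents when the request's Content-Type is `application/json`. The core name `mycore`, the file name, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

# Hypothetical local Solr core; adjust host, port, and core name for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def load_docs(path: str) -> Iterator[dict]:
    """Yield one document per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def batches(docs: Iterable[dict], size: int = 500) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_json(payload, url: str = SOLR_UPDATE_URL) -> int:
    """POST a JSON payload to Solr's update handler and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def reindex(path: str, url: str = SOLR_UPDATE_URL) -> None:
    # Each batch is sent as a JSON array of documents to add.
    for batch in batches(load_docs(path)):
        post_json(batch, url)
    # Commit once at the end so the new documents become visible to searches.
    post_json({"commit": {}}, url)
```

Calling `reindex("docs.jsonl")` against a local test core repopulates it from the offline dump. Committing once at the end, rather than per batch, keeps the loader fast and simple.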
VLDS is the Virginia Longitudinal Data System, providing educational and workforce training data to improve public education. Eric will be looking at how we can help improve educational outcomes.
Something amazing happened today on our Quepid project. We did a code review. Instead of trying to extract value from reviewing pull requests in isolation, we realized that actually talking to each other was the only way to move the dial on understanding each other's work.
Takeaways from Cassandra Day DC
We will be sharing stories about our use of Cassandra with federal and state governments.
Angular or EmberJS? Not long ago, the answer was Dojo!
BioSolr is being developed by Flax in conjunction with the European Bioinformatics Institute (EBI). We've done a great deal of search work in the life sciences, and we frequently find organizations solving the same sorts of problems over and over and over. For this reason, I was really excited to compare notes with Flax and EBI about common themes encountered in life science search.
Doug will be talking about why every data hacker should care about the magic inside a search engine, and about his new book, Relevant Search!
Joe Lawson is an experienced DevOps hacker and the newest member of OpenSource Connections. Find out who this guy is and how he can help you!
Chris will be speaking at Spark Summit 2015 on "Lessons Learned with Spark at the US Patent & Trademark Office".
Doug will be talking about Test Driven Relevancy, Quepid, and his new book Relevant Search!
Apache Camel is very powerful, but once you have a couple of routes, keeping track of what they are doing gets harder. Plus, you want to know which messages are in flight. Hawt helps.
Elasticsearch cross-field queries are a great feature. They let you blend multiple fields' scores together on a search-term-by-search-term basis. I covered the motivation for cross-field queries in a previous blog post. In this post I want to dive a layer deeper. How exactly do cross-field queries work? How can you tune their behavior?
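For reference, a cross-field search is a `multi_match` query with `"type": "cross_fields"`. The sketch below builds such a request body in Python; the field names and the helper function are hypothetical, while `type`, `fields` (with `^` boosts), `operator`, and `tie_breaker` are standard `multi_match` parameters. The resulting body would be sent to an index's `_search` endpoint.

```python
import json
from typing import List, Optional


def cross_fields_query(
    text: str,
    fields: List[str],
    operator: str = "and",
    tie_breaker: Optional[float] = None,
) -> dict:
    """Build a multi_match query body that uses the cross_fields type."""
    multi_match = {
        "query": text,
        "type": "cross_fields",  # term-centric: treat the fields as one big field
        "fields": fields,        # "field^boost" syntax raises a field's weight
        "operator": operator,    # with "and", every term must match in some field
    }
    if tie_breaker is not None:
        # Lets fields other than the best-scoring one contribute to each term's score.
        multi_match["tie_breaker"] = tie_breaker
    return {"query": {"multi_match": multi_match}}


if __name__ == "__main__":
    body = cross_fields_query("William Shakespeare", ["first_name^2", "last_name"], tie_breaker=0.3)
    print(json.dumps(body, indent=2))
```

The knobs exposed here (field boosts, `operator`, and `tie_breaker`) are the usual starting points when tuning how cross-field scoring blends fields.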
Perhaps the biggest relevance mistake you can make is to take content straight from its source and plop it, unmodified, directly into Elasticsearch or Solr. If you don't think about how your data is likely to be searched…
I recently had to debug Solr 5 to help answer some client questions. With Solr 5, there have been several fundamental changes to the Lucene/Solr codebase, and my previous methods of debugging Solr didn't work anymore.