  • When click scoring can hurt search relevance — towards better signals processing in search

    Doug Turnbull — October 8, 2014 | 0 Comments | Filed in: solr

    Have you heard of “click scoring” or “click tracking”? In the context of search, click scoring is the method whereby you collect statistics on where users click in their search results, then use that information to prefer that search result for the queried text. Consider Virginia Decoded, a set of Virginia’s state laws. For example, […]

  • Let’s Stop Saying “NoSQL”

    Doug Turnbull — September 27, 2014 | 22 Comments | Filed in: solr

    I say the word “NoSQL” a lot. When I say NoSQL, I tend to talk about denormalized and hierarchical document/row-based data stores like Cassandra, Mongo, Couch, or HBase. But it’s a terrible way to use that term, because there are also graph databases that feel even more normalized than traditional relational databases. Then there are […]

  • Solving data “variety” with Postgres’s NoSQL Extensions

    Doug Turnbull — September 26, 2014 | 2 Comments | Filed in: solr

    Raise your hand if you’ve heard of the three “Vs” of Big Data. Velocity: your queries/updates are exceptionally fast or large. You’re processing the entire Twitter feed. Volume: you store a massive amount of data at rest. You’ve crawled the web and are storing the entire web in a database. Variety: The structure […]

  • The Semantic Web up and coming – impressions of SEMANTiCS 2014

    René Kriegler — September 19, 2014 | 0 Comments | Filed in: Conference, Natural Language Processing, Technologies

    When you hear someone say about a technology that ‘it only works in theory’, ‘it is too labour-intensive’ and ‘it is not industry-ready’, chances are that they are talking about semantic web technologies. As my experience has been different in a semantic search project for OpenSource Connections, I went to the SEMANTiCS 2014 conference in […]

  • September Chock Full of Talks (Dougtember?)

    Doug Turnbull — August 28, 2014 | 0 Comments | Filed in: solr

    I somehow managed to line up a speaking gig for every week in September! I hope you’ll join me on this insane marathon. I’ll be talking about topics key to what we care about at OSC: search as a data structure, search relevancy, and search/big data at performance and scale. Don’t hesitate to email me if […]

  • Introducing Splainer — The Open Source Search Sandbox That Tells You Why

    Doug Turnbull — August 18, 2014 | 0 Comments | Filed in: solr

    One piece of feedback that has consistently come with our Quepid search testing tool is the need to understand “why” search results come back in the order they do. In plain English, what factors influence search the most? Why does my search engine think a document about “water bottles” is more relevant than “baby bottles” for […]

  • Using Quepid to Improve Relevancy of Advance Auto Parts Intranet Search

    Doug Turnbull — July 24, 2014 | 0 Comments | Filed in: solr

    Recently, Advance Auto Parts contacted OSC to improve the search relevancy of their intranet application, Starting Line. Starting Line serves as the knowledge base for every store employee, so having relevant internal search results helps keep employees connected with resources and company news. Through our two-day Quepid relevancy assessment, we helped bring together content […]

  • Improving The Camel Solr Component

    Doug Turnbull — July 15, 2014 | 0 Comments | Filed in: solr

    We’ve been using Apache Camel a fair amount recently as our ingestion pipeline of choice. It presents a fairly nice DSL for wiring together different data sources, performing transformations, and finally sending data to Solr. Using the normal Solr component, you can write code that looks like this: from("file://foo?fileName=input.csv") .unmarshal().csv() .split(body()) .to("bean:convertToSolrDoc") .setHeader(SolrConstants.OPERATION, SolrConstants.INSERT) .to("solr://localhost:8983/solr/collection1") […]

Developed in Charlottesville, VA | ©2013 – OpenSource Connections, LLC