    Doug Turnbull — October 8, 2014 | 0 Comments | Filed in: solr

    Have you heard of “click scoring” or “click tracking”? In the context of search, click scoring is the method whereby you collect statistics on where users click in their search results, then use that information to prefer that search result for the queried text. Consider Virginia Decoded, a set of Virginia’s state laws. For example, […]
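The idea in this excerpt can be illustrated with a minimal sketch. It is not taken from the post and is not how Solr implements click scoring; the `ClickScorer` class, its method names, and the `boost_weight` parameter are all hypothetical, and the boosting formula (base score multiplied by one plus the document's share of clicks for that query) is just one simple assumption about how click statistics might be folded into ranking.

```python
from collections import defaultdict


class ClickScorer:
    """Hypothetical in-memory click tracker; not a Solr API."""

    def __init__(self, boost_weight=1.0):
        # clicks[query][doc_id] -> number of recorded clicks
        self.clicks = defaultdict(lambda: defaultdict(int))
        # Assumed tuning knob: how strongly click share affects the score
        self.boost_weight = boost_weight

    @staticmethod
    def _normalize(query):
        # Treat queries that differ only by case or outer whitespace as equal
        return query.strip().lower()

    def record_click(self, query, doc_id):
        """Record that a user clicked doc_id in the results for query."""
        self.clicks[self._normalize(query)][doc_id] += 1

    def rerank(self, query, results):
        """Reorder (doc_id, base_score) pairs, preferring clicked documents."""
        per_doc = self.clicks.get(self._normalize(query), {})
        total = sum(per_doc.values())

        def boosted(item):
            doc_id, base_score = item
            share = per_doc.get(doc_id, 0) / total if total else 0.0
            return base_score * (1.0 + self.boost_weight * share)

        # sorted() is stable, so ties keep the engine's original order
        return sorted(results, key=boosted, reverse=True)


scorer = ClickScorer()
for _ in range(3):
    scorer.record_click("Speed Limit", "doc-b")

engine_results = [("doc-a", 1.0), ("doc-b", 0.8)]
reranked = scorer.rerank("speed limit", engine_results)
unseen = scorer.rerank("zoning", engine_results)
```

Under these assumptions, a document that users keep clicking for a given query moves up for that query only; queries with no recorded clicks keep the engine's original ordering.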

    Doug Turnbull — September 27, 2014 | 22 Comments | Filed in: solr

    I say the word “NoSQL” a lot. When I say NoSQL, I tend to talk about denormalized and hierarchical document/row-based data stores like Cassandra, Mongo, Couch, or HBase. But that’s a terrible way to use the term, because there are also graph databases that feel even more normalized than traditional relational databases. Then there are […]

    Doug Turnbull — September 26, 2014 | 2 Comments | Filed in: solr

    Raise your hand if you’ve heard of the three “Vs” of Big Data. Velocity: your queries/updates are exceptionally fast or large. You’re processing the entire Twitter feed. Volume: you store a massive amount of data at rest. You’ve crawled the web and are storing the entire web in a database. Variety: The structure […]

    René Kriegler — September 19, 2014 | 0 Comments | Filed in: Conference, Natural Language Processing, Technologies

    When you hear someone say about a technology that ‘it only works in theory’, ‘it is too labour-intensive’ and ‘it is not industry-ready’, chances are that they are talking about semantic web technologies. As my experience has been different in a semantic search project for OpenSource Connections, I went to the SEMANTiCS 2014 conference in […]

    Doug Turnbull — August 28, 2014 | 0 Comments | Filed in: solr

    I somehow managed to line up a speaking gig for every week in September! I hope you’ll join me on this insane marathon. I’ll be talking about topics key to what we care about at OSC: search as a data structure, search relevancy, and search/big data at performance and scale. Don’t hesitate to *protected email* if […]

    Doug Turnbull — August 18, 2014 | 0 Comments | Filed in: solr

    One piece of feedback that has consistently come with our Quepid search testing tool is the need to understand “why” search results come back in the order they do. In plain English, what factors influence search the most? Why does my search engine think a document about “water bottles” is more relevant than “baby bottles” for […]

    Doug Turnbull — July 24, 2014 | 0 Comments | Filed in: solr

    Recently, Advance Auto Parts contacted OSC to improve the search relevancy of their intranet application, Starting Line. Starting Line serves as the knowledge base for every store employee, so having relevant internal search results helps keep employees connected with resources and company news. Through our two-day Quepid relevancy assessment, we helped bring together content […]

    Doug Turnbull — July 15, 2014 | 0 Comments | Filed in: solr

    We’ve been using Apache Camel a fair amount recently as our ingestion pipeline of choice. It presents a fairly nice DSL for wiring together different data sources, performing transformations, and finally sending data to Solr. Using the normal Solr component, you can write code that looks like this: from("file://foo?fileName=input.csv") .unmarshal().csv() .split(body()) .to("bean:convertToSolrDoc") .setHeader(SolrConstants.OPERATION, SolrConstants.INSERT) .to("solr://localhost:8983/solr/collection1") […]

