Indexing Millions of Documents using Tika and Atomic Update
On a recent engagement, we were posed with the problem of sorting through 6.5 million foreign patent documents and indexing them into Solr. This totaled about 1 TB of…
Last month we found the best time to ask a question on StackOverflow using the oft-missed ‘join’ feature in Solr. Numbers will get you far in isolating the top-performing…
I’ve been working on some Python Solr client code. One area where bugs have cropped up is in query terms that need to be escaped before passing to Solr…
Welcome to the first blog post (of many, hopefully) by OpenSource Connections’ 2012 summer interns. After our first few weeks as interns at OSC, we have begun to get…
The United States government procures most of its goods and services through competitive bidding. The notices of its requirements are posted on a centralized website called FedBizOpps, www.fbo.gov. Contracting…
At OpenSource Connections, we focus on solving the search problems that others face. One of the problems that we have noticed is that tools for U.S. government acquisition officers…
When a user enters a search query and is taken to a search engine results page, the probability of clicking on a result is a function of…
Do you like Solr? Live in Las Vegas? Are you going to be in Vegas on Wednesday, February 22, 2012? Like free dinner? If this sounds like you, then…
As Tim Berners-Lee, the “father of the Internet,” got onto a plane to head to the burgeoning tech center of Tyler, Texas (all sarcasm intended), he must have wondered…
Recently, we had a project where we helped a client index a corpus of Chinese language documents in Solr. We have asked Dan Funk, a committer to Project Blacklight…
We are pleased to announce that Eric Pugh and David Smiley have published the second Solr book in publication, Apache Solr 3 Enterprise Search Server. It is an update…
Unless you have a very well-known brand that people reach by typing your URL directly, chances are that most of your traffic comes from an external search…