OSC partners to improve Wikipedia's search relevance
Realizing both teams were working in parallel on machine-learning plugins for Elasticsearch, OSC and Wikimedia joined efforts to develop the Elasticsearch Learning to Rank (LTR) Plugin. The Elasticsearch LTR project shows that organizations can deliver the highest gains when they don't rely on black-box, proprietary search vendors. With open source machine learning methods, each team can build the solution best suited to its needs.
Armed with the Elasticsearch LTR plugin, OSC and Wikimedia improved the relevance of Wikipedia site search. As Erik Bernhardson, Wikimedia Discovery Engineer, noted after the first deployment:
Learning to Rank increased our search relevance 20-40% across Wikipedia
OSC and Wikimedia continue to work hand-in-hand on the Learning to Rank plugin, with contributions from Yelp, Snag, and other organizations.
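To make the idea behind Learning to Rank concrete, here is a minimal sketch of re-ranking the top results of a baseline search with a learned scoring model. It is an illustration only, not the plugin's implementation: the feature names, the weights, and the `rerank` helper are all hypothetical, and a real model would be trained from relevance judgments or click data.

```python
from typing import Dict, List, Tuple

# Hypothetical feature weights standing in for a trained model.
WEIGHTS: Dict[str, float] = {"title_match": 2.0, "body_match": 0.5, "popularity": 1.0}

# A hit is (document id, baseline score, feature values).
Hit = Tuple[str, float, Dict[str, float]]


def model_score(features: Dict[str, float]) -> float:
    """Score a document as a weighted sum of its features."""
    return sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())


def rerank(hits: List[Hit], window: int) -> List[str]:
    """Re-order only the top `window` hits by model score.

    `hits` must already be sorted by baseline score, best first.
    Hits outside the window keep their baseline order.
    """
    head = sorted(hits[:window], key=lambda hit: model_score(hit[2]), reverse=True)
    return [doc_id for doc_id, _, _ in head + hits[window:]]


# Hypothetical baseline results, best baseline score first.
hits: List[Hit] = [
    ("doc_a", 9.1, {"title_match": 0.0, "body_match": 1.0, "popularity": 0.2}),
    ("doc_b", 8.7, {"title_match": 1.0, "body_match": 0.4, "popularity": 0.9}),
    ("doc_c", 8.2, {"title_match": 1.0, "body_match": 0.1, "popularity": 0.1}),
    ("doc_d", 3.0, {"title_match": 1.0, "body_match": 1.0, "popularity": 1.0}),
]

reranked = rerank(hits, window=3)
print(reranked)
```

Only the first three hits are re-scored here, so `doc_d` stays last even though the model would score it highly. Limiting the model to a window of top results is a common way to keep the more expensive scoring step cheap.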
Snag (formerly Snagajob)
Snag invests to become search relevance leaders
Snag, the largest hourly job site in the US, needed a new strategy to match users and jobs. Snag worked with OSC to tailor matching solutions to their unique market. OSC helped grow the capabilities of Snag's search team through short-term implementation, mentorship and training. Snag wanted to be an innovator in the search and relevance space, and became the key early sponsor of the Elasticsearch Learning to Rank Plugin.
OSC is proud to see Snag become a leader in the space, no longer depending on OSC's consulting services to improve search. Jason Kowalewski, Snag Senior Director of Engineering, noted these results shortly after OSC's engagement:
We’ve seen a net 32% increase in conversion metrics across our historically lowest performing use-cases.
U. of Michigan Health System
OSC Implements Best-in-Breed Relevant Physician Finder Search
OSC nailed it.... They created an innovative approach that helps avoid relevance and search UX problems on classic Drupal search apps.
OSC Increases CareerBuilder's Application Rate 3% in a short engagement
We can't recommend OSC enough if you depend on search and want to make a significant impact on your business.
USPTO – Global Patent Search Network
The USPTO had signed an MOU with the Chinese Patent organization to share millions of patents from China and needed to post them online quickly.
With the scale of the data, OpenSource Connections knew that implementing a solid search interface and ingestion method would squeeze the team's ability to build a web stack that could scale to traffic spikes. "Cloud meets Ocean" to the rescue! The cloud, in this case Amazon Web Services, allowed us to easily scale the data ingestion and search server platforms to meet demand.
We recently wrapped up a project with Cisco that turned out to be one of our most interesting and challenging projects to date. While we cannot disclose some of the most interesting details of our project, the work entailed pushing Solr to throughput limits that we had not previously witnessed. In addition, the use case, which focused on numerical analytics, was significantly outside of what we normally see with Solr. Solr has become a cornerstone for full-text search, but it turns out to be very fast and well-suited as a general analytics engine!
Here’s what Cisco had to say about our work:
Thanks to OpenSource Connections for so quickly picking up our requirements and coming up with multiple plausible solutions. We really appreciated your focused approach to problem solving. I know that “search” per se is a wide field but you helped us narrow it down to our context quite effectively with Solr.
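As an illustration of using Solr for numerical analytics rather than full-text search, the sketch below builds a request that asks Solr for aggregate statistics through its JSON Facet API and returns no documents. It is not the Cisco solution, whose details are not public: the host, the `metrics` collection, and the `latency_ms` field are hypothetical, and the code only constructs the URL, so it runs without a Solr server.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_stats_query(base_url: str, collection: str, field: str, query: str = "*:*") -> str:
    """Build a Solr select URL that returns only aggregate statistics for `field`."""
    # JSON Facet API aggregation functions computed over all matching documents.
    facet = {
        "avg_value": f"avg({field})",
        "max_value": f"max({field})",
        "p95_value": f"percentile({field},95)",
    }
    params = {
        "q": query,
        "rows": 0,  # no documents needed, only the aggregates
        "json.facet": json.dumps(facet),
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names.
url = build_stats_query("http://localhost:8983/solr", "metrics", "latency_ms")

# Decode the URL again to show what Solr would receive.
query_params = parse_qs(urlsplit(url).query)
facets = json.loads(query_params["json.facet"][0])
print(facets)
```

Setting `rows` to 0 keeps the response small, since the aggregates are computed over every matching document regardless of how many are returned.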
Inova Solutions needed a real-time analytics solution that could pump through lots of multi-origin, live streaming data. It also needed to run easily all over the place, from information kiosks to users' desktops.
HTML is the obvious choice to "run almost anywhere" and look good doing it. The catch was making sure our web app could handle lots of data going in, and lots of clients polling constantly for data coming out.
Cassandra is excellent for dealing with streaming data from lots of sources. If we can ignore consistency, C* can write faster than it can read!
For the web, two choices met our performance and framework requirements: Java and Go. While we considered using Python/Flask, we realized that the sheer number of web requests called for a framework that could handle significantly more request/response cycles than traditionally expected. Go was a great fit for a Python/Flask shop that needed to serve some serious requests per second.
The Rimm-Kaufman Group
The Rimm-Kaufman Group (RKG) provides a wide range of data-driven online marketing solutions to online businesses. RKG clients range from startups to the Fortune 500, and of late business has been growing at an ever-increasing rate. This is great! But this also means that ever-increasing pressure is placed upon their current MySQL-based infrastructure. Since RKG expects its clientele to keep growing, RKG decided that now is the time to take action and upgrade their infrastructure. And their technology of choice: Cassandra.
OSC helped us jumpstart our usage of Cassandra. We have been able to speed up our processing of certain data by a factor of 5 simply by moving the storage of some data from MySQL to Cassandra. We look forward to using Cassandra more!
Over the course of our two-month project, OpenSource Connections aided RKG in building and deploying a production-ready Cassandra cluster. Highlights of the work are as follows:
- Upon reviewing RKG’s technical and business requirements, OpenSource Connections outlined the hardware specifications and defined the optimal cluster configuration. Based upon this information, new servers were purchased and assembled into the production Cassandra cluster.
- Part of the challenge when moving from SQL to Cassandra is getting your head around the new and somewhat foreign way that Cassandra expects you to model data. OpenSource Connections worked closely with RKG developers to port their current solutions to Cassandra.
- Testing is almost as important as developing an appropriate data model. OpenSource Connections helped RKG design a Cassandra test framework that fit in well with their current MySQL testing framework.
- Before leaving RKG, OpenSource Connections provided the RKG tech team with a full-day training course in Cassandra development and systems administration. As part of the training, the development team brainstormed solutions to actual in-house development objectives.
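The data-modeling shift mentioned above can be sketched in plain Python. In Cassandra, tables are typically designed around the queries they must answer, and the same event is often written to several tables instead of being joined at read time. The example below simulates that with dictionaries; it is not RKG's schema, and the table names, columns, and sample values are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Tuple

# One "table" per query. Each maps a partition key to rows kept sorted by
# clustering key, roughly like PRIMARY KEY ((client, day), ts) in CQL.
clicks_by_client_day: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
clicks_by_keyword: Dict[str, List[Tuple[str, str]]] = defaultdict(list)


def record_click(client: str, day: str, ts: str, keyword: str) -> None:
    """Write one event to every query table (denormalization on write)."""
    insort(clicks_by_client_day[(client, day)], (ts, keyword))
    insort(clicks_by_keyword[keyword], (ts, client))


# Hypothetical events, deliberately written out of time order.
record_click("acme", "day-1", "09:00:05", "shoes")
record_click("acme", "day-1", "09:00:01", "boots")
record_click("zenith", "day-1", "09:00:03", "shoes")

# Each read touches a single partition and needs no join or sort.
acme_day = clicks_by_client_day[("acme", "day-1")]
shoe_clicks = clicks_by_keyword["shoes"]
print(acme_day)
print(shoe_clicks)
```

The trade-off is more writes and duplicated data in exchange for reads that hit exactly one partition, which suits a store that handles heavy write loads well.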
By the time that OpenSource Connections left, RKG was up and running with their new 7-node Cassandra cluster – capable of storing terabytes of data and accepting more than 10K writes per second! But more important than this, RKG is now fully able to utilize and maintain their new Cassandra cluster on their own.