Several of our recent projects have involved different methods of sorting, manipulating, and visualizing data. One example is now available in the University of Virginia Library's BlacklightDL project. One issue we keep running into is determining similarity: trying to predict whether certain results will actually be similar to what is being searched for.
At issue is the balance between having the engine infer what the searcher wants and, on the other side, requiring the searcher to be as precise as possible with search terms. At one end is a search with a single term and a multitude of results (for example, a Google search for the term "dog"); at the other end is a search for a long string of text. Most searchers either don't want to spend their time typing in volumes of text or don't know exactly how to pull out the most pertinent pieces of what they are looking for.
Enter Sphere. Sphere looks at the entire content of the page you are viewing and searches for blog posts that contain similar information. Rather than looking simply at the links within a page, it searches based on the entirety of the content. This provides enough detail to produce more accurate results without forcing the user to enter excessive (and potentially inaccurate and superfluous) information. Alternatively, the user can enter the link of the page of interest into the Sphere search engine to get similar results based on the full text of the site. An example search yielded results of similar entries to this page. Now, if only Sphere could find blog pages similar to what I'm thinking rather than to what I'm reading, I'd be happy!
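To make the idea of matching on a page's full text a little more concrete, here is a minimal sketch in Python. It is not Sphere's actual algorithm, which I have not seen. It simply assumes a bag-of-words comparison with cosine similarity, and the function names (`term_counts`, `cosine_similarity`, `rank_similar`) and sample posts are hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens (a simple bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count Counters, from 0.0 to 1.0."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(page_text, posts):
    """Return (score, title) pairs for posts, most similar to the page first.

    `posts` is a hypothetical mapping of post title to post body.
    """
    page = term_counts(page_text)
    scored = [
        (cosine_similarity(page, term_counts(body)), title)
        for title, body in posts.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    posts = {
        "Puppy basics": "how to teach a puppy to sit and stay",
        "Stock tips": "markets fell sharply today",
    }
    for score, title in rank_similar("training a puppy to sit and stay", posts):
        print(f"{score:.2f}  {title}")
```

Because the whole page acts as the query, the searcher never has to choose search terms. A real engine would presumably also weight rare terms more heavily and ignore common words such as "the", which this sketch does not attempt.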