Wednesday, June 18, 2008

Hakia - First Meaning-based Search Engine

Hakia is a "meaning-based" search engine startup that, to my mind, is getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York - and curiously has former US senator Bill Bradley as a board member. It has also launched its beta version and already ranks around 20k on Alexa - which is impressive.

My Hakia experiments (along with Google)

The user interface is similar to Google, but the engine prompts you to enter not just keywords - but a question, a phrase, or a sentence. My first question was: What is the population of China?
As you can see the results were spot on. I ran the same query on Google and got very similar results, but sans flag. Looking carefully over the results in Hakia, I noticed the message: "Your query produced the Hakia gallery for China. What else do you want to know about China?"

At first this seems like a value add. However, after thinking about it, I am not so sure. What seems to have happened is that instead of performing the search, Hakia classified my question and pulled the results out of a particular cluster - in this case, China. To verify this hypothesis, I ran another query: What is the capital of china? The results again suggested a gallery for China, but did not produce the right answer. Now to Hakia's credit, it recovered nicely when I typed in:

My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too.

Then I searched for Why did Enron collapse? Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia. In its results was this statement: Enron's collapse was not caused by overstated resource reserves, but by another kind of overstatement. This is pretty witty, but I am still not convinced that it is doing semantic analysis. Here is why: that reply was not composed by Hakia out of an understanding of the question's semantics. Instead, Hakia pulled the sentence out of a highly ranked document that matches the Why did Enron collapse? query.
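To make that hypothesis concrete, here is a minimal sketch of how a plain string-matching engine could surface such a sentence without understanding anything. This is my guess at the general idea, not Hakia's actual method: the function names, the stopword list, and the overlap scoring are all assumptions made up for illustration.

```python
import re

# Hypothetical stopword list; a real engine would use a much larger one.
STOPWORDS = {"a", "an", "the", "did", "do", "does", "is", "was",
             "of", "to", "in", "by", "why", "what", "but", "not", "s"}

def terms(text):
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}

def best_sentence(query, documents):
    """Return the sentence sharing the most terms with the query, or None."""
    query_terms = terms(query)
    best, best_score = None, 0
    for doc in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = len(query_terms & terms(sentence))
            if score > best_score:
                best, best_score = sentence.strip(), score
    return best

docs = [
    "Enron was an energy company. Enron's collapse was not caused by "
    "overstated resource reserves, but by another kind of overstatement.",
    "Martha Stewart founded a media company.",
]
print(best_sentence("Why did Enron collapse?", docs))
```

The sentence that wins is simply the one with the most words in common with the query. No semantics are involved, yet the output can look like a thoughtful answer.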

In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? - which is not one of Hakia's homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

Hakia is impressive, but does it really understand meaning?

I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the fact that there isn't yet sufficient evidence that it really understands meaning.

It's intriguing to think about the old idea of being able to type a question into a computer and always getting a meaningful answer (a la the Turing test). But right now I am mainly interested in Hakia's method for picking the top answer. That seems to be Hakia's secret sauce at this point, which is unique and works quite well for them. Whatever heuristic they are using, it gives back meaningful results based on analysis of strings - and it is impressive, at least at first.

Hakia and Google

Perhaps the more important question is: Will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. This current version is not exciting enough and the resulting search set is not obviously better. So it's a long shot that they'll beat Google in search. I think if Hakia presented one single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

The final question is: Is semantic search fundamentally better than text search? This is a complex question, and answering it definitively requires deep theoretical expertise. Here are a few hints...

Google's string algorithm is very powerful - this is an undeniable fact. A narrowly focused vertical search engine that makes a lot of assumptions about the underlying search domain (e.g. Retrevo) also does a great job of finding relevant stuff. So the difficulty that Hakia has to overcome is to quickly determine the domain and then to do a great job searching inside that domain. This is an old and difficult problem related to natural language understanding and AI. We know it's hard, but we also know that it is possible.
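The two-step idea, first guess the domain and then search only inside it, can be sketched in a few lines. Everything here is a toy I made up for illustration: the keyword sets, the per-domain indexes, and the overlap ranking are assumptions, and nothing suggests Hakia or Retrevo work this way internally.

```python
import re

def tokens(text):
    """Lowercase word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def detect_domain(query, domain_keywords):
    """Pick the domain whose keywords overlap the query most, or None."""
    query_tokens = tokens(query)
    best_domain, best_overlap = None, 0
    for domain, keywords in domain_keywords.items():
        overlap = len(query_tokens & keywords)
        if overlap > best_overlap:
            best_domain, best_overlap = domain, overlap
    return best_domain

def search(query, domain_keywords, indexes):
    """Route the query to one domain index, then rank by word overlap."""
    domain = detect_domain(query, domain_keywords)
    if domain is None:
        # No domain detected: fall back to searching every index.
        candidates = [doc for docs in indexes.values() for doc in docs]
    else:
        candidates = indexes.get(domain, [])
    query_tokens = tokens(query)
    ranked = sorted(candidates,
                    key=lambda doc: len(query_tokens & tokens(doc)),
                    reverse=True)
    return domain, ranked

# Hypothetical domains and documents, purely for illustration.
domain_keywords = {
    "geography": {"population", "capital"},
    "electronics": {"camera", "laptop"},
}
indexes = {
    "geography": ["Capital cities of Asia", "Population of China by year"],
    "electronics": ["Laptop reviews"],
}
print(search("What is the population of China?", domain_keywords, indexes))
```

The hard part, of course, is the first step. A keyword table like this breaks down as soon as a question does not contain one of the expected words, which is exactly where real language understanding would have to take over.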
