Hakia, a meaning-based search engine

Hakia is a relatively new search engine that wants to find and present search results in a new way.

The future of search is understanding information, not merely finding it. This is the claim of Dr. Riza Berkan, CEO of Hakia, a meaning-based or semantic search engine currently in beta. His motivation for plowing the field of ontological semantics is ultimately to compete with the giants of the search engine industry.

Pandia takes a look at Hakia to see what all this science yields in terms of search experience and relevance. We also talk to Dr. Berkan and Melek Pulatkonak, president and COO of Hakia, to get a lesson in "ontological semantics for dummies".

Hakia Galleries

Let's say I am curious about the Renaissance scientist Johannes Kepler. If I search Hakia for Johannes Kepler, I get a presentation page from the Hakia Galleries, containing a picture of Kepler along with search results grouped in categories like Biography and Timeline, Awards and Accomplishments, and Speeches and Quotes.

This is a convenient way to have results presented for a broad search like this, for instance when you are doing research.

We ask Melek Pulatkonak about the galleries: "Currently, the Hakia Galleries answer around 600,000 popular queries in various topics of interest. Let me give you a few examples: piano, Hillary Clinton, coffee, India, breast cancer, red sox, Paris Hilton, Pokemon… You get the idea. We are expanding the coverage every day."

Melek explains how the galleries are assembled: "Hakia galleries are distilled in a semi-automated process, a mixture of meaning-based technology and editorial process. Editors are involved in the automated gallery generation process as administrators."

"Their role ranges from checking, correcting, and removing items that are inappropriate. Note that humans are not involved in acquiring search results, it is all automated," she emphasizes.

Smart answers

What if you don't need a gallery of information, but something more specific? Say that I am having a bad day and want to know more about drugs to remedy my headache.

I ask Hakia "Which drug treats headache?"

In the search results, sentences that contain an answer to my question are highlighted, like "aspirin has been used to treat migraine and other headaches" and "Nurofen is indicated for the relief of headache and back pain of musculo-skeletal origin".

So even before I have clicked on any of the search results, I have some suggestions.

Fuzzy logic explained

Notice how the sentences in the search results don't contain the exact same words as my query. Neither of the sentences contains the word 'drug', and one talks about relieving headache instead of treating it. This is because Hakia uses fuzzy logic to expand my query.

Dr. Berkan explains: "Fuzzy logic means a flexible algorithm. The flexibility is used to take the original query and create its equivalent and enriched versions on the fly. The principles used in this process come from ontological semantics (see below for an explanation). The reason we are doing this is to bring search results from a variety of equivalent articulations of the search query and related concepts."

For example, the word "headache" is related to "migraine" or the word "treatment" is related to "cure". Without such enrichment, a search algorithm will stick to the word used, and will not be able to retrieve results from equally relevant material.

So if you search for "headache treatment", Yahoo or Google will not return results about a cure for migraine unless the phrase "headache treatment" is present on the relevant page, or someone has used that phrase when linking to it.
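To make the idea of query enrichment concrete, here is a minimal sketch in Python. It is only a toy illustration and not Hakia's algorithm: the `RELATED` table and the `expand_query` function are hypothetical names invented for this example, and the table holds only the two word pairs mentioned above.

```python
from itertools import product

# Hypothetical hand-written table of related terms (an assumption for this
# sketch); a real system would draw on a far larger ontology.
RELATED = {
    "headache": ["migraine"],
    "treatment": ["cure"],
}


def expand_query(query):
    """Return the original query plus variants with related terms swapped in."""
    words = query.lower().split()
    # For each word, the alternatives are the word itself plus its related terms.
    options = [[word] + RELATED.get(word, []) for word in words]
    # Combine one alternative per word position into full query strings.
    return [" ".join(combo) for combo in product(*options)]


for variant in expand_query("headache treatment"):
    print(variant)
```

A search for "headache treatment" would then also look for "headache cure", "migraine treatment" and "migraine cure", which is how a page that never uses the original words could still be retrieved.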

"We are the first search engine to introduce this ability to users, although the current beta version is not fully equipped with this capability yet," Riza adds.

Natural language search

Natural language search is one of Hakia's advantages. This simply means that you can pose a question to the search engine (e.g. "When was Abraham Lincoln born?") instead of breaking your question down into keywords (e.g. "abraham lincoln birth").

Natural language search also means that you can expect an answer to your question right on the search results page. Hakia will present search results which contain an answer to your question, not a list of web pages that might or might not contain an answer.

This is made possible by research at the intersection of philosophy of language, mathematical logic, and cognitive science. The approach is called ontological semantics, a formal and comprehensive linguistic theory of meaning in natural language.

But what does this mean in layman's terms?

Dr. Berkan explains: "Ontological semantics is a branch of computational linguistics which is focused on meaning representation via ontological framework rather than grammatical (syntactic) approach. It provides computers with the ability to understand words in their context using principles similar to human understanding."

For example, to humans "Kill the light" means "turn off the light". To an ordinary computer it might mean "end the life of the light." With ontological semantics, the verb "kill" can be interpreted as turn off or terminate, instead of murder.
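The "kill the light" example can be sketched as a lookup that picks a verb sense based on the kind of object it is applied to. This is a deliberately tiny illustration under our own assumptions: the `SENSES` and `NOUN_TYPES` tables and the `interpret` function are hypothetical, and real ontological resources are far richer than this.

```python
# Hypothetical miniature "ontology" (an assumption for this sketch): each
# sense of a verb lists the kinds of objects it can apply to.
SENSES = {
    "kill": [
        {"meaning": "turn off", "object_types": {"device"}},
        {"meaning": "end the life of", "object_types": {"living"}},
    ],
}

# Hypothetical type labels for a few nouns.
NOUN_TYPES = {"light": "device", "engine": "device", "mosquito": "living"}


def interpret(verb, noun):
    """Pick the sense of `verb` whose allowed object types include the noun's type."""
    noun_type = NOUN_TYPES.get(noun)
    for sense in SENSES.get(verb, []):
        if noun_type in sense["object_types"]:
            return sense["meaning"]
    return None  # No known sense fits this object.


print(interpret("kill", "light"))
print(interpret("kill", "mosquito"))
```

The point of the sketch is that the interpretation comes from what the words refer to, not from the grammar of the sentence, which is the same in both cases.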

"Linguists trained in the ontological semantics discipline and led by Professor Victor Raskin, have been compiling hakia.com's core ontological resources during the last two years," adds Melek Pulatkonak. These resources constitute the back-bone of semantic analysis and content understanding.

The long tail of search

Hakia's SemanticRank algorithm differs from popularity algorithms like Google's PageRank in that it determines a site's relevancy not by its popularity, but by cross-referencing a network of meaning-related criteria in the text of the page.

We asked Melek to shed some light on this:

"Your description is quite correct. However, it is not the site's relevancy, rather the relevancy of the query to the content of the page, which is determined by popular vote (via link referrals) in Google. The critical breaking point in this equation is when the users' queries start to become longer than usual, unique, complex, and personal. When this happens, the 'popularity' reference point disappears."

"Queries like these are called long-tail queries, and there are zillions of them," Melek continues. "The real race in the search engine world is in the long tail. None of the leading search engines want to talk about this. They want to keep your attention in the domain where their popularity algorithms work.

You can't index meaning

"You can't index meaning," Riza explains. "You can only index words, addresses, and URLs."

"We have invented a new system called Qdexing, which is specifically designed for meaning representation. Qdex means query detection and extraction. This entails analyzing the entire content of a webpage, then extracting all possible queries that can be asked to this content, at various lengths and forms. These queries become gateways to the originating documents, paragraphs and sentences during the retrieval mode. Note that this is done off-line before any actual query is received from a user."

The reason for developing the SemanticRank and Qdexing technologies, the glittering prize, is improved relevancy and accuracy. Given that the relevancy of its search results was the prime reason for Google's rise to fame, this is certainly interesting. We'll be keeping an eye on Hakia.

Posted at: http://www.pandia.com/sew/507-hakia.html