Filed Under (search engine) by L.Scagliarini on 15-03-2010

It hasn’t been long since the wireless industry decided to expand its high-speed wireless services, promising easy access to corporate data from a mobile phone (so-called wireless data extensibility). This industry-wide goal could become a significant factor in improving workforce effectiveness and efficiency in enterprises worldwide.

Progress can already be seen as handset manufacturers launch smaller and faster mobile devices with touch screens, keyboards and new operating systems for on-the-go access to content. With the infrastructure in place, one can expect end users to be easily connected to relevant content, thus fulfilling the wireless data promise. Unfortunately, the unique characteristics of mobile phones and their more intelligent counterparts (smartphones) still make it challenging to offer the ideal end-user experience. For one thing, the latest graphical user interfaces (GUIs), such as the one on Apple’s iPhone, provide decent access to menus, content lists and cursory Internet search, but they really just mimic the experience users have when accessing applications from a regular computer. Secondly, because of the small screens, people who use text messaging features to access data from a mobile phone need a simple user interface with a built-in Q&A mechanism that provides only the minimum amount of information requested (ideally, just the answer itself). That would be smart.

This is why I believe that well-developed, high-performance natural language interfaces are the ideal way to give users access to the information they need. And from our own experience working alongside a leading company in the mobile communications market, we can safely say that users seem to agree. In just the last few months, the service we developed for this company has been flooded with more than 100,000 questions a day. Real semantic technology combined with a well-designed knowledge base is a recipe for success: it can reach a level of performance (90% correct single answers) that no generic keyword-based system can match.

Filed Under (search engine) by M.Varone on 26-11-2009

Internet search engines have made serious progress over the past few years, from the first successes of AltaVista and Lycos to the unmatched power (given its superior results) of Google. However, in the past two or three years, even Google has reached a kind of plateau: significant innovations are increasingly rare, and the competition (Bing in particular) is closing in faster than ever before.

Keyword technology (integrated with a series of statistical elements such as PageRank) has the enormous advantage of being simple, easily applicable to many languages and very fast. It has all of the characteristics that were crucial in the Web’s early days (when investments and processing power were much lower), but which are not so important today. When applied to the Web, keyword technology took advantage of the free and voluntary labor of hundreds of millions of people: every day, by searching and clicking on one or more results, they provide search engine makers with an enormous quantity of information. This kind of information is priceless and helps to reorganize search results in the best possible way (it could be seen as the price users pay for free services: with labor instead of money).
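To illustrate how click behavior can reorganize keyword results, here is a minimal sketch of one possible scheme, assuming each result already has a keyword relevance score normalized to 0..1 plus counts of how often it was shown and clicked. The names (`Result`, `rerank`, `click_weight`, `prior_ctr`, `prior_weight`) and the linear blend are hypothetical; actual search engines do not publish their ranking formulas, so this is only an illustration of the idea, not how any of them works.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    keyword_score: float  # assumed keyword relevance, normalized to 0..1
    impressions: int      # how many times the result was shown
    clicks: int           # how many times users clicked on it


def rerank(results, click_weight=0.5, prior_ctr=0.1, prior_weight=10):
    """Re-order results by blending keyword relevance with observed clicks.

    Hypothetical scheme: the click-through rate is smoothed towards
    prior_ctr so that rarely shown results are not judged on a handful
    of clicks, then mixed linearly with the keyword score.
    """
    def combined(r: Result) -> float:
        ctr = (r.clicks + prior_ctr * prior_weight) / (r.impressions + prior_weight)
        return (1 - click_weight) * r.keyword_score + click_weight * ctr

    return sorted(results, key=combined, reverse=True)


results = [
    Result("https://a.example", keyword_score=0.9, impressions=1000, clicks=10),
    Result("https://b.example", keyword_score=0.8, impressions=1000, clicks=400),
]
print([r.url for r in rerank(results)])                    # b first: users prefer it
print([r.url for r in rerank(results, click_weight=0.0)])  # a first: keywords only
```

With clicks ignored, the page with the slightly better keyword match wins; once user behavior is blended in, the page people actually choose moves to the top, which is the "free labor" effect described above.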

Nevertheless, the time has come to integrate this technology with something new. There is absolutely no need to throw away what was done in the past (in many cases, search results are already quite good). We just need to add new technology to improve the results that are currently poor and to make searching as simple as possible (especially for those who are unable to conduct an efficient search, but could easily put their question to a person).

We can’t be afraid to get our hands dirty. We need to get to the heart of the language and culture of every nation; until now, the approach has been very “sterile” and has stayed at a symbolic level, without even scratching the surface. In order to understand meaning, we need to go in depth and recognize that a text is made up of phrases, concepts, attributes and relations that need to be analyzed as a whole (even on a cultural level). Only then can we capture the content’s most important aspects and respond to users’ searches in a timely fashion.

Significant investments will be necessary (each language is complex, differs from the others and is often inseparable from a nation’s culture), as will more manual labor paired with today’s greater processing power. These requirements will greatly accelerate the ongoing process, which will bring about an Internet search engine market led by two players: Google and Yahoo!

Smaller entities, whether already established or just starting up, can still make their contribution, but only with innovative technological aspects or in vertical markets. It will be nearly impossible for them to compete against these two giants on any other level. Semantic technology is still young and has much room to grow; we should not expect any miracles or major revolutions in the near future. The path ahead is long and tortuous, but in the end, the potential reward could be astounding.

Filed Under (search engine) by M.Varone on 02-11-2009

When you search for something on the Internet, you always know which search engine you are using (Yahoo!, Bing and Google are the most popular), but when you search for something at work, sometimes you have no idea where the information comes from. You really don’t know which system you’re using, you just limit yourself to typing your request in the search box provided and hope to get the answer you were looking for.

The strange thing is that searching for information is a key activity in every company. Still, the multinational software giants have not yet tapped this market, because it seems like they just can’t get their act together. Autonomy, the leading producer of enterprise search solutions, is practically unknown to non-specialist staff. Oracle and IBM play a small role, and Microsoft actually had to purchase the Norwegian company Fast in order to try to grow in this sector. Google has a good share of the market thanks to its brand name, but its product does not deliver better results than the competition. Moreover, users are giving negative feedback (this goes for all of the key players) on result quality and search times. Put together, this is a situation that must be addressed if we want to improve companies’ efficiency.

The most promising technology is semantic technology. Although it hasn’t yet reached its full potential, it is already able to better “understand” content and identify the most important concepts and relations. We must also accept that it is impossible to have totally automatic solutions that magically know how to program themselves (an idealistic goal). Search engines must be developed around the knowledge and terminology used within each company; if this is done the right way, the task won’t be too complex, but its value will be priceless.

Change must occur in the technology used to analyze the content of the various types and forms of company documents. All of the search engines listed above still use the old keyword technology, which has been strengthened over the years with statistical elements. This technology has the advantage of being stable and easily adaptable to different languages, but it is also very limited, because it cannot in any way understand the language or the actual context of a text.
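To make this limitation concrete, here is a minimal sketch of pure keyword matching, assuming a plain bag-of-words scorer with none of the statistical weighting real products add. The function names `tokenize` and `keyword_score` are hypothetical and not taken from any of the engines mentioned. The sketch shows two failure modes: a query about Apple the company scores exactly the same on a document about the fruit, and a synonym ("automobile" for "car") earns no credit at all.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter: words become
    # bare strings, with no notion of meaning, sense or context.
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query: str, document: str) -> int:
    """Bag-of-words score: total occurrences of the query terms in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))


docs = {
    "company": "Apple released a new phone this week.",
    "fruit": "An apple a day keeps the doctor away.",
}
query = "apple stock price"

for name, text in docs.items():
    # Both documents score 1: the company and the fruit look identical.
    print(name, keyword_score(query, text))

# A synonym gets no credit, because only the exact strings are compared.
print(keyword_score("car", "The automobile was parked outside."))  # 0
```

Statistical refinements can reorder such results, but they still operate on strings; telling the two senses of "apple" apart, or linking "car" to "automobile", requires the kind of linguistic analysis described above.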

Several companies, including mine, already offer search solutions based on the technology and methodology I’ve just described, and the results are quite interesting. It is probably just a matter of time before the big names decide to move in the same direction.

Filed Under (search engine) by M.Varone on 07-09-2009

There’s a new Question Answering system (which is different from a search engine, as I wrote in the past) available online for testing: True Knowledge (it only requires a quick registration).

From my point of view, the best way to test a system of this kind is to pose real questions to it: questions for which we have already searched for an answer in the past, using standard search engines or other sources (especially Wikipedia). In fact, testing with questions invented on the spot is of little use, if any, because we tend either to follow the examples provided (too easy) or to ask odd questions that no one would actually ask in a normal situation.

For this purpose, I’ve been collecting a list of about fifty questions (which I update from time to time). I know it’s not a very long list, but it has been carefully built to be balanced and representative of both the easy and the difficult tasks in this field, from an insider’s point of view.

Even though I’m well aware of the huge complexity involved in implementing effective systems of this kind, I have to say that the results here are disappointing: only 3 of the fifty or so questions get a correct answer (while the answer to a fourth is not fully satisfying), a success rate of 7%. As most of the questions are very simple (for example, “Who won the Nobel Prize for Chemistry in 1999?”), I was actually expecting something better (for that question, Google already provides the correct answer in the first link of its results). It’s true that this is just a beta version; however, even small variations on the suggested questions seem to break the process, and this makes me doubt the soundness of the approach and its applicability in real situations.

When the system does reply, it seems like magic indeed, but this happens so rarely that the magic fades and what remains is the distinct impression of a nice experiment but, in practice, a useless tool. The effort is remarkable nonetheless, and I will surely test the system again in a few months, but at this stage of development I have to say that, unfortunately, the tool cannot save us time yet (as for the future, we will wait and see).