Filed Under (search engine) by M.Varone on 26-11-2009

Internet search engines have made serious progress over the past few years, from the first successes of AltaVista and Lycos to the unmatched power of Google (earned through its superior results). However, in the past two or three years even Google has reached a kind of plateau: significant innovations are increasingly rare, and the competition (Bing in particular) is closing in faster than ever before.

Keyword technology (integrated with a series of statistical elements such as PageRank) has the enormous advantage of being simple, easily applicable to many languages and very fast. It has all of the characteristics that were crucial in the Web's early days (when investment and processing power were much lower), but that matter less today. Applied to the Web, keyword technology has taken advantage of the free, voluntary labor of hundreds of millions of people, who, by searching and clicking on one or more results, provide search engine makers with an enormous quantity of information every day. This information is priceless and helps reorganize search results in the best possible way (it could be seen as the price users pay for free services: labor instead of money).

Nevertheless, the time has come to integrate this technology with something new. There is absolutely no need to throw away what was done in the past (in many cases, search results are already quite good). We just need to add new technology to improve the results that are currently problematic and to make searching as simple as possible (especially for those who cannot search efficiently, but could easily put their question to a person).

We can’t be afraid to get our hands dirty. We need to get to the heart of the language and culture of every nation; up until now the approach has been very “sterile”, staying at a symbolic level without really going beneath the surface. To understand meaning, we need to go in depth and recognize that a text is composed of phrases, concepts, attributes and relations that must be analyzed as a whole (even on a cultural level). Only then can we capture the most important aspects of the content and respond to users’ searches in a timely fashion.

Significant investment will be necessary (each language is complex, differs from the others and is often inseparable from a nation’s culture), as will more manual labor paired with today’s greater processing power. These requirements will greatly accelerate the ongoing process, which will lead to an Internet search engine market dominated by two players: Google and Yahoo!

Smaller entities, whether already established or just starting up, can still make their contribution, but only through innovative technology or in vertical markets. It will be nearly impossible for them to compete with these two giants on any other level. Semantic technology is still young and has much room to grow; we should not expect miracles or major revolutions in the near future. The path ahead is long and tortuous, but in the end the reward could be astounding.

Filed Under (Books & News Related) by M.Varone on 10-11-2009

The industry was in an uproar when Eric Schmidt stated that it will be necessary to move from words to meanings in order to better understand both what users are asking and what indexed documents contain. It would be a considerable change of direction for the Mountain View giant, which has always maintained that keyword technology is more than sufficient to obtain the best results.

In a way, it’s really nothing new. For some time now, the Semantic Web world has been watching a sort of integration of semantic technology (which is able to understand meanings) within one of the most popular Internet search engines. When Bing was launched, Microsoft itself claimed to use semantic elements, but without specifying which elements or how they would benefit searches. However, the mere fact that the industry leader is talking about ‘understanding meanings’ legitimizes the approach and draws a line between before and after: the era of widespread web-applied semantics has officially begun.

Filed Under (search engine) by M.Varone on 02-11-2009

When you search for something on the Internet, you always know which search engine you are using (Yahoo!, Bing and Google are the most popular), but when you search for something at work, you sometimes have no idea where the information comes from. You don’t really know which system you’re using; you simply type your request into the search box provided and hope to get the answer you were looking for.

The strange thing is that searching for information is a key activity in every company. Still, this market has not yet been tapped by the multinational software giants, because it seems they just can’t get their act together. Autonomy, the leading producer of enterprise search solutions, is practically unknown outside specialist circles. Oracle and IBM play a small role, and Microsoft actually had to buy the Norwegian company Fast in order to try to grow in this sector. Google has a good share of the market thanks to its brand name, but its product does not deliver better results than the competition. Moreover, users are giving negative feedback (and this goes for all of the key players) on both result quality and search times. This is the overall picture of a situation that must be addressed if we want to improve companies’ efficiency.

The most promising technology is semantic technology. Although it hasn’t yet reached its full potential, it is already able to better “understand” content and identify the most important concepts and relations. We must also recognize that it is impossible to have fully automatic solutions that magically know how to program themselves (an idealistic goal). Search engines must be built around the knowledge and terminology used within each company; if this is done the right way, the task won’t be too complex, and its value will be priceless.

The change must occur in the technology used to analyze the content of the various types and formats of company documents. All of the search engines listed above still use the old keyword technology, strengthened over the years by statistical elements. This technology has the advantage of being stable and easily adaptable to different languages, but it is also very limited, because it cannot in any way understand either the language or the actual context of a text.
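To make this limitation concrete, here is a minimal sketch of plain keyword search built on an inverted index. It is a hypothetical toy, not how any of the products named above actually work; the documents and the function names (`tokenize`, `build_index`, `keyword_search`) are invented purely for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-letters: a purely lexical view of the text."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Inverted index: map each keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents containing every query keyword (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index is not modified
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical company documents, invented for this example.
docs = {
    1: "How to renew the insurance for a company car",
    2: "Automobile coverage policy: renewal procedure for employees",
    3: "Renew your gym membership online",
}
index = build_index(docs)

print(sorted(keyword_search(index, "car insurance")))  # [1]
print(sorted(keyword_search(index, "renew")))          # [1, 3]
```

Document 2 is clearly relevant to a question about car insurance, yet it shares no literal keyword with the query, so it is never retrieved; meanwhile "renew" happily matches a gym membership. In this toy model, statistical ranking could only reorder the documents that matched, not recover one that never did, which is the gap a semantic approach aims to close.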

Several companies, including mine, already offer search solutions based on the technology and methodology I’ve just described, and the results are quite interesting. It is probably just a matter of time before the big names decide to move in the same direction.