Yet another competitor to Google is on the market: Cuil (a funny name in Italian :)) plans to fight Google by relying on the high number of pages it indexes (120 billion claimed so far) and on the experience of its founders, who used to work for Google (and maybe this is how they were able to raise 33 million dollars for their enterprise).
I’ve tested this new engine and it doesn’t work so well: the innovation seems to be limited to an unusual presentation of the results, while the clustering is similar to what we’ve already seen many times in the past and, more importantly, the quality of the results appears to be poor. They could have called this version a beta but they didn’t, for some reason. Anyway, although the engine is not worth using right now, it is brand new and can only improve in the future.
At least they’ve avoided calling it a semantic search engine, and for this I am grateful, because nowadays everything claims to be semantic, including coffee machines :-))
A final (playful) note: as Microsoft is buying all sorts of things (as long as they are connected with search), maybe Cuil is just planning to be bought by the Redmond giant for a nice profit.
But what is categorization?
The question is not trivial, because this activity goes by many different names. It seems to have inherited the confused eclecticism typical of Knowledge Management: the labels range from “classification” and “clustering” all the way to linguistic monstrosities such as “taxonomization”.
Personally I prefer “categorization”, because I believe it’s the term that best reflects the process behind the different names: distinguishing available information according to different categories to make searching easy and immediate.
In most cases categorization is performed manually, and is therefore subjective: it depends on individual choices, on each person’s way of thinking and needs, and also on the type of content (documents, emails, web sites, etc.).
It goes without saying that, being a manual activity, categorization presents two main problems: it takes a great deal of time, and it normally produces subjective category definitions that different users may find inconsistent. Automatic applications were introduced into information management technologies in order to solve these problems.
The first categorization systems were born immediately after the first attempts to implement search applications, but only with the recent explosion of information has the potential usefulness of automatic categorization become a major interest. We just need to consider the quantity of data available on the web today compared to a few years ago, our direct experience of managing documents on our PCs, or the phenomenon of email: less than 10 years have passed, and average users are no longer handling a few emails per week, but about 30 emails per day…
Typically, in the field of information processing technologies (at least from the point of view of an insider), nearly all researchers have approached the problem with the fixed idea of finding an algorithm that can categorize any content automatically, with little or no manual work and with very high quality.
This is how a pragmatic approach to the problem was replaced by the race for the silver bullet of automatic categorization: a rashness that has led to excessive expectations and unsatisfactory results. In the next posts we will see how, when and why.