Feb
08

As I have written many times before, semantic technology is unique in that it is able to go beyond the limits of other types of technology and approach the automatic understanding of a text. It is not perfect, however, and it certainly has yet to reach its maximum potential.

I realize that this is not easy to understand for those who don’t work in the sector (especially because there are so many false promises out there, which tend to create unreasonable expectations, muddled ideas and market chaos). It may therefore help to use a common experience as an example: our own learning process.

Let’s start from the beginning, from the moment we (human beings) begin to talk, understand, learn, go to school and so on. We need at least 12-15 years before we can read a newspaper and understand the most general articles, and this is thanks to the experience we build up while learning the meanings of words and experimenting with a great many different sentence constructions. The learning process is even longer when we decide to tackle more technical terms or specific topics.

Learning takes time, and the same goes for a computer. It’s true that a computer can process in nanoseconds while we think in milliseconds, but it is also true that our method of learning uses a device (the brain) that no one has been able to fully understand and that is able to do things that not even the most powerful computer can imitate.

In summary, it doesn’t make sense to expect a computer to perfectly analyze and understand a biology text, for example, without first having learned all it can about that subject. There are no shortcuts or magic formulas: learning a language is difficult, and even automatic processes require time and labor.

When I present our software solutions to a company (they are based on a semantic technology that uses a rich and vast semantic network), I find myself in front of an audience that clearly understands the advantages of this approach. Yet the concerns and doubts they raise often cloud the decision-making process and lead to an incorrect evaluation of the actual return on investment.

Whether they are raised by IT managers, KM workers or software developers, the concerns fall into two categories: first, the costs of setting up and maintaining the semantic network; second, the costs of the infrastructure needed to sustain a level of performance that satisfies operational needs.

There are many reasons behind these concerns, but two factors stand out. On one hand, there is the very effective (and often inaccurate) communication carried out by the makers of systems based on keyword technology. They have almost succeeded in convincing the market that a complex problem such as information management can be solved with automatic shortcuts and that any alternative would be unaffordable. On the other hand, the majority of researchers in this sector are still skeptical about fully semantic systems. This is mainly because they have so far been unable to develop software that combines better text comprehension with the performance needed to meet real-world demands (which further strengthens the position of the competition).

In the past ten years, many successful projects have been developed using our semantic technology. Therefore, I think it would be useful to use real data from our everyday experiences to help clear up the misconceptions which often cause people to make irrational decisions.

Costs of development

To add a new language to Cogito, two man-years of software development and 8-10 man-years of linguistic development are needed in order to refine the semantic network. You can quickly estimate the cost of such resources (if you are in Silicon Valley, divide your estimated total by 2!) and immediately see that the initial investment is considerable, yet affordable, considering that the cost is spread over all the implementations that will be done over time.

Cogito’s standard semantic network permits a horizontal management of content, so that significantly higher precision and recall (compared to those obtained from a static system) are achieved with no need for further elaboration. For vertical implementations, there are start-up costs for enriching the standard semantic network with knowledge from a specific domain (the number of added concepts usually does not exceed 5,000); a linguist usually needs 20-30 working days to complete this task.

For those who believe that “languages constantly change and adding new terms can be costly,” may I remind you that even the most dynamic languages, such as English, grow by no more than 100-200 new terms in common use and fewer than 1,000 non-idiomatic expressions per year (in the worst-case scenario, this could mean about 10 working days per year).
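The figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: the effort figures come from this post, while the number of working days per year and the number of implementations are assumptions chosen for the example.

```python
# Effort figures quoted in the post (ranges are given as (low, high)).
SOFTWARE_YEARS = 2              # software development per new language
LINGUISTIC_YEARS = (8, 10)      # linguistic development per new language
VERTICAL_DAYS = (20, 30)        # linguist days to enrich one vertical domain

# Assumption (not stated in the post): working days in one man-year.
WORKING_DAYS_PER_YEAR = 220


def language_setup_days():
    """Return the (low, high) one-off effort, in working days, to add a language."""
    low = (SOFTWARE_YEARS + LINGUISTIC_YEARS[0]) * WORKING_DAYS_PER_YEAR
    high = (SOFTWARE_YEARS + LINGUISTIC_YEARS[1]) * WORKING_DAYS_PER_YEAR
    return low, high


def effort_per_implementation(n_implementations):
    """Spread the one-off language effort over n implementations and add
    the per-project vertical customization. Returns (low, high) in days."""
    low, high = language_setup_days()
    return (low / n_implementations + VERTICAL_DAYS[0],
            high / n_implementations + VERTICAL_DAYS[1])


if __name__ == "__main__":
    print(language_setup_days())
    # 10 implementations is a hypothetical number used only for illustration.
    print(effort_per_implementation(10))
```

Under these assumptions, spreading the language effort over 10 implementations gives 220-264 working days each, plus the 20-30 days of vertical customization; the more implementations there are, the smaller the share each one carries.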

Those who criticize the complexity of managing a semantic network often refer to the complexity of managing lists of entities such as people, places, companies and organizations. Traditional systems are able to recognize an entity only if it is present in a list; this aspect is often erroneously confused with semantic network management. A good semantic engine is able to recognize an entity based on the semantic role it plays within a text, so it requires neither the creation nor the maintenance of lists. At the same time, it is also able to correctly recognize less frequent entities (which, for obvious reasons, would never have been inserted in a list).

Costs of infrastructure

Cogito can analyze more than 120KB of text (about 40 pages) per second on a common single-processor server. This kind of speed, combined with its linear scalability and low cost, makes Cogito a practical solution even in situations in which large quantities (tens of millions) of documents must be analyzed.
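To give an idea of what this throughput means in practice, here is a small sketch. The 120KB per second and the 40 pages come from the figures above; the document size, the number of documents and the number of servers are assumptions, and the calculation takes the linear scalability mentioned above at face value.

```python
THROUGHPUT_KB_PER_S = 120        # from the post: one single-processor server
KB_PER_PAGE = 120 / 40           # from the post: 120KB is about 40 pages


def analysis_hours(n_documents, pages_per_document, n_servers=1):
    """Estimate the hours needed to analyze a collection of documents.

    Assumes throughput grows linearly with the number of servers.
    """
    total_kb = n_documents * pages_per_document * KB_PER_PAGE
    seconds = total_kb / (THROUGHPUT_KB_PER_S * n_servers)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical collection: 10 million one-page documents.
    print(round(analysis_hours(10_000_000, 1), 1))               # one server
    print(round(analysis_hours(10_000_000, 1, n_servers=4), 1))  # four servers
```

With these illustrative numbers, ten million one-page documents take roughly 69 hours on a single server and about 17 hours on four.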

The development and maintenance costs of a semantic network are considerably lower than what is commonly assumed; the improvements in terms of the ability to manage information (even when very complex) are obvious even to those who are not experts in this sector. I am convinced that when these aspects can be objectively analyzed (when myths and obsolete information are ignored), the number of companies which adopt real semantic solutions will increase.

Jul
30
Filed Under (knowledge Management) by M.Varone on 30-07-2009

I find the ongoing battle between Google and Microsoft to be very educational (and honestly, very entertaining). It is obvious that the companies are becoming more and more alike. They have similar strengths and weaknesses which tend to surface in different contexts and are now beginning to overlap each other.

Both companies are monopolistic and earn incredible amounts of money from their respective dominant positions. Both share the same problem (a problem I’d sure like to have :-)): finding new markets able to guarantee a continuous increase in sales and revenue.

On one hand, we have Microsoft with its operating systems and office applications, and on the other, we have Google with its search engine. Both companies have reached such a high level of market penetration that they can only, inevitably, go down from here (actually, this consideration is especially aimed at Microsoft, but it might just be a matter of time before the same thing happens to Google).

Both Microsoft and Google continuously strive to add new functions to their cash cows; unfortunately, the end results of these efforts are not very significant. Just look at the negativity surrounding Vista (many users continue to prefer Windows XP, even though it is eight years old). As for Google, there haven’t been any real innovations in search for years (in fact, it seems that Google’s functions are getting worse by the day). It is also difficult to convince people to switch over to the new version of Office, even if it is better than its predecessor; 95% of users claim that it doesn’t really provide anything new or more efficient. I think this is a perfect example of how, in many sectors of software development, we have arrived at a sort of plateau: the returns on investment for new and improved products keep diminishing, and we are getting to the point where it is very difficult to justify such investments.

It must also be said that software houses are “doomed” to keep releasing new versions because, unlike other products, software doesn’t break, isn’t prone to wear and tear and isn’t influenced by the latest fashions. The problem with new releases is that they often require a lot of time and resources but don’t deliver significant improvements in return.

The prospects for key products like Word or PowerPoint are slim, and Google isn’t doing any better. The fact that Google is releasing immature programs like “Google Squared” and programs for a very limited user base like “Google Timeline” shows that, if no real changes occur on the scene, there will be no space left for innovations which are actually effective.

Thus, Google has been trying to gain market share in Microsoft’s territory with its efforts to make applications such as Office usable online and by combining the browser with the Google Chrome operating system. Likewise, major investments by Ballmer and associates are aimed at taking market share away from Google’s search engine (with Bing, its predecessors and its future successors).

Even the efforts not directly aimed at the enemy haven’t been very successful: for example, Google, which up until 2-3 years ago seemed an invincible juggernaut, has recently produced more failures (Google Base, Knol, Video and Google Print Ads) than winners.

Having said this, I realize that these software giants are constantly under pressure from the international financial community to keep growing, and deciding what to do is not an easy task.
If they reduce the investments in their core fields, they are accused of not taking the future into account and therefore risk being wiped out in the short term; if, instead, they decide to diversify, they are accused of losing focus on the core business and of putting their otherwise guaranteed sales and profits on the line…

Just like in politics, it seems that there is no valid third option… I believe, however, that a third option does exist and that it would be very simple and beneficial for the rest of the market. Microsoft and Google should lower their prices (Google is free for the common user, but the companies that advertise on Google’s network pay staggering amounts of money). This way, resources would be freed up for all the other market players, which could then invest efficiently and produce valuable innovations, even in other sectors. For the ecosystem which gravitates around these two giants, it would make more sense to free up the money that currently goes towards dead-end investments in order to get out of this vicious cycle which halts innovation and market growth.

If Microsoft and Google are not willing to take this step, there is another, extremely simple alternative: they could distribute the majority of their profits to their shareholders, the real owners of the companies.

Jul
21
Filed Under (knowledge Management) by admin on 21-07-2009

I’ve been channel-surfing during these hot summer evenings, trying to find something other than re-runs to watch. I noticed that Sky (Sky Italia) is broadcasting a new IBM commercial which, in some ways, addresses the things we’ve been working on for years. I was not unhappy about this, because it is always a good thing when a market leader takes topics which, just a few years back, were not well known and makes them public for all to see.

The truth is that, for most people, “the management of non-structured data” is a behind-the-scenes activity of the “IT show”. But for those of us who work with these things, the fact that these messages are now being broadcast helps us give our potential clients a recognizable context.

Clearly, one of the implicit goals of the commercial is to try to convince the public that only a giant like IBM can efficiently and effectively handle great quantities of information (of course, I do not think this is in any way true, so to each his own opinion :-). What is important, though, is the value attributed to the problem of data management (and, consequently, to the technology for solving this problem). Therefore, I truly hope that these kinds of commercials will also be broadcast on other networks and not just on Sky (then, maybe for once, I won’t complain about the constant commercial interruptions :-).

Mar
31
Filed Under (Semantic Intelligence, knowledge Management) by M.Varone on 31-03-2009

The other day, I was talking to one of our clients, for whom we are designing a semantic search engine, and he made a comment which deserves some consideration.

According to him, it is essential that a search engine be very rapid, like Google, which gives results in two to three tenths of a second on average. In fact, he believes that his search engine should be even faster, because Google filters who knows how many billions of pages while his Intranet contains fewer than a million :-). I tried to explain to him that speed is only one of many important aspects.

As in many other fields, Google has been successful in transforming a technical (and very internal) aspect into a feature which has become important for users. Without a doubt, speed has become essential for Google as well as for other search engines. In fact, many common searches are no longer carried out, but are “preprogrammed” by the system because this means cutting down on servers (thousands and thousands), electrical energy, bandwidth and more.

Establishing how important speed actually is for users is a complicated task: obviously, the less time it takes the better, but I ask myself if it wouldn’t be better to wait even 10 times longer (meaning only 2 or 3 seconds) in order to have better results.

In fact, based on the latest market research, 40% of Internet searches do not receive results, half of the searches have to be reworded in order to get better results and 46% of search sessions last longer than 20 minutes. Given this situation, I personally would be happy to wait 2 seconds longer if it meant that I would find what I’m looking for more often and/or that a 20-minute search session became even just 5 minutes shorter.

Perhaps the problem lies in the fact that, with the current technology, Google and other search engines would not know how to improve search results even if they took 10 times longer. Therefore, they use a simple tactic as a backstop (a kind of unspoken agreement with the user, which is probably tolerable): I’ll give you answers quickly (and for free), but don’t expect too much quality-wise!
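The trade-off discussed in this post (waiting a couple of seconds longer per query in exchange for a shorter search session) can be put into numbers. The sketch below is only an illustration: the 2 extra seconds and the 5 minutes saved come from the text above, while the function itself and the idea of a break-even number of queries are my own simplification.

```python
def break_even_queries(saved_minutes, extra_seconds_per_query):
    """Return how many queries a session could contain before the extra
    waiting time cancels out the time saved by better results."""
    return (saved_minutes * 60) / extra_seconds_per_query


if __name__ == "__main__":
    # Figures from the post: 5 minutes saved, 2 seconds more per query.
    print(break_even_queries(5, 2))
```

With these figures, a user would have to run 150 queries in a single session before the slower engine stopped paying off.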

Oct
14
Filed Under (Semantic Intelligence, knowledge Management) by M.Varone on 14-10-2008

During the internet bubble, among the many startups based on bizarre ideas, there was one in the US working on a sound project: developing solutions able to make explicit and available the large mass of tacit knowledge hidden in email messages exchanged within organizations.

In fact, if we think about it, the email traffic we handle at work on a daily basis is definitely a goldmine, because it contains, in a processable format, the tacit knowledge which is vital to businesses. However, when we need such knowledge, we often cannot retrieve it because, being tacit, it is unstructured or unorganized, and therefore remains hidden inside the email messages.

In order to understand the full potential of tacit knowledge, we can consider the difficulties that arise when a key person leaves a company and takes important knowledge assets with him (or her). Or think of the numerous times we know that we already have the solution to a problem inside an email message, but can’t remember where to find it.

These examples prove how much can be saved, in terms of time and costs, by an application able to read all the email messages exchanged by a group, organize the contents, and make them accessible and usable in the future.

Developing generic solutions of this kind is extremely complex (as a matter of fact, the start-up mentioned earlier is now working on other developments).

But semantics can still have a key role, even if under present conditions it requires considerable customization and tuning.

This means that only big companies can invest in such solutions, and this is a pity, because small and medium businesses could also benefit from them: tacit knowledge hidden in email messages can carry significant, and often hidden, costs.

Actually, it’s a paradox: for the first time in history we are able to keep track of business communications that used to be only spoken, but at the same time we cannot make them accessible and usable.

I doubt the problem will ever be solved completely, but I’m confident that, at least in part, it will be possible to build solutions that can find the gems in this goldmine of hidden and unused knowledge. In the next few years, this will be the biggest challenge for the developers of semantic technologies.

Sep
25
Filed Under (knowledge Management) by M.Varone on 25-09-2008

We all know from experience that finding something is easier when the scope is limited: with a few exceptions, this is our everyday life…

Therefore, it’s strange to notice how this seems not to be true when we compare searching for information on the Internet (where we search among billions of pages) with searching on an Intranet (where we have far fewer pages): in fact, many people find it easier to find information on the Internet than inside their own computer or on a local network (be it small or big).

If we analyse this matter in detail, we realize that actually there is no reversal of the logic; we are just comparing two situations that can only partly be compared.

The first main difference between a search on the Internet and one on an Intranet is the object of the search itself:

- on the Internet we often search for very generic data (the web site of a company or of a person, information on a certain subject) that cannot be found on an Intranet;

- on an Intranet we tend to search for specific data: information on activities connected to projects, updates on commercial situations, various documentation on sold products, on bought goods, on employed resources, etc.

The second distinction lies in the different relevance we assign to the data we are searching for in the two sources, as the assumptions and expectations are completely different: on the Internet we are content with generic information, or at least some kind of indication, while when we search the database of our company we expect to find the right and complete answer, the one able to resolve our doubt or to match the data we already have.

Moreover, on our Intranet we often already know that a certain piece of information is available, but we don’t know exactly where; in other cases we discover a document in our database by mere chance: the very document about a technological application that we could not find when we were looking for it.

This is why we tend to be demanding with Intranet search engines, and tolerant with Internet engines.

The third main difference concerns the quantity of the available data: it is true that an Internet search is applied to billions of pages, but it is also true that the redundancy of information is often extremely high. This factor reduces the number of “unique” pages to which the search is actually applied, and therefore increases the probability of finding “that generic information or that mere indication” which, in the end, is not even so critical… but somehow reassures us.

To sum up, this is a good demonstration that, when we talk about information, things are often not what they appear, and it’s always worth trying to understand what’s behind them.