When I present a company with our software solutions (which are based on a semantic technology that uses a rich and vast semantic network), I find myself in front of an audience who clearly understands the advantages of this approach. Yet, the series of concerns and doubts they raise often clouds the decision-making process and causes an incorrect evaluation of the actual return on investment.
Whether they are raised by IT managers, KM workers or software developers, the concerns fall into two categories: first, the costs of setting up and maintaining the semantic network; second, the costs of the infrastructure required to sustain a level of performance that satisfies day-to-day operations.
There are many reasons behind these concerns, but two factors stand out. On one hand, there are the excellent (and often inaccurate) communication efforts of the makers of keyword-based systems. They have almost succeeded in convincing the market that a complex problem such as information management can be solved with automatic shortcuts and that any alternative would be unaffordable. On the other hand, the majority of researchers in this sector are still skeptical about systems that are entirely semantic. This is mainly due to their inability (at least up to now) to develop software that combines deeper text comprehension with the performance needed to meet real-world demands, which further strengthens the position of the competition.
In the past ten years, many successful projects have been developed using our semantic technology. Therefore, I think it would be useful to use real data from our everyday experiences to help clear up the misconceptions which often cause people to make irrational decisions.
Costs of development
To add a new language to Cogito, two man-years of software development and 8-10 man-years of linguistic development are needed to refine the semantic network. You can quickly estimate the cost of such resources (if you are in Silicon Valley, divide your estimated total by 2!) and immediately understand that the initial investment is considerable, yet affordable, considering that the cost is spread over all the implementations that will be done over time.
Cogito’s standard semantic network permits horizontal management of content, so a significantly higher rate of precision and recall (compared to that obtained from a static system) is achieved with no need for further elaboration. Vertical implementations carry start-up costs, because the standard semantic network must be enriched with knowledge from a specific domain (the number of added concepts usually does not exceed 5,000); a linguist usually needs 20-30 working days to complete this task.
For those who believe that “languages constantly change and adding new terms can be costly,” may I remind you that even the most dynamic languages, such as English, grow by no more than 100-200 new terms (of common use) and fewer than 1,000 non-idiomatic expressions per year. In the worst-case scenario, this could mean about 10 working days of maintenance per year.
Those who criticize the complexity of managing a semantic network often refer to the complexity of managing lists of entities such as people, places, companies and organizations. Traditional systems can recognize an entity only if it is present in a list; this aspect is often erroneously confused with semantic network management. A good semantic engine recognizes an entity based on the semantic role it plays within a text, so it requires neither the creation nor the maintenance of lists. It is also able to correctly recognize less frequent entities, which, for obvious reasons, would never have been inserted in a list.
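To make the arithmetic behind these figures concrete, here is a rough sketch in Python. It uses only the effort numbers quoted above; the conversion of 220 working days per man-year and the function names are my own assumptions for illustration, not Cogito figures.

```python
# Rough effort model built from the figures quoted in the text.
# ASSUMPTION: 220 working days per man-year (not stated in the article).
WORKING_DAYS_PER_MAN_YEAR = 220


def language_setup_days(linguistic_man_years, software_man_years=2):
    """One-off effort, in working days, to add a new language."""
    total_man_years = software_man_years + linguistic_man_years
    return total_man_years * WORKING_DAYS_PER_MAN_YEAR


def effort_per_implementation(n_implementations,
                              linguistic_man_years=10,
                              domain_days=30,
                              maintenance_days_per_year=10,
                              years=1):
    """Working days attributable to each implementation.

    The language setup and yearly maintenance are shared by all
    implementations; the domain enrichment (20-30 days in the text)
    is paid once per vertical implementation.
    """
    shared = (language_setup_days(linguistic_man_years)
              + maintenance_days_per_year * years)
    return shared / n_implementations + domain_days


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "implementations:",
              round(effort_per_implementation(n), 1), "days each")
```

Under these assumptions, the shared cost per implementation shrinks quickly as the number of implementations grows, while the per-domain enrichment stays constant.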
Costs of infrastructure
Cogito can analyze more than 120 KB of text (about 40 pages) per second on a common single-processor server. This kind of speed, combined with its linear scalability and low cost, makes Cogito a practical solution even in situations in which large quantities (tens of millions) of documents must be analyzed.
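As a back-of-the-envelope check on what this throughput means for a large collection, the sketch below estimates processing time in Python. The 120 KB per second rate and the linear scaling come from the text; the average document size of 10 KB and the function name are assumptions chosen only for illustration.

```python
# Back-of-the-envelope processing time from the quoted throughput.
KB_PER_SECOND_PER_SERVER = 120  # figure from the text (a lower bound)


def processing_hours(n_documents, avg_doc_kb, servers=1):
    """Estimated hours to analyze a collection.

    ASSUMPTION: throughput scales linearly with the number of servers,
    as the text states, and avg_doc_kb is a hypothetical average size.
    """
    total_kb = n_documents * avg_doc_kb
    seconds = total_kb / (KB_PER_SECOND_PER_SERVER * servers)
    return seconds / 3600


if __name__ == "__main__":
    docs = 10_000_000
    for servers in (1, 10):
        hours = processing_hours(docs, avg_doc_kb=10, servers=servers)
        print(f"{servers} server(s): {hours:.1f} hours")
```

With these assumed inputs, ten million documents of 10 KB each would take roughly 231 hours (under ten days) on a single server, and about a tenth of that on ten servers. Since the text says "more than 120 KB", these estimates are upper bounds on the time.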
The development and maintenance costs of a semantic network are considerably lower than what is commonly assumed; the improvements in terms of the ability to manage information (even when very complex) are obvious even to those who are not experts in this sector. I am convinced that when these aspects can be objectively analyzed (when myths and obsolete information are ignored), the number of companies which adopt real semantic solutions will increase.
Will it be possible for Cogito to incorporate two languages into one, as in Spanglish (Spanish English) or Pinglish (Persian English), or the transliteration of one language? Would the cost of development be calculated the same as adding one language to Cogito? This is really interesting because social networks are expanding into the realm of a new breed of language, viz., hybrid languages. It would be not only interesting but also useful to see whether such an addition to Cogito would spark any interest in social networks such as FB. Google’s transliterate lab offers an almost flawless transliteration from phonetic Persian, written in Roman letters, to Farsi script. Using cut/paste from the GUI of transliterate would yield a non-keyboard-generated script that could be placed into any text box on a form. So, my question would be: in the Cogito environment, would it be possible to enhance the semantic network to include transliteration or a hybrid language? Many thanks for your blog! I really enjoyed it.
If we define a hybrid language as a language which follows the rules of grammar (like English, for example) and is enriched with mixed terminology, then the problem is manageable. The costs would correspond to that of the domain enhancement of the semantic network, rather than the addition of a whole other language. The situation for transliteration would be almost identical. In this case, even if we needed to map an entire language, there are existing tools which could automatically map every single concept.
Thanks for your message.