Sep
25
Filed Under (Knowledge Management) by M.Varone on 25-09-2008

We all know from experience that finding something is easier when the scope is limited: with a few exceptions, this holds true in everyday life.

Therefore, it’s strange to notice that this does not seem to hold when we compare an information search on the Internet (where we search among billions of pages) with a search on an Intranet (where we have far fewer pages): in fact, many people find it easier to locate information on the Internet than on their own computer or in a local network (be it small or large).

If we analyse this matter in detail, we realize that actually there is no reversal of the logic; we are just comparing two situations that can only partly be compared.

The first main difference between a search on the Internet and one on an Intranet is the object of the search itself:

- on the Internet we often search for very generic data (the web site of a company or of a person, information on a certain subject) that cannot be found on an Intranet;

- on an Intranet we tend to search for specific data: information on activities connected to projects, updates on commercial situations, documentation on products sold, goods purchased, resources employed, etc.

The second distinction lies in the different relevance we assign to the data we are searching for in the two sources, as the assumptions and expectations are completely different: on the Internet we are content with generic information, or at least some kind of indication, while when we search our company’s database we expect to find the right and complete answer, the one that resolves our doubt or matches the data we already have.

Moreover, on our Intranet we often already know that a certain piece of information is available, but we don’t know exactly where; in other cases we discover a document in our database by mere chance: the very document about a technological application that we could not find when we were looking for it.

This is why we tend to be demanding with Intranet search engines, and tolerant with Internet engines.

The third main difference concerns the quantity of the available data: it is true that an Internet search covers billions of pages, but it is also true that the redundancy of information is often extremely high. This factor reduces the number of “unique” pages that the search actually covers, and therefore increases the probability of finding “that generic information or that mere indication”, which in the end is not even so critical… but somehow reassures us.
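To make the point about redundancy concrete, here is a minimal sketch that counts how many pages in a small collection are actually distinct once trivial differences are ignored. It is only an illustration under stated assumptions: the function names (`normalize`, `count_unique_pages`) and the sample pages are invented for this example, and it only detects exact copies after normalization, which is far simpler than what a real search engine does.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def count_unique_pages(pages: list[str]) -> int:
    """Count the pages whose normalized content is distinct."""
    fingerprints = {
        hashlib.sha256(normalize(page).encode("utf-8")).hexdigest()
        for page in pages
    }
    return len(fingerprints)


# Hypothetical sample: four pages, three of which carry the same content.
pages = [
    "Acme Corp makes widgets.",
    "acme corp   makes widgets.",
    "Acme Corp makes widgets.",
    "Acme Corp was founded in 1990.",
]

print(len(pages), "pages,", count_unique_pages(pages), "unique")
```

In this toy collection the redundancy is obvious, but the same idea applies at scale: the more copies of a generic piece of information exist, the more likely a search is to hit at least one of them.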

To sum up, this is a good demonstration that, when we talk about information, things are often not what they appear to be, and it’s always worth trying to understand what’s behind them.

Sep
12
Filed Under (Glossary) by M.Varone on 12-09-2008

What’s natural language?

I am often asked this question by customers.
Natural language is the everyday language (English, Italian, German, etc.) used to communicate at all levels, which in computational linguistics is opposed to “formal language”, created expressly for a specific purpose. While natural language changes and evolves continuously thanks to neologisms, idioms, loanwords, slang, etc., formal language is closed and without exceptions: no semantic ambiguity, no homography, no homonymy, limited expressiveness.

Sep
04

The secret to a successful automatic categorization project lies not in choosing a sufficiently powerful technology, but rather in the methodology used to implement the project: if the methodology is the right one, then a powerful technology will indeed be indispensable for success, but if the methodology is wrong, no technology will be able to help.

The most important element is the initial analysis phase, during which it is necessary to describe the core of the issue in a clear, objective and replicable way.

It is fundamental that the customer, typically an organization that needs to manage a considerable amount of knowledge (in general, various types of documents produced or acquired during work), explains its real needs to the supplier.

The supplier, of course, must commit to address such needs in the best possible way.

Described in this way, the situation is not so different from any other software development project. However, the difference in this case is that we must understand how to manage complex knowledge, which is never easy and cannot be improvised.

The first step is the most important one, and it requires a special effort from the customer, who needs to answer the following questions rationally:

  • Why do I need to categorize my documents?

  • Who are the people who best know the knowledge base I want to categorize?

  • If currently the categorization is performed manually, what is the detailed process flow?

  • Which are the really important and relevant categories able to make the content more useful and valuable?

  • If the category tree is already available, are all the categories really necessary?

  • Which are the most objective criteria that make a specific document belong to one category and not to another?

Although the above questions are all simple, finding the answers quickly is not so easy, and this is where the experience of the supplier comes in, making the analysis phase a cooperation between customer and supplier.
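As an illustration of what “objective criteria” might look like once they have been agreed on, here is a minimal sketch in which each category is defined by an explicit set of keywords and a document is assigned only when enough of them appear. Everything in it is an assumption made for the example: the categories, the keywords, the `min_hits` threshold and the function names are hypothetical, and a real project would derive its criteria from the analysis phase rather than from a hard-coded table.

```python
import re

# Hypothetical categories and criteria, invented purely for illustration.
CRITERIA = {
    "invoice": {"invoice", "amount", "due"},
    "contract": {"agreement", "party", "term"},
}


def tokenize(text):
    """Return the set of lowercase alphabetic words found in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, criteria=CRITERIA, min_hits=2):
    """Return the category with the most keyword hits, or None.

    A document is assigned only if at least `min_hits` keywords of the
    best category appear in it. On a tie, the category listed first in
    `criteria` wins.
    """
    words = tokenize(text)
    scores = {cat: len(words & keywords) for cat, keywords in criteria.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


print(categorize("Invoice 42: amount due in 30 days"))
print(categorize("Lunch menu for Friday"))
```

The value of writing the criteria down this explicitly is that customer and supplier can read them, argue about them and replicate the result, which is exactly what the analysis phase is meant to achieve.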

First of all, the supplier needs to discuss the problem with the customer before offering a solution. Moreover, the expertise of the supplier must go beyond the purely technical aspects of the technology: in fact, the customer is generally not a knowledge expert, and therefore it is not easy for the customer to immediately identify the basic categories (or knowledge domains) on which the success of the project depends.

If the analysis phase is performed correctly, the most important step for the success of the project is already done: this is, in fact, the only path that leads to an effective system able to guarantee advantages in terms of costs and value.