Jun
27
Filed Under (Semantic Intelligence) by M.Varone on 27-06-2008

How many useful messages do we receive via email? The analysts say about 30 per day, with a cost of 1 or 2 hours of work to manage them.

In addition to the work needed to clear the incoming mail, we need to consider the work needed to retrieve the data later. In fact, we tend to keep everything (if only to keep a history), and so it adds up: 30 messages today, 30 tomorrow…

Keeping the situation under control is not easy. The data keeps growing until reading everything becomes impossible, even with a good organization system of folders and sub-folders, because content cuts across folders, and contacts and useful details are scattered everywhere. Where did I save that email from so-and-so, with whom I’m working on that project for so-and-so? Maybe in the folder with his name, or in that of the project, or of the customer…

The keyword-based “search” function (usually the only one available) can help us only if we know the right folder and can remember at least the author of the message and a specific (and not too common) word in the text. But in most cases we just remember a general idea, and as a consequence we can only proceed by trial and error, often without results, or end up searching the Internet for what we already have!

Despite the complexity of the problem (information is often subjective rather than standard and objective), semantics can greatly improve these kinds of searches. One example is the ability to automatically check more data by extending the search to all related concepts and sub-concepts.

For example:
“I’m looking for the sales of competitor X”
With semantics I can retrieve not only messages containing “competitor X + sales”, but also those with “competitor X + billing”, “competitor X + turnover”,
and also those with:
“Product1 + Competitor + sales”, “Product2 + Competitor + revenues”, etc.

 

All of this is as easy for the user as a keyword search.
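To make the idea concrete, here is a minimal sketch of concept expansion in Python. It is only an illustration, not how any real semantic engine works: the `CONCEPTS` map, the function names and the sample messages are all hypothetical, and a real system would rely on a full semantic network rather than a hand-written dictionary.

```python
# Hypothetical concept map: each concept lists the terms considered related to it.
# A real semantic engine would derive these from a much richer knowledge base.
CONCEPTS = {
    "sales": {"sales", "billing", "turnover", "revenues"},
    "competitor x": {"competitor x", "product1", "product2"},
}


def expand(term):
    """Return the term together with its related terms (or just the term itself)."""
    key = term.lower()
    return CONCEPTS.get(key, {key})


def semantic_search(messages, query_terms):
    """Keep messages that mention at least one variant of every query term."""
    groups = [expand(term) for term in query_terms]
    hits = []
    for message in messages:
        text = message.lower()
        if all(any(variant in text for variant in group) for group in groups):
            hits.append(message)
    return hits


if __name__ == "__main__":
    # Invented sample mailbox, for illustration only.
    mailbox = [
        "Competitor X turnover grew last quarter",
        "Product2 revenues are flat",
        "Lunch on Friday?",
        "Competitor X hired a new CEO",
    ]
    for hit in semantic_search(mailbox, ["competitor X", "sales"]):
        print(hit)
```

The user still types two plain words; the expansion to “turnover”, “revenues” or the competitor’s products happens behind the scenes.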

Jun
18
Filed Under (Myths and realities) by M.Varone on 18-06-2008

Working on categorization projects, we often face the fact that a perfect automatic categorization cannot exist: a certain degree of subjectivity (which can also vary over time) is always involved when we assign a category or a subject to a text.

The most common situation involves taxonomies that include heterogeneous categories: for example, when categorizing newspaper articles, customers tend to include in the taxonomy subjects such as sport and politics together with domains such as people or events.

But while categories like sport or politics are fairly objective and strictly related to the content of the text, people and events are cross-category elements, so it is very difficult to manage them with an automatic system. In fact there are no common topics, no recurring or typical concepts, no specific domains; the only shared feature is that of being focused on someone or something (a person or an event).

 

However, it is comparatively easy for the reader to agree that articles about Leonardo da Vinci, Gorbachev, Robin Hood or Joe DiMaggio should belong to a “people” category.

In general we should always keep in mind that some choices are quite easy for us, but can be extremely complicated for a program.

For example, we may need to categorize the review of a Second World War movie. For most readers, without even having to read the whole article, the first category will be “cinema”, as the subject is a movie. The program, instead, may think* about history, war or the military, and would not consider “cinema” a relevant topic.
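The effect is easy to reproduce with a deliberately naive categorizer that only counts keywords. The sketch below is an assumption made for illustration (the keyword lists, function names and review text are invented, and no real categorization product is this simple), but it shows why the war vocabulary of the review outweighs the single word that tells a human reader it is about a movie.

```python
import re
from collections import Counter

# Hypothetical keyword lists per category; a real taxonomy would be far richer.
CATEGORY_KEYWORDS = {
    "cinema": {"movie", "film", "director"},
    "war": {"war", "battle", "soldiers", "army", "invasion"},
    "history": {"1944", "historical", "century"},
}


def score_categories(text):
    """Count how many times each category's keywords occur in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        category: sum(counts[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


def top_category(text):
    """Return the category with the highest keyword count."""
    scores = score_categories(text)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Invented review text, for illustration only.
    review = (
        "This movie follows soldiers through the 1944 invasion. "
        "The battle scenes show an army at war, and the war never feels staged."
    )
    print(score_categories(review))
    print(top_category(review))
```

A reader sees “movie” once and knows the subject; the counter sees six war-related words and one cinema word, and chooses accordingly.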

Luckily, most categorization issues can actually be solved by an automatic system which, once configured properly, will be far more objective and reliable than a person (because it will never get tired or be influenced by external factors). The person nevertheless remains the only one of the two who is really intelligent.

* think… it’s only a manner of speaking :-)

 

 

Jun
09
Filed Under (Myths and realities) by M.Varone on 09-06-2008

A long series of false notions on the Internet has created a macro-myth: you can find everything online, you just need to “know how to search”.

 

Instead, there’s nothing special to know. That is, if we cannot find, for example, library books on the web, it’s not a matter of tricks: it is simply that library books are not on the web. In fact, only a very small part of the knowledge that surrounds us is also online, and not by magic, but because someone has decided to make it available on the Web (and available does not mean “for free”, because it is not true that all the information on the web is free… this is another MYTH ;))

We also need to consider the impact of dynamic pages (even though all search engines have developed special crawlers to index as much content as possible, recovering it from the hidden part of the web), and the fact that search engines are able to index only a small part of all the accessible data (no one can give an exact percentage, but I would be surprised if it were more than 4 or 5%). Therefore the content may actually be online, but the problem remains, because there is no special technique to retrieve what is not indexed.

But even leaving the hidden Web aside: if the content we are interested in is actually indexed, can we really find what we need in a very short time (and without effort… another myth)?

Without the right keywords, the answer is no: we could search for an entire week and still not find anything.

The reality is that we still cannot take advantage of all that we have available.

Some say that it would be nice to have every original document on the Internet (like the library books we were talking about at the beginning of the post). But it would also be nice to be able to use the multitude of secondary information, which can provide very useful support, in particular because it is produced, for the most part, by people with different competences, points of view, sensibilities, etc.

Jun
04
Filed Under (Events) by L.Scagliarini on 04-06-2008

The Semantic Technology Conference in San Jose is probably the most important in this sector.

I attended it for the second year in a row, and this year the event had more than 1,100 attendees. It is an important moment to understand the maturity level of the so-called semantic technologies and, in general, to evaluate whether these technologies have started their run toward becoming mainstream. Below you will find some random thoughts from a non-technical guy on the trends and issues helping or preventing semantic technologies from becoming mainstream.

The language used by vendors and experts is still too technical to engage and excite business people. However, I noticed that more presentations included practical demo sessions showing how users interact with the applications or solutions presented. This is a first step, but what should happen next is to have presentations with a clear ROI analysis, which was still missing from most presentations at this year’s event. I believe that, as usual, this is the turning point at which any technology shows its strategic relevance for enterprises.

This year we were finally able to see the first real working semantic web applications. It was impressive to see the expectations that platforms like Twine, Freebase or Powerset have generated in the community. I am a Twine user, so I am not surprised by this interest, but it is still nice to see. It is still too early to say whether these applications will be successful and drive a lot of traffic. Initial users seem to have split opinions. I have a conflict of interest, because we are a supplier to Twine and the developer of www.askwiki.com, which directly competes with Powerset, so I cannot express my opinion. However, we will all follow the efforts of these companies carefully, because if they can deliver on the hype they have generated, it will help to make semantic technologies pervasive.

The defense sector seems to be ahead of the enterprise and other government sectors in the adoption of, or at least interest in, semantic technologies. Many of the most important defense-related system integrators, vendors and agencies attended the event. It’s difficult to say whether this interest depends on the major wave of investment in the defense sector, which allows it a much broader scope in monitoring new technologies, or whether, as I believe, it is due to the issues facing the defense sector (especially in monitoring open sources), which make semantic technologies a perfect fit. In any case, this interest is of great help to the industry.

Analysts at the major firms (like IDC and Gartner) do not seem to have really caught up with the semantic wave. While most of these firms have started to cover semantic technologies in some shape or form, they don’t yet seem to be very engaged or comfortable with the topic. It came as no surprise that there were no analysts from these firms among the attendees of the event. I think it will be important for semantic technology companies to engage these firms in the future and present their case clearly if they want to find some advocates for a breakthrough in the business world.

There was a lot of talk about standards for the semantic web (OWL, RDF, etc.), as if simply having the standards makes a semantic web. People seem to forget that you need something with which to create applications that process the information and produce output conforming to the standards. In order to become mainstream and be really usable in real-world applications, it is mandatory to have the tools to do the heavy lifting. This fact has always driven the development of our technology here at Expert System, and this is why we have developed such a solid set of tools.

We believe that the semantic web can become a reality only when application development and customization tools are readily available.