Oct
30
Filed Under (Books & News Related) by M.Varone on 30-10-2008

I would like to bring to your attention an interesting article on the differences between Internet and corporate searches. It confirms some of the things I’ve been writing about in the past few months (such as the post When Reality is Not What We Expected) and contains some remarkable and very realistic considerations by Google.

Oct
29
Filed Under (Semantic Intelligence) by B.Aker on 29-10-2008

There is plenty of confusion these days about what web 3.0, sometimes called the Semantic Web, actually is.
I thought it would help to set the record straight.

Let’s start by understanding what web 1.0 and web 2.0 are, thereby setting the stage for web 3.0.

Here goes:
Web 1.0 is one producer and mass consumption. The original web gave the power of authorship to a few people who created content for the rest of us. Mass consumption of that content became possible with the advent of search engines like Google and Yahoo. Broadly speaking, these engines work by indexing every word on every page. The more words on a page that match your search query, the higher the page appears in the result list. But even if only one word matches, the page is still listed as part of the search results. This is why you get 30, 40 or 60 thousand results per search.
Many of those pages are useless to you.
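The ranking idea described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how Google or Yahoo actually rank pages: the function names `tokenize` and `keyword_rank`, the sample pages, and the scoring rule (count of distinct query words found on the page) are all made up for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_rank(query, pages):
    """Return (page_id, score) pairs, best first.

    The score is the number of distinct query words found on the page.
    A page with a single matching word is still listed.
    """
    query_words = set(tokenize(query))
    results = []
    for page_id, text in pages.items():
        score = len(query_words & set(tokenize(text)))
        if score > 0:
            results.append((page_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical pages, invented for this sketch.
pages = {
    "recipes": "Slow cooker recipes for busy families",
    "jaguar-cars": "Jaguar cars: speed and luxury",
    "jaguar-cats": "The jaguar is a big cat; its top speed is impressive",
    "speed-limits": "Speed limits on rural roads",
}

for page_id, score in keyword_rank("jaguar top speed", pages):
    print(page_id, score)
```

Note that the page about speed limits is returned even though it shares only one word with the query, which is the behavior that inflates result lists.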

Also, this is keyword technology. It treats words like tokens. A series of letters in a certain order from a search query is matched to a series of letters in the same order on any page. The underlying meaning of the word, the context in which it is used, and the word’s relationship to the words around it are not considered. Keyword technology treats a word like a picture and not as part of a language.
Here is the definitive test. Take any webpage. Take every word on that page and mix them up until they make no sense. Feed the page back to Google or Yahoo and have them index it. They will serve the page up just as they did the original. Same words, same tokens – if they match, serve the page. Keyword technology does not care that the page makes no sense because the technology does not use sense as part of the index.
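The scrambling test can be tried on a small scale. The sketch below assumes the simplest possible keyword index, a count of each word on the page; `keyword_index` is a hypothetical helper written for this example, and real search engines store more than this.

```python
import random
import re
from collections import Counter

def keyword_index(text):
    """Build a pure keyword index: each word and how often it occurs."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

original = "Keyword technology does not care that the page makes no sense"

# Mix the words up; a fixed seed keeps the shuffle repeatable.
words = original.split()
random.Random(0).shuffle(words)
scrambled = " ".join(words)

print(scrambled)
# Word order is not part of the index, so both versions index identically.
print(keyword_index(original) == keyword_index(scrambled))
```

Because the index records only which tokens occur, the scrambled page is indistinguishable from the original as far as this kind of matching is concerned.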

Web 2.0 is mass production and mass consumption. The advent of blogs, chat rooms, and other instant and ubiquitous authoring tools, along with sites ready to accept the content, has been a great democratization of the web. The power to express opinion and to add knowledge to humankind is a great advance. We all get to hear from each other and to learn from each other. This is a good thing.

Except … that keyword technology has not helped us to locate opinion or knowledge as intended by those authors.

Consider the following sentence: “I believe the government has done a good thing in bailing out the economy that is in such bad shape.” Keyword technology would match this sentence (web page) to any of the following queries:

good government
bad government
good economy
bad economy

Four different queries and four different needs – yet all four searchers get the same page and are left to do the work of reviewing whether the page really applies to their needs or not. Multiply that effort by the 30, 40 or 60 thousand search results and you have an untenable situation. So how are we going to get out of this mess?
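To see why all four queries hit the same page, here is a minimal sketch of keyword matching in Python. The helper names `tokenize` and `keyword_match` are assumptions made for this example, and the rule used (every query word must occur somewhere on the page, in any order) is a simplification.

```python
import re

def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_match(query, page):
    """True if every query word occurs somewhere on the page, in any order."""
    return tokenize(query) <= tokenize(page)

page = ("I believe the government has done a good thing in bailing out "
        "the economy that is in such bad shape")
queries = ["good government", "bad government", "good economy", "bad economy"]

for query in queries:
    print(query, "->", keyword_match(query, page))
```

Every query reports a match, because the matcher only checks that the words are present and never asks which adjective belongs to which noun.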

Web 3.0 is mass production with pinpoint consumption. Semantics, in this context, is the science of machine comprehension of text. It means the computer reads, understands and tags words, sentences, paragraphs and whole documents. With semantics, when we search we can tell the computer to fetch only concepts about “good government” or “bad economy”. In the above sentence, semantics would understand that the adjective good is connected to the noun government and that the adjective bad is connected to the noun economy. In other words, a semantic search for “good government” or “bad economy” would ignore a sentence such as “I believe the government has done a bad thing in bailing out the economy that is in fundamentally good shape”.
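The contrast with keyword matching can be illustrated with a deliberately crude stand-in for semantic analysis. The sketch below is not a real semantic engine and not the method any vendor uses: it simply attaches each opinion adjective to the nearest topic noun in the sentence. The word lists `TOPIC_NOUNS` and `ADJECTIVES` and the function names are assumptions made for this example; real systems rely on linguistic analysis rather than word distance.

```python
import re

TOPIC_NOUNS = {"government", "economy"}  # hypothetical topic vocabulary
ADJECTIVES = {"good", "bad"}             # hypothetical opinion words

def adjective_noun_pairs(sentence):
    """Pair each opinion adjective with the nearest topic noun (toy heuristic)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    noun_positions = [(i, t) for i, t in enumerate(tokens) if t in TOPIC_NOUNS]
    pairs = set()
    for i, token in enumerate(tokens):
        if token in ADJECTIVES and noun_positions:
            _, nearest = min(noun_positions, key=lambda p: abs(p[0] - i))
            pairs.add((token, nearest))
    return pairs

def semantic_match(query, sentence):
    """True only if the query's adjective is attached to the query's noun."""
    adjective, noun = query.lower().split()
    return (adjective, noun) in adjective_noun_pairs(sentence)

first = ("I believe the government has done a good thing in bailing out "
         "the economy that is in such bad shape")
second = ("I believe the government has done a bad thing in bailing out "
          "the economy that is in fundamentally good shape")

for query in ["good government", "bad economy"]:
    print(query, "->", semantic_match(query, first), semantic_match(query, second))
```

Both sentences contain exactly the same four key words, yet only the first one is returned for “good government” or “bad economy”, because the match depends on how the words are connected and not merely on their presence.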

So here is the kicker. If web 1.0 was single production and mass consumption, then web 3.0, the semantic web, is mass production and pinpoint consumption. Web 3.0 turns web 1.0 on its head. It allows me, the individual, to find, assemble and consume only those portions of the vast internet that help me with my current task. Web 3.0 works for me rather than me having to work the web to get anything useful from it.

I had the chance to meet the inventor of the web, Sir Tim Berners-Lee, just the other day at MIT. It is no accident that he has reinvented the web as a semantic web. As the amount of available information grows ever larger, web 1.0 becomes less useful. One could argue it will eventually collapse under its own weight, and it is keyword technology that is killing it. Semantics is the driver of web 3.0 and will restore the productivity promise of a world of connected information, knowledge and intelligence.

Oct
15
Filed Under (Events) by B.Aker on 15-10-2008

In a recent web seminar organized by Project 10X, in which we participated, some 260 registered attendees submitted questions prior to the event. I semantically processed these questions (sometimes called “eating your own dog food” – imagine that!) looking for common themes and concerns.

In reviewing the outcome, here is what I found:

1. Case Studies and ROI. People learn best through storytelling and proof points embodied by Return on Investment, so it should be no surprise that this tops the list of questions and concerns. These stories help convince funders, provide guidance for technical planning, and show feasibility. Yet this also shows a level of understanding of the technology by the participants. In other words, they are convinced of the basic value of semantic technologies and have come to believe they can be deployed with good outcomes within their organizations, but they need help finding the right place to start, estimating timelines, and selling the capabilities and outcomes to upper management. At Expert System we have carried out over 100 implementations in the last 3 years alone and can confirm that this concern matches our experience.

2. Technical Integration Points. Here attendees’ concerns are about how to make semantics live with or interact with existing applications, data sets, and search products. I sense the need to make existing products pay back their sunk cost a bit longer rather than tearing things out wholesale and starting over. The good news is that semantic technology is intended to play this exact role by providing new insight into information wherever it currently lives. Nine out of ten customers ask us for a SaaS implementation with a front-end user interface that already exists.

3. Semantic Networks. This is a real surprise to us, but pleasantly so. While our technology relies heavily on a semantic network, sometimes called an ontology, other providers do not always use this method to unlock the meaning of text. Some use statistical approaches, others heuristics, and still others something called latent semantic processing. These other approaches tend to sound quite scientific but in reality are shortcuts that prove insufficient for industrial-strength precision and recall. Semantic networks are hard to produce and they take time, but the investment pays off. They become a knowledge representation of a domain. When done thoroughly and properly, they can greatly increase the precision and recall of the processing. Many networks are specific to a branch of science or hold deep technical knowledge representations. Our semantic network, on the other hand, is of the common language, covering all topics, all words, all concepts and the connections between them. This means it can be applied to any domain.

4. W3C standards are confusing. When we read the comments, it’s clear there are too many acronyms and too many standards. More concerning, the standards themselves are seen as the solution to semantics. Many seem to think the standards provide the inference, the storage, the modeling, the interpretation and more that are core to semantics. The reality is that standards are only a proposed common language for describing and exchanging the outcomes of semantic processing.

To sum up – the semantic web has come a long way in terms of showing value and laying down a base of understanding. But as with any new technology, there is more to do. All of us need to do better in terms of explaining, simplifying and educating up and down the organizational decision chain. Only when that is done will we be able to say “it’s baked”.

The question categories mean the following:

Integration: How to embed or use semantics behind the scenes of existing applications.

Mobility: How to get semantics to support mobile workers.

ROI Case Studies: Examples of successful, killer applications and their payback.

Semantic Nets: Semantic networks or ontologies, what they are, when to use them, how to maintain them.

Standards: W3C’s soup of acronyms and what they mean.

Timing: How fast will the technology and/or market progress?

Performance: Can semantics run with everything else and keep up?

Databases: How and when to use databases with semantics.

Automatic: Do semantic systems or tools learn on their own? What about maintenance and support?

Selling: How to make the case for funding to upper management.

NLP: How does semantics support natural language processing or computing?

Oct
14
Filed Under (Semantic Intelligence, knowledge Management) by M.Varone on 14-10-2008

During the internet bubble, among the many startups based on bizarre ideas, there was one in the US working on a sound project: developing solutions able to make explicit and available the large mass of tacit knowledge hidden in email messages exchanged within organizations.

In fact, if we think about it, the email traffic we handle at work on a daily basis is definitely a goldmine, because it contains, in a processable format, the tacit knowledge which is vital to businesses. However, when we need such knowledge, we often cannot retrieve it because, being tacit, it is unstructured or unorganized, and therefore remains hidden inside the email messages.

To understand the full potential of tacit knowledge, consider the difficulties that arise when a key person leaves a company and takes important knowledge assets with him (or her). Or consider the numerous times we know that the solution to a problem already sits inside an email message, but we can’t remember where to find it.

These examples show how much time and cost could be saved by an application able to read all the email messages exchanged by a group, organize the contents, and make them accessible and usable in the future.

Developing generic solutions of this kind is extremely complex (as a matter of fact, the start-up mentioned earlier is now working on other developments).

But semantics can still have a key role, even if under present conditions it requires considerable customization and tuning.

This means that only big companies can invest in such solutions, and this is a pity, because small and medium businesses could also benefit from them: tacit knowledge hidden in email messages can carry significant costs, often hidden ones.

Actually, it’s a paradox: for the first time in history we are able to keep track of business communications that used to be only spoken, but at the same time we cannot make them accessible and usable.

I doubt the problem will ever be solved completely, but I’m confident that, at least in part, it will be possible to build solutions that can find the gems in this goldmine of hidden and unused knowledge. In the next few years, this will be the biggest challenge for the developers of semantic technologies.