Filed Under (Semantic Intelligence, Knowledge Management) by M.Varone on 31-03-2009

The other day I was talking to one of our clients, for whom we are designing a semantic search engine, and he made a comment that deserves some consideration.

According to him, a search engine must be very fast, like Google, which returns results in two to three tenths of a second on average. In fact, he believes his search engine should be even faster, because Google filters who knows how many billions of pages while his intranet contains fewer than a million :-) I tried to explain to him that speed is only one of many important aspects.

As in many other fields, Google has succeeded in turning a technical (and very internal) aspect into a feature that users now consider important. Without a doubt, speed has become essential for Google as well as for other search engines. In fact, many common searches are no longer actually carried out but are “preprogrammed” by the system, because this saves servers (thousands upon thousands of them), electricity, bandwidth and more.

Establishing how important speed actually is for users is a complicated task: obviously, the less time it takes the better, but I wonder whether it would be better to wait even 10 times longer (that is, only 2 or 3 seconds) in exchange for better results.

In fact, according to the latest market research, 40% of Internet searches do not produce results, half of all searches have to be reworded to get better results, and 46% of search sessions last longer than 20 minutes. Given this situation, I personally would gladly wait 2 seconds longer if it meant finding what I’m looking for more often, or cutting a 20-minute search session by even just 5 minutes.

Perhaps the problem is that, with current technology, Google and other search engines would not know how to improve their results even if they took 10 times longer. So they fall back on a simple tactic (a kind of unspoken agreement with the user, which is probably tolerable): I’ll give you answers quickly (and for free), but don’t expect too much quality-wise!

Filed Under (Semantic Intelligence) by L.Scagliarini on 20-03-2009

Last Wednesday, I visited a client to discuss semantic search. He motioned for me to sit in the chair in front of his desk. Then, right off the bat, he asked, “Can you explain why, when I search for something in Google or Yahoo, sometimes the information I’m looking for is at the top of the list and other times it’s not there at all?”

His question sparked a very interesting and lively discussion about the Semantic Web, which made me think about how much ground has been covered, but also about how much confusion still exists regarding this subject.

People first began to talk about the Semantic Web in 2001. It was to be a new kind of Web, in which web pages, files, images and the like would carry precise information about their own content. In this way, the Web would become semantic: no longer a collection of documents to be searched manually, but an instrument capable of immediate and automatic data interpretation. I remember thinking, “Fantastic.” Many people still think it is just that: fantastic, in the sense of closer to fantasy than to reality.

The Web contains an enormous amount of information that is not always accessible. The pages that make up the Web are not “semantically” linked. The lack of explicit information about what content means and how it is linked, together with the exponential growth of the data, is the main reason the precision of search results varies so much.

To give meaning to web pages, each information resource should be able to provide information about itself (this is called “metadata”, meaning data about data). Of course, all of this information needs to be expressed in a language that computers can process. The most feasible approach is to use a shared vocabulary together with XML-based formalisms (I won’t go into the details here; Wikipedia is a good starting point for further reading). Let’s just say that in this way we can obtain complete, objective, accurate data and therefore generate equally exact forms of analysis. But who has the time to link every bit of information to its metadata? Usually speed and simplicity win out, at the expense of precision and efficiency.
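
To make “data about data” more concrete, here is a minimal Python sketch that attaches metadata to a web page using RDF/XML, one of the XML-based formalisms, with the Dublin Core element set as the shared vocabulary. The page URL and field values are made up, and `describe` is a hypothetical helper: it only shows the shape of such a description, not how any particular system produces it.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resource_url, **properties):
    """Hypothetical helper: wrap Dublin Core properties for one resource in RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about identifies the resource the metadata is about.
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_url})
    for name, value in properties.items():
        # Each keyword becomes a dc:<name> element, e.g. dc:title or dc:subject.
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


metadata = describe(
    "http://www.example.com/articles/jaguar",  # illustrative URL
    title="The jaguar: habitat and behavior",
    subject="Animals",
    language="en",
)
print(metadata)
```

Filling in these fields by hand for every resource is exactly the effort the paragraph above says most publishers skip.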

There have been many attempts to free the Semantic Web from labels such as “interesting but (almost) impossible” and turn it into something “interesting but also useful and usable”. Some pioneers started down the semantic road even before the theories about the Semantic Web were established. For example, Semantic Intelligence aims to improve precision and recall in the search process by making computers able to automatically “understand what we’re talking about”. If SI makes it possible to automatically understand what a text is about, then it is reasonable to think that it can also generate metadata for the Semantic Web. Today we are well past the beta stage: Semantic Intelligence is a mature technology, widely used in the business world.

We may not be that far from the “fantastic” Web that is able to understand whether a jaguar is an animal or a car: a Web in which you can search for information on pop music from the Sixties and receive not only pages containing the keywords music, pop and Sixties, but also pages about the Beatles and the Beach Boys, and maybe even some useful tidbits about the next Rolling Stones tour.
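
The jaguar example can be illustrated with a deliberately naive technique: count how many words in a sentence overlap with cue words for each sense, in the spirit of the simplified Lesk algorithm. This is a stand-in for illustration, not the Semantic Intelligence approach described above; the sense inventory and cue words are hypothetical.

```python
import re

# Hypothetical sense inventory: each sense of "jaguar" with a few cue words.
SENSES = {
    "jaguar (animal)": {"cat", "jungle", "prey", "predator", "rainforest", "species", "wild"},
    "Jaguar (car)": {"car", "dealer", "drive", "engine", "horsepower", "model", "sedan"},
}


def tokenize(text):
    """Lower-case the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence (None if no overlap)."""
    words = tokenize(sentence)
    overlaps = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


print(disambiguate("The jaguar is a wild cat that hunts its prey in the rainforest"))
# prints: jaguar (animal)
print(disambiguate("My new Jaguar has a powerful engine and the dealer let me drive it"))
# prints: Jaguar (car)
```

A fixed list of cue words breaks down quickly when the context contains none of them, or cues for several senses at once, which is where a richer model of meaning comes in.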

The Office of Public Liaison in the new Obama administration is promising to listen to citizens as it considers policy direction and legislation, and otherwise to bring the people to Washington rather than bringing Washington to the people. The most concrete of these proposals is a 5-day period during which citizens can comment via the Internet before the President signs any legislation. Even now, anyone can offer an opinion directly to the President online. You can contribute up to 500 characters. That is roughly 80 words.

The windows are open in the White House and a fresh breeze of openness and inclusiveness is blowing right in. This is certainly a change from the previous 8 years, when the White House was shut tight and the air inside grew staler by the day. But I wonder whether the administration is prepared for the hurricane-force winds that could result.

If you ask for comments on pending legislation how many comments will the White House get? There are some hints from around the blogosphere. Go to Technorati and ask for a count of the word “bailout” over the last 6 months. The chart below is what you get.

The peak of over 14,000 blog posts came around the passage of the first multi-billion-dollar bank bailout in the early fall. The average around this spike looks to be roughly 6,000 posts per day. As the ARRA (American Recovery and Reinvestment Act) is debated and finalized, and the second half of the bank bailout money is finalized, you can be sure the number will spike again. But let’s be conservative and assume that a third of that daily average, about 2,000 people a day, would like to comment directly to the White House on the ARRA over the promised 5-day period. That would be 10,000 comments President Obama says he will consider before signing the legislation. Since current estimates put the number of US bloggers at 22.6 million, 10,000 comments may be only a drop in the bucket.
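
The back-of-the-envelope estimate above can be checked in a few lines of Python. The inputs are the post’s own rough figures; the only added step is working out what share of US bloggers 10,000 comments represents.

```python
# Rough figures from the estimate above.
AVG_POSTS_PER_DAY = 6_000      # approximate daily average around the bailout spike
COMMENTING_DIVISOR = 3         # "1/3 of the average", kept as a divisor to stay in integers
COMMENT_PERIOD_DAYS = 5        # the promised 5-day comment window
US_BLOGGERS = 22_600_000       # estimated number of US bloggers

comments_per_day = AVG_POSTS_PER_DAY // COMMENTING_DIVISOR
total_comments = comments_per_day * COMMENT_PERIOD_DAYS
share_of_bloggers = total_comments / US_BLOGGERS

print(f"{comments_per_day:,} comments/day over {COMMENT_PERIOD_DAYS} days = {total_comments:,}")
print(f"That is about {share_of_bloggers:.3%} of US bloggers")
```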

Short of hiring a small army of readers, how will Valerie Jarrett and her staff make sense of this “wisdom of the crowd” input? We do know that President Obama has hired some tech veterans to lead this kind of effort.

Chief among them is former Google product manager Katie Jacobs Stanton, who will become the new President’s “director of citizen participation” in March. It is no coincidence that Ms. Stanton was in charge of Google Moderator.

A quick look at this tool shows that anyone can post a question (or, I suppose, a comment) and others can then vote on its importance relative to all the other questions posted. The questions posted around the Presidential debates offer another estimate of what the White House might experience. The breakdown of topics, questions asked, votes recorded and citizens participating is shown in the table below.

Topic            Votes   Questions   People
Education        6,926          96    1,183
Health Care      3,483          81      412
Iraq War         3,513          64      488
Economy          7,534         209      580
Environment      3,078          73      317
Foreign Policy   3,699         101      339
TOTAL           28,233         624    3,319

OK, here is the rub. No matter how you count what can be expected from citizen participation in the new administration, technology that goes beyond posting and voting is going to be needed. It’s not clear on Google Moderator whether the categories were decided before the questions came in or after everyone had posted. In any case, I took the top vote-getting comments from each of these categories and analyzed them again with our semantic technology to see which categories emerged. I found 90 categories in total across all the comments. The top categories (each more than 1% of the total) were the following:

That’s easily more than twice as many categories as Google Moderator can bucket things into. The point is that true participation means more than a simple tally. It should mean listening, really listening, to the context, the nuances and the breadth of what citizens experience in their daily lives and what they expect from their government. Volume is only the first problem for citizen participation. The bigger issue, as the intelligence community, which is familiar with these problems, puts it, is finding dots, connecting dots and understanding dots.

I believe semantics is a core technology that can not only process the volume of input the White House is about to receive but also tease out the full picture of true citizen participation. It will do President Obama no good to promise to listen to his most important constituency and later be accused of turning a deaf ear to the process. There is great promise in having the breadth and range of American opinion directly influence the highest office in the land. Everyone can see that technology is the key to extending our democratic reach to every living room and kitchen table in the land. The peril lies in not applying enough technology, or the right technology, so that many citizens come away feeling they were not sufficiently heard. That would do democracy harm indeed.

The impact of the Internet, and then of the online social network phenomenon, on consumer buying behavior is a fact. These days, I cannot even imagine organizing a vacation or buying a piece of electronics (not to mention books, cars, real estate, etc.) without first spending a significant amount of time reviewing online opinions from my peers, from other consumers or from bloggers with recognized authority on the topic. You can therefore imagine that monitoring and, when possible, trying to influence the opinions expressed in these sources should be a top priority for any company (at least in many sectors). So it is not surprising that the first comment I hear from most of the marketing and product managers I speak to is, “Yes, of course we know it is important, and we are doing it.” However, if you try to understand what most of these companies are actually doing, you will find that the situation is quite different.

First of all, the budget allocated to monitoring online sources is a small fraction of the budget allocated to traditional business and marketing intelligence projects. Companies spend significantly more money on tools to analyze internal structured data (sales, accounting, inventory, etc.), and, even more troubling, sentiment and online monitoring account for a very small fraction of the budget invested in traditional market research (that is, a set of predefined questions where the consumer has to choose one answer and, sometimes, gets to add a few words of free text).
In other words, companies worldwide prefer to use information that is:
  • expensive to gather (projects very often cost hundreds of thousands of dollars);
  • biased (a great deal of research shows that respondents do not answer these questions entirely freely);
  • static: it describes the situation at a specific moment in time, and it is compiled and reported some time later, when the situation may already have changed.

In any case, my point is not that traditional market research is useless. I think it has its rightful place in the mix of competitive intelligence initiatives every company has to undertake. My point is rather that it needs to be integrated with the wealth of information that the explosion of the Internet has made available. Compared to traditional market research, online sentiment monitoring has the following advantages:

  • it’s relatively inexpensive (if done with automated technology);
  • it’s less biased (and the remaining bias tends to decrease as more people go online);
  • it provides a dynamic, real-time view of the market.

This established behavior is very resistant to change. When I introduce our product, Cogito Monitor, to decision makers in large enterprises and mid-size companies, I often hear the same objections. They immediately focus all their attention on finding errors and noise in the sentiment levels automatically identified by the system. They do this even though the product has proved, in many implementations, to provide very high precision, and even though the noise has no impact whatsoever on the reliability of the summary data it provides (false positives are equally distributed among the different sentiment levels).
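
Why evenly distributed false positives leave summary data usable can be shown with a toy model. The counts below are invented purely for illustration, and the model assumes the simplest case, in which the same number of spurious mentions lands in every sentiment level.

```python
# Hypothetical true counts of mentions per sentiment level (illustrative only).
TRUE_COUNTS = {"positive": 500, "neutral": 300, "negative": 200}


def add_even_noise(counts, false_positives_per_level):
    """Model noise as the same number of false positives added to every level."""
    return {level: n + false_positives_per_level for level, n in counts.items()}


def ranking(counts):
    """Sentiment levels ordered from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


observed = add_even_noise(TRUE_COUNTS, 30)

# The ordering of the levels is unchanged...
print(ranking(TRUE_COUNTS), ranking(observed))
# ...and so is the gap between any two levels.
print(TRUE_COUNTS["positive"] - TRUE_COUNTS["negative"],
      observed["positive"] - observed["negative"])
```

What the even spread preserves is the ordering and the gaps between levels; the percentage shares drift slightly toward an even split (positive falls from 50% to about 48.6% in this toy case), and that drift grows with the amount of noise.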

I could argue that traditional business intelligence and market research projects probably offer similar reliability, and I am not saying that our product is perfect. What I really want to question is the rationality of the objection. To what are they comparing the results and the precision of Cogito Monitor? If the mistakes are statistically irrelevant, why are they reluctant to use this information too, alongside everything else they already have, to support their decision-making? Instead of comparing it to what they actually have today, they seem to compare it to an ideal system or process with 100% precision and recall. And by resisting these tools, they choose to sit like George W. Bush on 10 September 2001, relying on data they are comfortable with but that gives an incomplete picture of what is actually happening in the marketplace, when they should instead be investing in resources able to interpret the signals, often still weak and confused, of brewing storms on social media that could dramatically affect their business.

Filed Under (Events) by L.Scagliarini on 02-03-2009
Girona airport, from which I am flying back to Italy, is surrounded by cranes, new parking lots and perhaps new terminals under construction. The summer-like weather, even though it is only February, makes you feel that this part of Spain, at least, is a small Doha. It isn’t immediately clear to tourists and business travelers like me, who choose this airport for budget reasons (note for VCs interested in investing in Expert System: yes, we are budget-conscious), whether construction started during the economic boom and then slowed down, or whether the new infrastructure was planned to be ready after what is now known as the Great Depression II. It doesn’t matter. For those who see the glass half-full, these are still icons of a better future to come. They are images that can cheer up even someone who has just attended the Mobile World Congress in Barcelona and heard complaints for three days like “it was better last year”, “the Nokia party two years ago was something different”, “at this point the sector is mature”.
It was my first time at the Congress, so I can’t make any comparisons. But my impression was of a lively event, with significant investments by manufacturers and, as usual, frenzied media attention. Being more familiar with events dedicated to innovative software, I immediately noticed three things: the crowd was made up overwhelmingly of men in dark suits and ties; many booths had hostesses, as if these were still the trade-fair stands of the ’80s; and the ghost of Apple dominated the scene. The ghost showed up on the home screens of the new smartphones launched by competitors (in the form of iPhone-like small squares lined up in rows and, inevitably, activated with a touch of the screen), in the names manufacturers and operators chose for their new services (as reported by TechCrunch), and as a hope for the many start-ups that had developed potentially revolutionary applications but had clashed with a market still controlled by the mobile operators. That control slows down the adoption of new applications, practically preventing quick success and even, more simply, fast feedback from the market.
As Thomas Clayton, CEO of Bubble Motion, explained to the crowd attending the Business Services seminar, the market for mobile applications is far more complex than the market for Internet applications. “We need to develop our applications for different operating systems, whose number keeps increasing (iPhone, Android, etc.), but, at the same time, we don’t have the opportunity that Facebook or LinkedIn, for example, have to push out hundreds of small releases every year, which are very useful for evaluating in real time the features users appreciate most. All this holds back innovation and makes it difficult for new companies like ours to prosper.”
I am new to the mobile sector. As a consumer and business user, I am happy with my BlackBerry, and I rarely use the camera. I started following this sector more closely over the last 18 months, when we at Expert System realized that the quality of an effective natural language search engine like ours, in particular its precision, or ability to extract only the relevant information accurately, could stand out even more on the screen of a mobile phone than on a computer monitor. Nevertheless, it seems clear to me that we are at a turning point in the market: with the opening of a wave of new application stores, like Nokia’s and Microsoft’s, modeled on Apple’s (thank you, Steve), innovation will be boosted and the control of the operators will inevitably diminish.
During my three days in Barcelona, I spent a lot of time visiting the booths of companies, very often just start-ups, developing new applications for mobile advertising. Like many others, I believe this market will grow dramatically in the next few years. Yet their success cannot be taken for granted: it will depend on their ability to take into account the most significant difference between a smartphone and a laptop. The relationship between a consumer and a smartphone is more intimate than the one with a laptop. I notice this very often, especially during my much-too-frequent business trips. While my laptop gets switched off at some point, perhaps late at night, my BlackBerry is always there next to me, and when the red light flashes, the temptation to check who’s writing, even right before going to sleep, is very strong. You could argue that this is actually an addiction problem. True, but since it is a common situation, understanding the effects of this sad personal condition is relevant to analyzing the critical success factors for mobile advertising :-)
When users surf the web on a computer, they are more willing to tolerate irrelevant ads. Of course, in the coming months Internet advertising will also have to move to systems that improve relevance (see our Cogito Semantic Advertiser) in order to become truly effective. Yet the failure of current systems to deliver appropriate messages is more tolerable on computers than on smartphones. On these devices the user’s patience runs out sooner: screen space is limited, so useless information is more irritating. In addition, since the device is more personal, the user is generally less willing to receive irrelevant messages. However, this is also why the mobile phone can become the holy grail of direct marketing. But to become relevant, and therefore successful, mobile advertising will have to take two variables into account. The first is the easier to understand, and in fact it already dominates the discussion on the subject: geographic location. The second, more difficult, is the ability to profile the user effectively at any given moment. Today this aspect is considered only in terms of the demographic data gathered by the operator. But demographic data alone are not enough, and this is where the second variable becomes crucial. For example, knowing that Simon is 45 years old, lives in London and earns 60K pounds a year is not enough to identify the kind of advertising to send him today. The dynamic aspects of the profile are far more relevant. Just as it is important to know that Simon is in Berlin today, it is equally important (maybe even more so) to know that Simon has lately been interested in modern furniture, because he constantly accesses content of this kind and often shares information on the subject.
Semantic technology can be extremely useful, especially for this second dimension of user profiling, because it can produce relevant results in real time. This is why I believe that semantics will become pervasive in online and mobile advertising.
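The two variables described above can be sketched as a toy ad-scoring function. Everything here is a hypothetical illustration: the `Profile` fields, the one-point bonus for a location match and the seven-day half-life for interest signals are arbitrary assumptions. The sketch also skips the genuinely hard, semantic part, namely recognizing that the pages Simon reads are about modern furniture; it assumes those topics have already been extracted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Profile:
    # Static, operator-style demographic data (deliberately unused in the score below).
    age: int
    home_city: str
    # Dynamic data: where the user is now and what they have recently read or shared.
    current_city: str
    interest_events: list = field(default_factory=list)  # (timestamp, topic) pairs


def interest_weight(profile, topic, now, half_life_days=7.0):
    """Sum of signals for a topic, each halving in weight every half_life_days."""
    total = 0.0
    for when, event_topic in profile.interest_events:
        if event_topic == topic:
            age_days = (now - when).total_seconds() / 86_400
            total += 0.5 ** (age_days / half_life_days)
    return total


def score_ad(profile, ad_city, ad_topic, now):
    """Hypothetical score: 1 point for a location match plus the decayed interest weight."""
    location_match = 1.0 if ad_city == profile.current_city else 0.0
    return location_match + interest_weight(profile, ad_topic, now)


now = datetime(2009, 3, 2, 12, 0)
simon = Profile(
    age=45,
    home_city="London",
    current_city="Berlin",
    interest_events=[
        (now - timedelta(days=1), "modern furniture"),
        (now - timedelta(days=3), "modern furniture"),
    ],
)

ads = [("Berlin", "modern furniture"), ("London", "modern furniture"), ("Berlin", "golf")]
for city, topic in sorted(ads, key=lambda ad: score_ad(simon, *ad, now), reverse=True):
    print(f"{city:<7} {topic:<17} {score_ad(simon, city, topic, now):.2f}")
```

With these assumptions, the furniture ad in Berlin ranks first, because it matches both where Simon is today and what he has been reading lately, while his age and home city play no part in the ranking.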
One more note on the Barcelona event: I’m returning to Italy with a Global Mobile Award in my suitcase. Cogito Answers won it for best technology for customer care. I believe this award, unexpected since we are new to the sector, proves that semantic technologies can also become extremely important in the development of a wide range of mobile applications. During my acceptance speech (I like to think I stole the audience away from Kevin Spacey, who was speaking in the next room), I highlighted how semantic applications for customer care are only the tip of the iceberg of the value this technology can offer companies. In fact, I believe semantics will be fundamental to the development of smartphone applications. In general, semantics makes access to information easier and more effective (from tourist guides and user manuals to customer and sales prospect data, from inventory data to reviews of the latest U2 or James Morrison CDs). It thus helps employees and customers receive only the relevant information, precisely and at the right moment, and, above all, it enables new ways of interaction between companies and customers.
The success at the Global Mobile Awards shows that Expert System is determined to play a relevant role in this upcoming revolution. It would be nice to fly back to the GMA in a couple of years and see fewer grey suits. It would be a sign that things are changing, just as the CEO of a start-up said to me: “Maybe the place will still be Barcelona, but with fewer grey suits, or maybe the men in grey suits will remain here, while those making real innovation will meet in some other beautiful town, coming from every corner of the world.”
And, I would add, people will no longer ask in which country a winning technology was developed: good technology can be developed anywhere. The “world is flat” now, as we all know.