Last week I attended EDW (Enterprise Data World), one of the most relevant educational events for data management. The conference was very well attended, and it was a great opportunity for me to gauge whether unstructured information has finally reached the level of importance in supporting strategic decision-making that industry analysts and the press report.
If I had to judge from the very limited understanding of unstructured information displayed by the data scientists, data architects and data consultants attending the presentations, I would have to disagree with Gartner and IDC. However, it is true that, even compared to last year, more presenters seemed to be paying attention to the unstructured portion of information, and some of them presented clear cases supporting this focus. So, if we aren't all completely on the same page yet, at least we're getting close.
For technical people who have spent their entire careers looking at data in rows and columns, it may be difficult to develop an understanding or appreciation of the value of messy unstructured data. From an organizational point of view, it's likely that data teams will expand to include a few of these "crazy" information analysts.
For this to happen, organizations need only look as far as the business value of unstructured information. The good news is that, compared to raw data, unstructured information has the advantage of showing its business value quite clearly. For brands, the impact of a negative customer experience shared on Facebook, a leaked personal email or even an off-the-record comment, or the rediscovery of valuable research buried inside an unnamed folder cannot be questioned. I believe that the time for the world of data modeling to include unstructured information is already here.
Knowledge-intensive sectors like publishing, oil and gas, and finance are already using "soft" unstructured data for a variety of purposes: maximizing the value of their assets (publishing), monitoring supply chains in real time for unexpected events that could cause significant economic harm (oil and gas), and including valuable contextual information in predictive models (finance). The advantage of these proof cases is that, once explained, they are very easy to understand. In the coming years, we'll hear more and more real-world examples highlighting the importance of including analysis of unstructured information and data extracted from internal or external information streams.
So, even if some of the questions I heard at EDW from intelligent, serious and prepared data scientists are still good material for laughs over beers with friends, I think that “the times they are a-changin”. Those who have begun to invest in semantics—the best way to ensure a smooth integration between the two worlds, btw—will laugh for a long time and all the way to the bank.
After months of hard work by our product engineering and marketing teams, today we are proud to announce the launch of Cogito Intelligence API. With this new product we make the experience and knowledge we’ve acquired over the years easily accessible for partners, developers, intelligence agencies and corporate security departments to help them develop and implement more effective platforms to monitor streams of information.
The API features a dedicated knowledge domain for crime and intelligence that is regularly updated to reflect the real-world developments impacting the sector, allowing our customers to take advantage of this rich resource to enhance their existing analysis platforms, or use it to support projects or build new applications.
We’ve developed an easy way to test the power of our solution, and we invite you to give it a test drive via the Live Demo at www.intelligenceapi.com and to explore the developer’s tab for more information.
The Cogito Intelligence API is a significant step forward in our strategy of offering vertically customized semantic webservices that enable search, business intelligence and other applications to take full advantage of our unique and rich text analysis. I welcome everyone to test the webservice and engage with us to help make it the standard for information intelligence.
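For developers who want a feel for the integration effort, here is a minimal sketch of what calling a semantic text-analysis webservice over HTTP could look like. To be clear, the endpoint URL, parameters and response fields below are illustrative assumptions, not the documented Cogito Intelligence API interface; the developer’s tab at www.intelligenceapi.com has the real details.

```python
# Minimal sketch of calling a semantic text-analysis webservice.
# NOTE: the URL, parameters and response fields below are hypothetical
# placeholders, not the documented Cogito Intelligence API interface.
import requests

API_URL = "https://api.example.com/analyze"  # hypothetical endpoint
API_KEY = "your-api-key-here"

def analyze(text):
    """Send a piece of text for semantic analysis and return the parsed JSON."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text, "features": ["entities", "categories", "relations"]},
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = analyze("Suspicious wire transfers were routed through three shell companies.")
    print(result)
```

The point is simply that the analysis comes back as structured JSON, so it can be dropped into an existing monitoring platform with a few lines of glue code.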
I have often said that companies are missing out on the real value of social media analysis. More often than not, even the big players don’t have the processes or models in place to really make use of the data gained from the analysis. As a result, social media analysis has a limited impact on the business, not to mention the budgets assigned to such projects.
I recently met with the head of customer experience at a well-known bank to discuss tools they would need to support social media analysis. I was prepared to give my usual pitch about how “sentiment analysis is useless but you should still mine social media” when this manager stopped me. Imagine my surprise when she asked if we could jump ahead to the part about what they really needed from us. The slide looked very much like this:
It’s nice to know that things are changing—and it’s a bonus to experience that change firsthand rather than reading about it on a blog or in an analyst report. This was the first time that a discussion around social media analysis tools was put into a broad, clearly defined perspective. The surprises didn’t stop there. Our customer was able to give very specific examples of the quantitative AND qualitative data she wanted to extract from these streams.
This was a natural and logical entry point for semantics—and made it easy to explain how semantics can bring value to their company. We focused on some of the core capabilities where semantics really distinguishes itself, especially how:
I don’t think this will be the last time I make my ‘usual pitch’. Sidetracked by the buzzwords and confusing marketing messages, many organizations are still not clear on the strategic value of social media projects. But change is afoot. As in the example above, the clearer the strategic view, the easier it is to turn raw precision and recall data into business value. The trend toward a clearer understanding of the strategic value of social media data (and I hope it IS a trend), and the enhancement of predictive modeling present a unique opportunity for real semantic technology vendors… and we are here to take advantage of this trend.
My role in business development at Expert System often puts me in front of many diverse audiences. Last month for example, I presented to or interacted with not only companies and potential customers, but also consulting firms and post-graduate students at local universities.
The contrast I’ve noticed between the analog world and the online world I’m exposed to daily makes me think that, even in this ever-connected Big Data age, much of the abundance of available (i.e. free) information and knowledge is being ignored.
Take, for example, my recent presentation to a masters-level class at a prestigious local university. When asked, the majority of students responded that they were currently involved, in one capacity or another, in launching a start-up. But it became clear to me over the course of the day that their collective grasp of the technical and practical knowledge needed to make the leap from idea to reality was approximate at best.
Remarkably, I see some of the same gaps in the consulting world. These firms—small, large and in between—all concentrate on various aspects of data management, from traditional BI to tackling large data sets (i.e., Big Data), and offer these services to Fortune 1000 enterprises and up. When it came to talking about today’s problems, I had the impression that they were reading off the same presentation deck they used 20 years ago! Their interpretation felt stale and out of touch, which is shocking given the wealth of daily information and conversation that we all have access to online.
For today’s start-ups, the internet is brimming with free resources in the form of blogs, videos, podcasts, shared presentations and more that give anyone who is interested access to the deep knowledge behind everything from how to code to how to raise capital and attract investors. Sites like Fred Wilson’s avc.com, bothsidesofthetable.com, feld.com and cdixon.org offer a veritable encyclopedia of information and experience to potential entrepreneurs.
This clash of perspectives has been a big eye-opener. Online and social media are filled with stories chronicling start-up successes (and, to be fair, also some of the biggest failures)—clearly there are many entrepreneurs (and consultants) who are getting it right. But my personal experience of late paints a picture I can’t ignore. So, with that in mind, I thought I’d share a short list (think of it as a 101 guide) of the systems I’ve put in place to stay informed:
With this strategy, I can keep up to date on all my areas of interest (without getting lost or distracted—most of the time) with a maximum commitment of 45 minutes a day. More importantly, it has also helped me connect and build invaluable relationships—all this not from Silicon Valley or New York City, but from a remote location in the Bologna hills where the food is better, or at least cheaper, than in Palo Alto, Milan or London.
Last week, the Microsoft masses descended upon Las Vegas for the annual SharePoint Conference. Pre-show estimates put attendance at 10,000, and from the looks of the packed sessions and general crowds it was at least that many if not more.
From my vantage point, some of the key challenges revolve around data integration, findability and migration. Microsoft has tried to emphasize new features in the enterprise search user experience and improve the backend scalability and performance by leveraging their acquisition of FAST. However, it all boils down to this: if I still can’t find what I’m looking for, better chrome delivering less than meaningful results doesn’t help me much.
Many of the opportunities around the platform are likely the result of Microsoft’s product-centric view of the world. SharePoint is very feature-rich, but it requires a thoughtful approach to how people will be using it. People are the key factor in the equation. People use software to find, analyze and consume data. I saw many sessions that highlighted slick new features referencing users but ignoring the information those users were ultimately trying to access.
Data discovery using semantics and categorization offers a great way to bridge the gap between rich, untapped data and the people eager to find it. At the end of the day, most SharePoint users are goal focused: how do I get the information I am looking for to complete the job at hand?
Migrating existing data sources, unlocking new data and presenting the data in a familiar way through semantic tagging and logical categorization seems like an obvious solution. It’s also important to note that the world does not revolve only around SharePoint. A daily workflow that exposes users to content extracted from the web and social media in a common dashboard seems like a logical next step.
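To make the idea of tagging and categorization during a migration a bit more concrete, here is a deliberately simplified sketch. A real semantic engine disambiguates meaning rather than matching keywords, and the taxonomy and cue words below are invented for the example, but it shows how tags let people find content by topic instead of by folder path.

```python
# Toy illustration of rule-based categorization applied during a content migration.
# The taxonomy and keyword rules are invented for the example; real semantic
# categorization resolves meaning, not just string matches.
TAXONOMY = {
    "Contracts": ["agreement", "termination", "liability"],
    "HR": ["onboarding", "benefits", "performance review"],
    "Engineering": ["deployment", "architecture", "incident"],
}

def categorize(document_text):
    """Return the taxonomy nodes whose cue words appear in the document."""
    text = document_text.lower()
    return [node for node, cues in TAXONOMY.items()
            if any(cue in text for cue in cues)]

doc = "The termination clause limits liability to twelve months of fees."
print(categorize(doc))  # ['Contracts']
```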
Semantic technology can play a key role in aggregating data in a usable way to make industry applications even more powerful and easy to use. I expect to see more of these implementations to be showcased next year.
Like most U.S. voters, I was planning to spend last night waiting for the results of the presidential election. From Europe, the wait is even longer, as the first results start coming in around dawn. So while I was watching the countdown to the closing of the first poll, I started to wonder whether waiting four or five more hours was a good investment of my time (vs. actually sleeping).
At that time, I had just received a message from a friend of mine forwarding a quote from Obama stating that he thought he had enough votes to win, while, at the same moment, David Plouffe was speaking openly about his confidence that Pennsylvania, Ohio and Virginia would remain blue states. This open confidence was obviously part of the last minutes of this never-ending campaign, but it was also a very important signal that the statistical models available to them were showing very little uncertainty compared to what “we” were feeling.
While still debating whether it was time to go to bed, I went to the NYT on my iPad and read this post from the now-famous FiveThirtyEight blog. As is typical of his very well written blog, Nate Silver described the data available, helped to explain and interpret it and, basically, told us that in the end the election was not as undecided as we all thought. At that point I made up my mind and went upstairs.
I think that in addition to President Obama, the 2012 election had another big winner: “Big Data”. Polls were extremely precise throughout the campaign at identifying mood swings, trends and the like, and, at the end of the day, they were next to perfect in predicting the outcome. And this is only the beginning.
As I wrote in a previous post, I am very confident that the integration of unstructured information will improve the quality of human behavior prediction models, and I am very excited by the effect that unstructured information will have on the cost of feeding data to these systems. This will mean less doubt about who the next president will be come this time four years from now and, for example, a greater ability to predict, with significant precision and very limited investment, the success of a product in the market. This, to me, is almost as exciting as waking up to the news of who I already knew was going to be president for the next four years!
With the U.S. presidential election looming, it’s hard to avoid the talk of who’s ahead—everywhere you turn, there’s an article with the latest results of a new poll. Over the last 24 hours, I read two articles about predicting human behavior. David Brooks, poking fun at his ‘poll addiction’, supports the thesis that, while you can reach a certain level of predictability, essentially, human behavior is impossible to predict.
On the other end of the spectrum, I clicked over to an article that makes the case that what has been missing in building predictive models is the data. Now the data is available in the form of social media content, and it will become progressively more available in the future. Problem solved!
When we talk about models for predicting human behavior, I think we have to avoid the radical approach. As the political system demonstrates, we have made huge progress in predicting the behaviors and reactions of the electorate, where elections are often won by a small margin or, in some cases, even by hanging chads.
But the objective cannot be perfection. We do not expect this from most other models—we accept a margin of error. I believe that when we start including new data based on unstructured information, the margin of error in human behavior predictive models will not be eliminated completely, but it will shrink.
We will still have to wait for election night to know who the next president will be, but we will probably send out the party invitation the night before.
I admit, I am a weather geek. As a boy, I was fascinated by the weather. By age 11, I was recording the daily weather in a small diary. Back then, predicting the weather seemed to be more about luck than science, based on personal observations and the Farmer’s Almanac.
But even if tools for weather prediction were not yet perfect, there were still many tools in place to enable monitoring and measuring. [In fact, I can’t write a post about weather without mentioning that Italians invented some of the most important weather monitoring tools: an early thermometer (Galileo Galilei in 1592) and the barometer (Evangelista Torricelli, 1643).]
The data monitoring and measuring would eventually allow scientists to create the models that now enable the weather forecasting tools in use today. And while predicting the weather is obviously still not perfect, we are much closer than seemed possible 40 years ago.
I can’t help but see the similarities between this and the business world. Companies have been collecting internal data for years now, and most are quite good at explaining what happened to them. Today, with the advance of tools and technologies able to capture what was once uncollectable, companies are getting better at using that information for decision making and forecasting. Here, I am obviously talking about unstructured information. And while companies have rich information sets available to them, I believe that there is still a lot of information left on the table, unused.
Today, information can be collected because there are repositories available (CRM systems, social media, market research verbatims, etc.), and, thanks to semantic technologies, it can be understood and structured in a format that may be used to support innovative predictive models.
Take the example of internet content: all of the things embedded in text—topics, tone, style, relationships between concepts, etc.—can be treated as attributes/variables associated with each piece of text to describe a phenomenon. A predictive model can take all of this information into consideration in ways that have a huge bottom-line impact. Understanding whether a new product will be successful, how to avoid churn in your customer base or, even more simply, whether it is time to hire a new CEO are complex matters, but the return on investment could be huge for whoever makes progress in developing effective systems.
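As a rough illustration (not our actual modeling pipeline, and with invented feature names and data), here is how text-derived attributes could sit next to traditional structured variables in a simple churn model, using scikit-learn purely for demonstration:

```python
# Sketch: text-derived attributes used as predictor variables alongside
# traditional ones. Feature names and data are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [sentiment_score, mentions_of_churn_topic, support_ticket_count]
# The first two columns would come from semantic analysis of text streams;
# the third is a classic structured variable.
X = [
    [-0.8, 5, 7],
    [ 0.6, 0, 1],
    [-0.3, 2, 4],
    [ 0.9, 0, 0],
]
y = [1, 0, 1, 0]  # 1 = customer churned, 0 = retained

model = LogisticRegression().fit(X, y)
print(model.predict([[-0.5, 3, 5]]))  # predicted churn outcome for a new customer
```

The interesting part is the first two columns: they exist only because semantic analysis turned free text into numbers a model can use.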
The good thing is, you don’t need a major capital investment to put such systems in place. And, if you are successful, this model can be a significant competitive advantage that can help pave the road to success.
Today, we really do have a level playing field, more than any other time before. And everyone has the opportunity to be the next Apple (or a weather geek, perfecting the art of always knowing what to pack for the next trip).
Although it’s been hard to resist reading the news about the previous night’s debates each morning, I have been relying on our own analysis of the presidential debates for a first impression. As with email communication, experiencing the debates without any visual or verbal context (and not even in sentence form initially) can leave much up to interpretation: we’re left with word choices, and the meanings of those words (which can imply feelings and context), to figure it out.
While much of the analysis here is straightforward, one interesting aspect of using semantic analysis is that it is able to distinguish the most important sentences and words in text, determined not by frequency, but by a complex algorithm that looks at the logical role, co-occurrences with related terms, etc. Using this, we can identify the most important words and concepts that are being conveyed, those that are central and critical to the overall text (see “Most Important Nouns” in the graphic below).
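Our actual relevance algorithm is considerably more sophisticated (it also weighs the logical role of each element in the sentence), but as a toy illustration of ranking terms by co-occurrence rather than by raw frequency, here is a small sketch; the scoring scheme and the example text are invented for demonstration:

```python
# Toy illustration of ranking words by how many distinct words they co-occur
# with across sentences, rather than by raw frequency. A simplified stand-in
# for the real relevance algorithm, which also considers logical role.
import re
from collections import defaultdict

def rank_terms(text, top_n=5):
    """Rank words by the number of distinct words they co-occur with."""
    cooccurrences = defaultdict(set)
    for sentence in re.split(r"[.!?]", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for w in words:
            cooccurrences[w].update(set(words) - {w})
    return sorted(cooccurrences, key=lambda w: len(cooccurrences[w]), reverse=True)[:top_n]

debate_excerpt = ("Governor Romney says he has a plan. I have a plan too. "
                  "My plan invests in education and energy.")
print(rank_terms(debate_excerpt))  # "plan" rises to the top despite short sentences
```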
Some of the most interesting discoveries were the use of “Romney,” which was cited as the most important term used by President Obama in last night’s debate (Could this mean that he took a more aggressive and forceful approach to his opponent this time?), and Romney’s use of “I” over “we” (Obama was fairly equal in use of both).
Take a look at our newest infographic to see some of the other ‘curiosities’ that our analysis uncovered. Until the next debate…