The Semantic Advantage

February 24, 2011

What’s up with Watson?

Filed under: products for semantic approach,semantic technology,Watson — Phil Murray @ 1:13 pm

The IBM/Watson Jeopardy! “challenge” — three days of Jeopardy! matches first aired on Feb 14-16, 2011 — is a watershed event in the related worlds of Knowledge Representation (KR) and search technology. The match features IBM hardware, software, and data resources developed over seven years by a dedicated IBM team matching wits with two of the all-time best Jeopardy! contestants. Mainstream media are playing it up, too. (Get the IBM perspective on Watson at

The result: A big win for Watson. And IBM. And potentially very big losses for those working in the fields associated with Knowledge Representation and information search.

The angst of the KR community is evident in the posts to the Ontolog forum immediately preceding and during the televised challenge. (See the forum archives at for Feb. 9, 2011 and following days.) A profession already in “We need to make a better case for our profession” mode received a major jolt from IBM’s tour-de-force demonstration of “human” skills on a popular game show.

Although Watson incorporates significant ideas from the KR and search communities — it was, after all, developed by experts from those communities — it’s the effectiveness of the statistical component that drives much of the uneasiness of the KR community. Watson relies heavily on such statistical search techniques as the co-occurrence of words in texts. Lots of texts.
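To make “co-occurrence of words in texts” concrete, here is a minimal sketch of the general idea. It is not Watson’s actual method, which IBM has described only at a high level. The three-sentence corpus and the function names are hypothetical, invented for illustration. The sketch counts how often pairs of words appear in the same text, then picks the candidate response that appears most often alongside the words of a clue.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts):
    """Count how often each unordered pair of words appears in the same text."""
    counts = Counter()
    for text in texts:
        # Deduplicate and sort so each pair is counted once per text,
        # always in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def best_candidate(clue_words, candidates, counts):
    """Pick the candidate that co-occurs most often with the clue words."""
    def score(candidate):
        return sum(counts[tuple(sorted((candidate, w)))] for w in clue_words)
    return max(candidates, key=score)


# Hypothetical toy corpus; a real system would draw on vastly more text.
TEXTS = [
    "henry viii was a tudor king of england",
    "the tudor king henry viii had six wives",
    "lake nicaragua is a large lake in central america",
]
COUNTS = cooccurrence_counts(TEXTS)

# Which candidate keeps company with the words "tudor" and "king"?
print(best_candidate(["tudor", "king"], ["henry", "nicaragua"], COUNTS))
```

Nothing in this sketch models meaning. The program only notices which words keep company with which, and that is exactly the point that unsettles the KR community.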

By contrast, the KR community focuses more heavily on interpreting and representing the meaning of natural language — usually by building a model of language from the ground up: concepts assembled according to syntax. The results range from simple “taxonomies” that support advanced search in organizations to very large “computer ontologies” that can respond to open-ended natural-language queries and attempt to emulate human problem-solving. But none, so far, that can lay claim to besting smart humans in a challenge most think of as uniquely human.

So major sales of new search engines in big business are going to come to a screeching halt until upper management figures out what happened. All they know now is that an IBM machine outperformed two really smart humans in the domain of common knowledge and made their current and planned investments in search technology look like losing bets. Budget-chomping losers at that.

Why Watson?

Did IBM invest substantial expertise and millions of dollars of computer hardware and software to create what one contributor to the Ontolog forum called a “toy”? Yes, it is a “toy” in the sense that it is designed to play a quiz show.

But oh what an impressive toy! And you know it’s an important toy precisely because the people who understand it best — the members of the KR community — are really queasy about it, devoting hundreds of posts — many of them very defensive — to this subject on the Ontolog forum. Ever notice how participants in live political debates get louder and interrupt more frequently when the weaknesses in their arguments are being exposed?

The good news is that these discussions have surfaced and explored the root goals and benefits of the KR field itself — often in language that makes those goals and benefits more accessible to the outside world than discussions on the fine points of semantic theory.

IBM’s end game, of course, is quite simple:

  1. Demonstrate that the path it took has been successful — especially relative to other solutions — and
  2. Make the buying public aware of that success.

And what could be a more perfect audience than diehard Jeopardy! watchers — millions of college-educated viewers every night, many of whom will influence buying decisions in business and government organizations. IBM consultants won’t have to explain what they’re talking about to non-technical decision makers. The decision makers will include more than a few Jeopardy! watchers. Even better, the mainstream media has been talking about the Watson challenge for days already, often misunderstanding and exaggerating the nature of Watson’s victory.

Score a big win for IBM. A really big win.

What does Watson do?

If you haven’t watched the three-day Jeopardy! event, you can find it in several places online. Beware of sites that charge for downloads.

The DeepQA/Watson project team leader, David Ferrucci, gives a very good explanation of how it works here:

What Watson does not do

Watson is a brilliant achievement, both in terms of technology and marketing. But you need to take it all with a grain of salt. To begin with, the Jeopardy! categories chosen for this challenge have at least two significant constraints: No audio clues and no visual clues. Watson cannot “see” pictures or videos, and it responds only to electronically encoded text.

In theory, at least, those limitations could be overcome quite easily. We already have smartphone apps that will “listen” to a radio tune and tell you the name of that tune. Speech-recognition apps for smartphones and personal computers are remarkably good. Identifying the voice of a particular person seems plausible, too, if the detective shows are accurate. Facial recognition software and applications that identify other objects in static images are available now.

I’m not qualified to tell you how effective such applications are, but they seem impressive to me. And, just as Watson has extracted information from millions of texts for use during the show, there’s no reason to assume that its designers could not build structured descriptions of non-text resources prior to the show. Watson might, in fact, have a huge advantage in establishing matches with such non-text objects relative to humans … at least some of the time.

How the Jeopardy format is an advantage to Watson

The Jeopardy! format itself imposes inherent constraints — most of which are advantageous to the Watson team. And the IBM Watson team fully understands that. They just don’t talk about it too much — perhaps because what Watson does do is so remarkable.

  1. The Jeopardy clue team consciously limits the difficulty of each clue in several ways.
    • Some clues are harder than others, but most rely on “general knowledge.” Using its human experience, the clue team avoids clues that would be too difficult for the average smart person. Such constraints limit Watson’s advantage. Giving the value of pi to 10 places or listing all vice presidents of the US would be child’s play for Watson. When it comes to raw memory, Watson is going to win.
    • The clues rarely require analysis of complex conditions. After all, the object of the game is for humans to come up with the right question in a few seconds. The absence of more complex and subtle clues is generally an advantage for Watson.
    • The clues and questions fall within the cultural experience of Americans with a typical college education. Listing great Bollywood films would be easy for Watson but tough for most Americans. (That may change over time.)
  2. The response to most clues is a question that identifies a small set of concepts or entities — usually only one.
    • By “entity” I mean specific people, places, or things. [Who/What is] Henry VIII, Lake Nicaragua, and The Declaration of Independence are among the specific “questions” I have heard.
    • By “concept” I mean a class of things, whether concrete or abstract — like dogs, weaponry, human health, or happiness. I believe that if we took a statistical survey of Jeopardy! questions (the responses), we would find that the clue frequently consists of lists of things belonging to a class (definition by extension — a subset of the things in that class) rather than definition by intension (a set of properties that define a class). I suspect that this also favors Watson in a substantial way.
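The difference between definition by extension and definition by intension can be sketched in a few lines. The `Thing` record, its two properties, and the short list of breeds below are hypothetical and deliberately crude. They stand in for any class of things, not for anything Watson actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    name: str
    legs: int
    barks: bool


# Definition by extension: the class is given by listing (some of) its members.
DOGS_BY_EXTENSION = {"beagle", "poodle", "collie"}


def is_dog_by_extension(thing):
    """True only if the thing's name appears in the explicit list."""
    return thing.name in DOGS_BY_EXTENSION


# Definition by intension: the class is given by properties a member must have.
# These two properties are a toy assumption, not a serious definition of "dog".
def is_dog_by_intension(thing):
    """True if the thing has the defining properties, listed or not."""
    return thing.legs == 4 and thing.barks


beagle = Thing("beagle", legs=4, barks=True)
husky = Thing("husky", legs=4, barks=True)  # a dog, but not on the list

print(is_dog_by_extension(beagle), is_dog_by_intension(beagle))
print(is_dog_by_extension(husky), is_dog_by_intension(husky))
```

Matching against a list is pure lookup, the kind of task at which a machine with a large memory excels. Deciding whether an unlisted thing satisfies a set of defining properties is a different kind of problem.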

So Ken Jennings and Brad Rutter took a thumping on national television because categories that might have favored humans at this time were eliminated, and because there are other significant constraints imposed by the “rules” of the game itself. The thumping could have been worse. And IBM knew that.

So is Watson besting humans at a human skill?

In its Jeopardy! challenge, is Watson besting humans at a human skill? That’s the picture often painted in the media:

IBM trumpets Watson as a machine that can rival a human’s ability to answer questions posed in natural human language.

Source: Computer finishes off human opponents on ‘Jeopardy!’ By Jason Hanna, CNN
February 17, 2011

Well, it really depends on what you mean by “answering questions.” Sometimes you are looking for the name of a British monarch or slight changes in spelling that result in strange changes in meaning.

However, in most senses, what Watson’s designers have asked it to do is very simple when compared to what humans do when they answer questions. (See above, “How the Jeopardy format is an advantage to Watson.”) Humans also do not ask random questions. (OK, your young children and some of your adult friends may do that, but those are different challenges.) In fact, your objective in asking a question is usually to carefully identify and frame the right question so that you improve your chances to get the answers you want … in order to address a specific problem. Unless, of course, you are a quiz-show contestant or taking a fill-in-the-blanks final history exam.

Keep in mind that, as more than one contributor to the Ontolog forum has observed, Watson doesn’t “understand” its responses. It only knows that its responses are correct when Alex Trebek says so. And, unlike in most human exchanges of meaning, it has no goals or purposes in mind, so it doesn’t know what the next question should be.

In many senses, Watson is an advanced search engine — like Google. Once you understand the nature of the game, there’s a temptation to call the Jeopardy!/Watson match a cheap parlor trick. But it wasn’t so cheap, was it? Still, brilliant work by the Watson team. Clever, too. (That’s not a criticism.) They really understood the nature of the game.

Watson got an unexpected boost from Alex Trebek, too, as Doug Foxvog noted on the Ontolog forum. My wife and I are longtime Jeopardy! watchers. It seems to us that Alex and his “clue team” have become increasingly arbitrary in their acceptance of specific answers, whether for the correct phrasing of the question or for error in facts. Some of their judgments are clearly wrong. That’s understandable. It’s the trend that irritates us, so we end up yelling at Alex. I guess we need to “get a life.”

Those are my abstract complaints. Looking at the multiple responses considered by Watson (shown on the bottom of the screen in the broadcast) gives you a gut feel for how little true “understanding” is involved. And you can be certain that the [types of] clues Watson responds to correctly are different from the types of clues humans respond to correctly. Statistically, there will be variance in the specific correct answers.

There’s more to be learned (by the general public, like me) about what actually happened by more careful analysis of the Jeopardy!/Watson challenge. But we need to let it go as a metaphor for computers outsmarting people.

Could Watson-like technology solve business problems?

Could Watson-like technology solve business problems? In some important ways, yes. It could be customized to answer a variety of business-oriented questions with a high degree of confidence … and tell you how confident it was about the responses it provided. Applied to a narrow domain rather than the open-ended domain of common knowledge (as on Jeopardy!), Watson-like technology should have a high degree of confidence in most of its responses when retrieving information from a massive resource, and, like a typical search engine, it should be able to tell you where it found those answers.
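As a rough sketch of what “tell you how confident it was” and “tell you where it found those answers” could look like in practice, consider the fragment below. The `Candidate` record, the confidence scores, the source labels, and the threshold are all hypothetical. They illustrate the shape of such a response and say nothing about how IBM’s system computes its scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    answer: str
    confidence: float  # hypothetical score between 0.0 and 1.0
    source: str        # hypothetical label for where the answer was found


def respond(candidates, threshold=0.5) -> Optional[Candidate]:
    """Return the most confident candidate, or None if none clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None


# Invented candidates for an invented business question.
candidates = [
    Candidate("Policy 12-B", confidence=0.92, source="claims-manual.txt"),
    Candidate("Policy 7-A", confidence=0.31, source="old-memo.txt"),
]

best = respond(candidates)
if best is not None:
    print(f"{best.answer} ({best.confidence:.0%} confident, from {best.source})")
```

Staying silent when no candidate clears the threshold mirrors what a contestant does by not buzzing in. In a business setting, “I don’t know” is often more useful than a confident wrong answer.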

That’s truly valuable, especially when the retrieval problem is well understood. It might even qualify as a good return on investment, in spite of Peter Brown’s comment on the Ontolog forum:

That’s because “artificial intelligence” is neither. It is neither artificial – it requires massive human brainpower to lay down the main lines of the processors’ brute force attacks to any problem; and It is not intelligent – and I seriously worry that such a failed dismal experiment of the last century now re-emerges with respectable “semantic web” clothing.

Source: Posting to the Ontolog forum by Peter Brown, Re: [ontolog-forum] IBM Watson’s Final Jeopardy error “explanation,” 17-feb-2011, 9:27 am.

It won’t be cheap, at least initially. But that’s not the real problem. Watson team leader David Ferrucci himself brings up the medical/diagnostic possibilities. And who has the most money today, after all???!!!!

In the end, however, neither Watson nor Google nor the inevitable Watson-like imitators will do what we need most. Nor will the work of the KR community when it focuses solely on machine interpretation of natural language. Not by themselves.

Watson-like technologies also risk becoming the end itself — the beast that must be fed — just like the many current information technologies they are likely to replace. It will be a great tragedy if the KR community, the search community, and the organizations and individuals they serve assume that Watson-like approaches are the primary solution to today’s information-driven business problems.

But Watson-like technologies are an important complement to what we need most. As well as a brilliant achievement and a watershed event in technology.



  1. The following is not meant to criticize or diminish what IBM has been achieving, but to clarify some basic issues.
    The retrieval of knowledge representations from any storage system is not different from the retrieval of physical objects from a warehouse. If the pieces are to be found, they need to be stored on a surface, and a surface is described in terms of XY coordinates. The fact that the items stored may also be classified in non-topological terms, and their retrieval may follow that sequence increases the search time exponentially, if the number of items to be retrieved turns very high.
    The solution to the problem of the storage of language objects described in terms of containment (Boolean, dichotomic) used relations in categorisation or classification is no exception. The axioms used to explore matches between items that are physically stored in a co-ordinate system (memory locations, whether internal or external devices) are not much of a help, because the model to represent knowledge is not right. With digitalisation as a solution as it is done by a NL through using discrete units (alphabet, vocabulary, propositions, etc.) you do not seem to recognise that cognition is an ongoing process with recording and retrieving knowledge from a non-digital form of storage media where you have probably a media composed of chemical and electric properties as observed by cognitive scientists, who still do not see how those are converted into and from a NL input/output.
    So the process is analog-digital-analog conversion when it is about processing images, giving a verbal description first, then an explanation on input, and giving an explanation and then a verbal description on output, unless you have the ways to produce images directly from verbal output. That step being skipped by Watson shows that the technology is not available for intelligent human processing despite the existing methods of conversions between various modalities.

    Comment by Ferenc Kovacs — February 24, 2011 @ 2:08 pm

  2. Thanks for the great comment, Ferenc.

    Your comment is dense with assertions, each of which is important. We need two things: (1) a model for representing, evaluating, and integrating our respective assertions (as well as those of others) and (2) a usable — and ideally free — tool for supporting those activities and remembering those results.

    I’m working on that.

    An initial comment: I agree that, in most ways, the very smart folks in Knowledge Representation are too invested in analyzing knowledge from the perspective of discrete units — in particular, organization of concepts that reflect natural language. Our processes of understanding are dynamic and “multi-modal.” The KR folks are too tightly focused on the mechanics of assembling static ontologies from the top down.

    One more: Your phrase “… giving a verbal description first, then an explanation on input, and giving an explanation and then a verbal description on output … ” describes what happens with those who translate information from one natural language to another. That’s what happens in general, whether applied to images or other inputs, IMHO. And who would know that better than you!?

    Comment by Phil Murray — February 24, 2011 @ 4:06 pm

  3. Re:(1) a model for representing, evaluating, and integrating our respective assertions (as well as those of others) and
    FK: This is called upper ontology, of which I claim to have my version LORP to be suitable for the job and which
    Re: (2) a usable — and ideally free — tool for supporting those activities and remembering those results.
    FK: is the tool wanted, although it is not available as a piece of software, but could be possible with people capable of understanding how that translates into a programming language and a piece of hardware that is not necessarily the legacy you have. But people could be trained to do a mass-scale upgrade to the new system.
    Re: I’m working on that.
    FK: We are converging, as I was expecting, also with JB.
    An initial comment: I agree that, in most ways, the very smart folks in Knowledge Representation are too invested in analyzing knowledge from the perspective of discrete units — in particular, organization of concepts that reflect natural language. Our processes of understanding are dynamic.
    FK: Yes, you are right. What they do not see is the triangle of meaning used in semantics (linguistics, not AI) where all three components change in time and need to be synchronized, namely the physical objects or referents, the signs (the names of physical objects and concepts), and the concepts or schemes that are in the heads of individuals and are likely to be different for us each and to be in a visual form as well that need to be verbalized in a way so that the verbal form will correspond to the same chunk of reality (referent, or object) and the name used for it (language, sign) and the people involved in the communication, which also has a context (pragmatics). Clearly, semantics and pragmatics cannot be separated.

    One more: You wrote “… giving a verbal description first, then an explanation on input, and giving an explanation and then a verbal description on output … ” describes what happens with those who translate information from one natural language to another. That’s what happens in general, whether applied to images or other inputs, IMHO. And who would know that better than you!?

    And the missing link is how this sequence DEED (copyright David Crystal, linguist) is interfaced in the middle as an image (analog) signal. And that is what neurocognitive linguistics are after, but alas, again with a wrong model.

    Comment by genezistan — February 24, 2011 @ 4:21 pm
