Watson: What IBM Won’t Tell You

Many years ago I was on a team tasked with the installation and set-up of IBM's cutting-edge speech recognition technology for a research hospital. After we finished we decided to run some tests.

My colleague grabbed the microphone and said “penis degeneration acyclovir sildenafil citrate”. The system dutifully printed the words on the screen, exactly as he said them.

“What was that?” I asked him.

“I dunno, but I hear doctors say it all the time” he replied with a big grin, as if he were imagining himself putting on a white coat and demanding ten times the salary he was currently earning.

“We need to really test it” I said, not willing to believe the system was as clever as the IBM sales reps were trying to make everyone believe. They said the usual stuff about the biggest dictionary ever and the fastest processor on earth…able to decipher any accent in a single bound, yada, yada.

I pulled the microphone away and said “What’s for lunch?”

The screen was blank, the disk access lights were blinking, and then the system printed “lorazepam soma heartburn”.

“Not what I was expecting but this thing might actually be genius” I said as my colleagues rolled on the floor laughing at it.

I tried again by saying “Make me a peanut butter and jelly sandwich!” It delayed, delayed some more and then printed “vitrectomy paroxetine aloe vera detoxification ciprofloxacin.”

“Yup, perfect. Looks ready for production” I said as we all giggled our way to the project manager’s office.

That’s the story that comes to mind when I read the exciting news of IBM’s latest artificial intelligence project, Watson. But before I get ahead of myself, take a look at the obvious and ironic flaws in IBM’s marketing strategy.

Financial services is the “next big one for us,” said Manoj Saxena, the man responsible for finding Watson work. IBM is confident that with a little training, the quiz-show star that can read and understand 200 million pages in three seconds can make money for IBM by helping financial firms identify risks, rewards and customer wants that mere human experts may overlook.

Maybe it’s just me but why is some guy hired by IBM to tell Watson where to work? It’s the smart one, right? Ask Watson whether financial services is the “next big one” and see what this Jeopardy-winning machine says. My guess is it will pop out something like “Defence Industry for $1 trillion, Alex”.

The humans of IBM seem to say they choose target markets for Watson based on how much was spent on technology in the past. Isn’t that exactly the kind of analysis Watson was made for?

Banks spent about $400 billion on information technology last year, said Michael Versace, head of risk research at International Data Corp’s Financial Insights, which has done research for IBM.

I also find it ironic that the name of the guy tracking where the money goes is Versace. What if Watson was focused on risk in the retail industry? A lot of money can be made from predicting who will continue to buy Versace.

But seriously, banks took a massive hit in the 2008 crisis and have been cutting back budgets. Spending $400 billion on IT is a single data point, not a trend, and it certainly is not a prediction. Yet IBM doesn’t seem worried.

Watson “can give an edge” in finance, said Stephen Baker, author of books The Numerati and Final Jeopardy, a Watson biography. “It can go through newspaper articles, documents, SEC filings, and try to make some sense out of them, put them into a context banks are interested in, like risk.”

Perhaps the real question they should be asking Watson is whether it can predict or find the junk, toxic loans and Bernie Madoff schemes. But that is the part of the story IBM is probably not happy to discuss. While they will always tell you it can outrun human processing they tend not to talk about the dark side of that equation — it can make mistakes faster than ever before and might not be able to recognise when its conclusions rest on humans’ faulty logic.

I tried to highlight this quandary in my BSidesLV presentation last year, “2011: A Cloud Odyssey”. Aside from simple errors in automation, automating human thought can also mean accelerating the wrong decision or placing too much confidence in a decision. HAL killed the crew of his ship when he (mistakenly) concluded they would jeopardise the mission.

Watson isn’t as powerful as HAL, of course, and probably will be managed better. If you think of it as a CPU, a fast tool, the level of risk seems far more reasonable. Unfortunately we humans are always tempted to personalise computers and see them as thinking, sensing machines…only to realise too late our deepest questions will be answered with nonsense (e.g. “42”).

A first-person account of Watson that was sent to me offers a far more reasonable perspective on its limitations than the predictably glowing marketing statements from IBM.

It fails mostly when leaps of creative thinking are required. The kind of thing humans can do quickly and computers can’t. Otherwise it searches (and seems to add to) its knowledge base much as humans do, only way way faster. Perfect (almost) for Jeopardy. Or for assisting with diagnosing medical conditions.

That sounds like something Watson might agree with.

Watson

Update: a decade later in 2022, Watson has been scrapped for trying to kill patients.

Penguin fights Sea Lion

The Discovery Channel has posted some teaser videos of Gentoo penguins from a program about to air. I see only 150 or so views so far but that surely will change. The imagery is fantastic:

…portrait of our earth’s polar regions. Frozen Planet premieres on Discovery Channel on Sunday, March 18, at 8PM e/p

They use a simple formula to help humans see the beauty of nature. First they frame a scary predator lurking nearby:

Then they set up an underdog scene. Go penguin go! Fly away! It’s gaining on you…

I won’t spoil the outcome but suffice it to say that the program talks about swimming and running instead of how fast penguins can fly when they are in the water. They stay in the human/outsider perspective and emphasise mostly what can be seen in the air.

Here’s the clip:

A similar clip from the program tries to frame penguins in terms of fraud. It’s an amusing story but as long as they stay in the outsider mindset it could have been even better if they had made references to the stock market, or at least the real estate crisis.

Live Global Ship Positioning

I couldn’t think of a better title. It’s a tongue-twister but it refers to the Live Ships Map on MarineTraffic.com, based on Automatic Identification System (AIS) transponder data and iAIS.

AIS Graphic

You can find out a lot of information about ships underway. There is no data off the coast of East Africa, let alone Somalia, unfortunately. So here’s the Bay Area, as an example instead:

Clicking on one of the ships brings up its dox.

A micro view shows proximity of the boats and, in this example, you can even watch the pilot boat come out to greet the Filipino “Sun Right” ship and take over navigation for the Bay.

A macro view tells a very different story. If you pull out far enough on the maps you get green boxes with numbers indicating the number of data feeds. California ports show hundreds at the most. Green boxes around the Asian ports show numbers in the thousands.

Take a look at Shanghai. Pink squares represent navigation aids. Green is cargo, red is a tanker, grey are “unspecified”.

You also can create watch lists or “fleets,” search for specific vessels and ports, display their tracks, show predicted courses, and add GRIB (wind) data. Even small vessels should be able to easily incorporate this data into warning, distress and chart systems, marking a huge difference in situational awareness especially in low/no visibility conditions.

I am curious about the ability to build fleets or watch lists based on manifests such as port of call or country…imagine building a map with the tracks and the predicted courses for all the fuel tankers from or headed to a country.
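The kind of fleet-building imagined above can be sketched in a few lines of code. Everything here is hypothetical: the record fields, ship names and destination codes are invented for illustration, and real AIS feeds (MarineTraffic included) have their own formats and terms of use.

```python
# Hypothetical sketch: build a "fleet" (watch list) from AIS-style records
# by filtering on reported ship type and destination port.
# Field names and sample data are invented for illustration only.
ships = [
    {"name": "A", "type": "tanker",    "destination": "SHANGHAI"},
    {"name": "B", "type": "cargo",     "destination": "OAKLAND"},
    {"name": "C", "type": "tanker",    "destination": "OAKLAND"},
    {"name": "D", "type": "passenger", "destination": "SHANGHAI"},
]

def build_fleet(records, ship_type, destination):
    """Return the names of vessels matching a type and destination."""
    return [s["name"] for s in records
            if s["type"] == ship_type and s["destination"] == destination]

print(build_fleet(ships, "tanker", "OAKLAND"))  # ['C']
```

The same filter, run over live position reports instead of a static list, is all it would take to track every fuel tanker headed to or from a given country.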

I also wonder about correlating the movement of tankers to the rise and fall of fuel prices. It is said that diesel prices in the Bay Area rise when tankers arrive from Latin America and fill up. Not all the data is clean, however. I ran through the Shanghai ships reporting themselves as passenger vessels and found at least one that was actually an oil/chemical tanker.

Chrome tarnished by Glazunov again

Sergey Glazunov, a university student in Russia, has been recognised by Google many times with special awards as well as cash for reporting Chrome security bugs since they started their program in January 2010. Here are his awards from just one version changelog last year:

  • [$1337] [65764] High Bad pointer handling in node iteration. Credit to Sergey Glazunov.
  • [$1000] [66560] High Stale pointer with CSS + canvas. Credit to Sergey Glazunov.
  • [$1000] [68178] High Bad cast in anchor handling. Credit to Sergey Glazunov.
  • [$1000] [68181] High Bad cast in video handling. Credit to Sergey Glazunov.
  • [$3133.7] [68666] Critical Stale pointer in speech handling. Credit to Sergey Glazunov.

Including the above list he has earned $3,133.7 (eleet) twice, $2,500 three times, $2,337 three times, $2,000 twice, $1,337 (leet) five times, $1,000 thirty-six times, and $500 once ($67,963.4 total).
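Those award amounts and counts are easy to double-check; the figures below are simply transcribed from the list above.

```python
# Tally of Glazunov's reported Chrome bounty awards, as (amount, count),
# transcribed from the post above.
awards = [
    (3133.7, 2),   # "eleet" award, twice
    (2500.0, 3),
    (2337.0, 3),
    (2000.0, 2),
    (1337.0, 5),   # "leet" award, five times
    (1000.0, 36),
    (500.0, 1),
]

count = sum(n for _, n in awards)
total = sum(amount * n for amount, n in awards)

print(count)               # 52
print(f"${total:,.1f}")    # $67,963.4
```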

He was the first to win the “3133.7” level award. Now he has won the $60,000 purse for finding a full exploit during the Google Pwnium competition.

Economists might be tempted to ponder whether a researcher would withhold a higher-value exploit until offered a higher purse, or whether a higher purse gives incentive to find a higher-value exploit. $60,000 for one exploit versus $67,963.4 for fifty-two is a study in incentives, but it also raises the cost of handling/defending one flaw versus fifty-two.

Updated March 9 to add: Another young man has proven a full exploit in Chrome. He said the bug was easy to find but, unlike Glazunov, he found it hard to get the attention of Google. Wired just ran a story that references the point I make above about Glazunov’s experience.

[Glazunov] is one of Google’s most prolific bug finders and earned around $70,000 for previous bugs he’s found under the company’s year-round bug bounty program. As such, he’s very familiar with the Chrome code base.

I saw no reference anywhere to the totals won by Glazunov before I wrote this post. I would have waited if I had known Wired would add it up and run it in a story, instead of spending time compiling the data myself.

More important to this story, however, is a comparison of the researchers. Wired doesn’t do much analysis of their motives. Wired also seems to hint that Pinkie Pie is a relative newcomer compared to Glazunov, but I think that’s a mistake. The big difference I see is that Glazunov uses his real name, as a student, and regularly submits his bugs, while the other wants to remain anonymous and has asked Google for a job but otherwise has been reluctant to submit his research for public verification.

The tall teen, who asked to be identified only by his handle “Pinkie Pie” because his employer did not authorize his activity, spent just a week and a half to find the vulnerabilities and craft the exploit, achieving stability only in the last hours of the contest.

[…]

Pinkie Pie, wearing shorts, a t-shirt and glasses, said he’d never submitted a vulnerability report to Google before, but he had sent his resume to the company last year seeking a job. He wrote in his cover note that he could crack Chrome on OSX, but he never got a reply.

Claiming in a cover letter that you can crack Chrome on OSX without submitting an exploit for verification is a passive method at best. The hiring department probably gets a lot of letters with unsubstantiated claims, so it’s understandable that they waited for more proof instead of jumping on it. However, I also see why Pinkie Pie might have chosen to make a claim instead of offering proof when applying for the job. Submitting an exploit for a $60,000 purse is an opportunity to win simply upon verification, whereas submitting for a job is a far riskier option that can lead to rejection and far less money, even after verification.