IBM Learns to Speak Chinese

New speech-recognition software could blow open the Chinese market with sweet talk.

IBM stands to gain a strong footing in China's lucrative software market with new speech-recognition software that transcribes spoken Mandarin with 95 percent accuracy, without requiring users to repeat phrases. The software, developed by IBM's Beijing Research Lab, is part of a major initiative by Big Blue to penetrate the Chinese market starting with the basics: getting computers to listen to their Chinese users.

The PC market in China has been largely held back by typographical obstacles. Users are hamstrung by a foreign, American keyboard and an awkward design that forces them to make multiple keystrokes for every ideogram in the language. The English keyboard is "Greek to them," says Kathleen Keck, a representative at the US Information Technology Office, which promotes software and telecommunications investment in China. "Every time they finally type the word they want," says Keck, "they then get two options and they have to choose between them."

The VoiceType program overcomes two challenges unique to the Chinese language: tone and pitch changes. "We had to think about how to represent the acoustic space in such a way that tonal qualities - what Chinese is built on - are properly mapped into the characters," says David Nahamoo, senior manager in the Human Language Technology Department at IBM's research lab. "And in Chinese, when pitch changes, so does meaning."

The software breaks speech down into three-word sections called "trigrams" that speed up transcription by predicting the third word in the sequence based on the first two. "If you don't have any idea of the language, every time you want to recognize the next word, chances are it'll be 1 out of 30,000," says Nahamoo, "but if you use a trigram predictor, it goes down to 150 to 200. So you take a big step."
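
The trigram idea can be illustrated with a minimal Python sketch. This is not IBM's implementation, which the article does not describe in detail; the function names `build_trigram_model` and `candidates` and the four-sentence English corpus are hypothetical, chosen only to show how knowing the two preceding words shrinks the set of plausible next words.

```python
from collections import Counter, defaultdict


def build_trigram_model(sentences):
    """Count how often each word follows each two-word context."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        # Slide a three-word window across the sentence.
        for first, second, third in zip(words, words[1:], words[2:]):
            model[(first, second)][third] += 1
    return model


def candidates(model, first, second):
    """Return (word, estimated probability) pairs, most likely first."""
    counts = model.get((first, second), Counter())
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common()]


# Toy corpus (an assumption for illustration, not real training data).
corpus = [
    "the computer listens to the user",
    "the computer listens to speech",
    "the computer transcribes the speech",
    "the user talks to the computer",
]

model = build_trigram_model(corpus)
vocabulary = {word for sentence in corpus for word in sentence.split()}
ranked = candidates(model, "the", "computer")

print(f"vocabulary size: {len(vocabulary)}")
print(f"candidates after 'the computer': {ranked}")
```

With no context, any of the eight words in this toy vocabulary could come next. After "the computer", the model has seen only two continuations, so the recognizer has far fewer words to choose between. The same effect, at a much larger scale, is what Nahamoo describes as going from 30,000 possibilities to 150 to 200.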

IBM's software allows users to skip the keyboard altogether. Users talk into a microphone, and the computer, drawing on a 30,000-word database, transcribes the speech immediately. Unlike competing voice recognition systems such as Motorola's Clamor project, VoiceType doesn't require any repetition.

Much of the rising computer investment stems indirectly from government policies against large, expensive families, says Keck. "We're seeing a lot more penetration into Chinese homes because of the one-child policy, which means families will spend a lot of money - like US$1,200 - to put their child in a better position in the future." But for anyone to succeed in China, says Keck, "they've got to go local, and that means a keyboard that people can use."

From the Wired News New York bureau at FEED magazine.