May 1, 2000 12:00 PM

Say Anything

Your guide to speak performance.

Machine translation gets a lot of lip service, but building machines that communicate using human languages has proven tricky. Now decades of R&D are finally paying off. Today's Babel-sized stack of products can understand our words and help us understand one another.

These off-the-shelf tools fall into four categories, mirroring the research fields from which they've sprung: speech processing, speech synthesis, machine translation (MT), and natural-language processing (NLP). Speech processing converts speech into text. Speech synthesis converts text into speech. MT translates text from one language into another. NLP understands grammar: how words connect and how their definitions relate to one another. This last field stands on its own, but also contributes to the other three, because computers listen, speak, and interpret more accurately when they have guidelines to what words can mean.

It might seem that the right combination of these products would yield a real-time universal interpreter: something that converts speech into text, translates the text, and then recites it intelligibly to a listener who doesn't understand the original language. In fact, researchers say that's right around the corner - for conversations limited to preordained subject matter. Translating free-ranging discourse, however, presents problems that have bedeviled AI researchers for years, so nobody's making promises. Meanwhile, we can expect a variety of devices and services - some online, some portable, some designed for corporate networks - that translate basic, circumscribed interactions, like booking a room or telling a cabbie to take you to the nearest bar.
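The chain described above (speech to text, text to translated text, translated text back to speech) can be sketched as three functions composed in order. The Python below is only an illustration under loud assumptions: recognize_speech, translate, synthesize, and PHRASE_BOOK are hypothetical stand-ins rather than any product's API, and the "audio" is simply UTF-8 bytes. The phrase book is deliberately tiny, to show why preordained subject matter is the easy case.

```python
# Hypothetical stand-ins for real engines; each stage is just a function.

# Toy English-to-French phrase book limited to one domain (travel basics).
# An assumption for illustration, not data from any real product.
PHRASE_BOOK = {
    "i would like to book a room": "je voudrais réserver une chambre",
    "take me to the nearest bar": "emmenez-moi au bar le plus proche",
}


def recognize_speech(audio: bytes) -> str:
    """Pretend recognizer: treats the incoming bytes as UTF-8 text."""
    return audio.decode("utf-8")


def translate(text: str) -> str:
    """Look the sentence up in the phrase book; refuse anything off-topic."""
    key = text.lower().strip(" .!?")
    if key not in PHRASE_BOOK:
        raise ValueError(f"outside the supported subject matter: {text!r}")
    return PHRASE_BOOK[key]


def synthesize(text: str) -> bytes:
    """Pretend synthesizer: returns the text as bytes instead of audio."""
    return text.encode("utf-8")


def interpret(audio: bytes) -> bytes:
    """Speech in, translated speech out: recognition, then MT, then synthesis."""
    return synthesize(translate(recognize_speech(audio)))
```

Calling interpret(b"Take me to the nearest bar.") returns the French phrase as bytes, while a free-ranging sentence raises ValueError, which is the toy version of the open-discourse problem.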

Speech Processing

Speech processing empowers computers to recognize - and, to some extent, understand - spoken language. This technology has engendered two types of software products: continuous-speech recognition and command and control.

Continuous-Speech Recognition The aim of continuous-speech recognition is automatic dictation: You talk to the system about anything; immediately, your words appear as text onscreen. Such products can be remarkably accurate, but only when conditions are just right: The microphone delivers a high-quality signal, the surrounding environment is relatively noise-free, and the person speaking has been "enrolled" (that is, the system has been trained to recognize his or her voice). The system uses enrollment, along with a pronunciation dictionary, syntactical rules, semantic relationships, and other programming, to recognize phonemes and then guess at words and punctuation.

NAME BRANDS Dragon Systems NaturallySpeaking, IBM ViaVoice, L&H Voice Xpress, Philips FreeSpeech.

NUTS & BOLTS L&H claims its Voice Xpress Professional Version 5 ($149, www.lhs.com) achieves up to 96 percent accuracy via techniques like learning from corrections and filtering out "um's" and pauses.

Command and Control Computers understand speech most easily when vocabulary and context are tightly constrained. In a command-and-control system, the number of viable commands and responses is confined to a specific set of activities, and even unenrolled users can give the system orders. If the system understands, it can take appropriate action; if it doesn't, it can ask for clarification. The latest desktop software products address both productivity ("Draw a three-column table") and pleasure ("Lock missiles!"). Industrial-strength telephony servers recognize commands to do things like read customers their bank account balances. The same technology lets voice-portal subscription services retrieve email, business listings, and news headlines over the phone. (See "Capturing Eardrums," page 246.) An emerging generation of chips promises to engender a range of portable voice-activated special-purpose devices.

NAME BRANDS Desktop software: Conversa TalkRadio and Conversa Web, Mindmaker Game Commander, Nuance Voyager, One Voice Technologies IVAN. Telephony servers: Nuance BetterBanking, Nuance BrokerageSuite, One Voice Technologies VoiceSite, Philips SpeechWave, Speech Machines DictationNet, Vocalis SpeechWare. Voice portals: BeVocal Inc. BeVocal, General Magic Portico, Tellme Networks Tellme, Webley Systems Webley Assistant, Wildfire Communications Wildfire. Chips: ISD ISD-SR3000, Oki Semiconductor MSM6679A VRP, Sensory Voice Direct 364 ASSP.

NUTS & BOLTS The BeVocal (free, www.bevocal.com) speech portal - launch date TBA - will use voice input to retrieve traffic reports, hotel listings, et cetera. It will also transmit info to your screen phone via short message service (SMS). A dedicated chip, the ISD-SR3000 ($5 in OEM quantities, www.isd.com), can be programmed to recognize a custom vocabulary regardless of speaker gender or accent, allowing small devices to respond to voice commands.
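The command-and-control idea (a small, fixed set of commands, with a request for clarification when nothing fits) can be sketched in a few lines of Python. This is a minimal illustration, not how any of the products above work: the COMMANDS table and the handle function are hypothetical, the input is already text rather than audio, and fuzzy string matching from the standard difflib module stands in for a real recognizer's confidence score.

```python
import difflib

# Hypothetical command set for a phone service; the replies are placeholders.
COMMANDS = {
    "check balance": "Reading account balance.",
    "read headlines": "Reading news headlines.",
    "retrieve email": "Fetching email.",
}


def handle(utterance: str, cutoff: float = 0.75) -> str:
    """Map a (possibly misheard) utterance to one of the known commands.

    If no command is similar enough, ask the caller to try again and
    list what the system can do.
    """
    heard = utterance.lower().strip()
    matches = difflib.get_close_matches(heard, list(COMMANDS), n=1, cutoff=cutoff)
    if matches:
        return COMMANDS[matches[0]]
    return "Sorry, I didn't catch that. You can say: " + ", ".join(COMMANDS)
```

Because the vocabulary is so small, a slightly garbled "chek balance" still lands on the right command, while "order a pizza" falls through to the clarification prompt.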

Speech Synthesis

The ability to synthesize the sound of speech is useful for applications that require spontaneous interaction, or in situations where reading isn't practical (giving instructions to a driver, for example). In products aimed at the general public, it's critical that the output sound pleasant and human enough to encourage regular use.

Text to Speech Text-to-speech (TTS) capability renders written language in spoken form. The best systems take sentences - not just individual words - into account when determining rhythms and inflections, making the phrases sound less mechanical.

NAME BRANDS Elan TTS speech engine, L&H RealSpeak, Lucent Text-to-Speech Engine, SoftVoice TTS, Willow Pond WillowTALK.

NUTS & BOLTS Bell Labs, now the R&D arm of Lucent, spent decades developing speech synthesis technology for phone systems. Lucent's multilanguage TTS (Lucent TTS Engine Software Development Kit: $595, www.lucent.com/speech) is one of the most advanced products of its kind.

Machine Translation

Like speech processing, automated translation of text in one language to text in another works best when the subject matter is limited and the system is preconfigured. Some MT products generate output for immediate use (unassisted MT), while others perform initial translations intended as the basis for further work (assisted MT).

Unassisted MT Unassisted MT works well with text feeds (brief chunks of text), especially if they're only for informal, short-term use, like multilingual chat rooms or search queries. L&H is working on a German-English mail gateway for DaimlerChrysler: When employees send email, recipients will receive a translation along with the original text. Even when translating longer documents, quick and dirty often will do - just enough to convey the general idea and determine whether a better, human-assisted translation is needed. This process is called gisting; the results aren't pretty, but fluent users get the gist.

NAME BRANDS Text-feed MT: IBM alphaWorks Native Search, L&H iTranslator, MultiLingual Media GlobalTV. Gisting: Alis Technologies Gist-in-time, Systran (at AltaVista's Babel Fish and other Web sites), T-Mail.com, Transparent Language FreeTranslation.com.

NUTS & BOLTS MultiLingual Media's text-feed system for broadcasters, GlobalTV (30 to 90 cents per subscriber per year, www.multilingualmedia.com), translates closed-caption information for foreign-language TV subtitling. FreeTranslation.com renders the gist of English text in French, German, Italian, Norwegian, Spanish, and Portuguese. It can also gist French, German, and Spanish into English.

Assisted MT When translation fidelity is critical, MT requires a human translator to clean up after it. Automated performance can be improved up front by constraining vocabulary and grammar in the original text. The best systems learn from corrections, improving with use; in a networked installation, each addition to the domain dictionary helps everyone on the net. Enterprise-level systems are used to translate instruction manuals, email, and the like. Personal translation systems perform similar functions on the desktop.

NAME BRANDS Enterprise MT: IBM TranslationManager 2, L&H Power Translator Pro, Logos TranslationControlCenter, Systran PROfessional and Enterprise, Trados Translation Solution. Personal MT: L&H Simply Translating, Quickwiz Easy Lingo, Systran Personal, Transparent Language EasyTranslator.

NUTS & BOLTS Systran Enterprise ($9,500 for 20 users and one language pair, www.systransoft.com), which runs on Windows NT Server, translates English to and from French, German, Italian, Japanese, Portuguese, and Spanish. It converts Chinese and Russian into English, but can't go the other way.

Natural Language Processing

NLP systems interpret written rather than spoken language. In fact, NLP modules can be found in speech-processing systems that start by converting spoken input into text. Using lexicons and grammar rules, NLP parses sentences, determines underlying meanings, and retrieves or constructs responses. This technology's main use is to enable databases to answer queries entered in the form of a question. A newer application is handling high-volume email. NLP performance can be improved by incorporating a commonsense knowledge base - that is, an encyclopedia of real-world rules.

Querying Traditionally, NLP has been used to translate written questions into a database-query language like SQL. Connected to a structured database, this kind of front end can answer straightforward questions ("Who sold the most widgets in New York last year?"). The same concept has given rise to broader question-answering systems. In this case, a collection of documents is analyzed to construct a database of key words and concepts. Natural-language questions can be compared to the contents of this database to retrieve natural-language answers.

NAME BRANDS NLP front ends: Inference k-Commerce Web, Microsoft English Query. Question-answering systems: AnswerLogic Inc. AnswerLogic, EasyAsk Inc. EasyAsk, InQuizit Technologies InQuizit, LexiQuest LexiGuide, MIT AI Laboratory Start.

NUTS & BOLTS Microsoft English Query for SQL Server 7.0 (available to Microsoft Developer Network universal subscribers, msdn.microsoft.com) lets SQL developers add natural-language front ends in English, French, German, and Japanese. AnswerLogic's AnswerLogic System ($250,000 per year for a typical installation, www.answerlogic.com) runs on a Web server and answers questions based on site content and documentation.
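To make the question-to-SQL idea concrete, here is a toy front end in Python that recognizes exactly one question shape and turns it into a parameterized query. Everything in it is an assumption for illustration: the sales table, its columns, the sample rows, and the answer and demo_db helpers are invented, and the question names an explicit year because resolving "last year" would need a calendar step that is left out. A real front end relies on lexicons and grammar rules rather than a single regular expression.

```python
import re
import sqlite3

# One question template mapped to one SQL query (hypothetical schema).
PATTERN = re.compile(
    r"who sold the most (\w+) in ([\w ]+?) in (\d{4})\??$", re.IGNORECASE
)
SQL = """
    SELECT seller FROM sales
    WHERE product = ? AND city = ? AND year = ?
    GROUP BY seller
    ORDER BY SUM(units) DESC
    LIMIT 1
"""


def answer(conn, question):
    """Return the top seller for a recognized question, or None otherwise."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None  # the question does not fit the one template we know
    product = match.group(1).lower()
    city = match.group(2).title()
    year = int(match.group(3))
    row = conn.execute(SQL, (product, city, year)).fetchone()
    return row[0] if row else None


def demo_db():
    """Build an in-memory database with a few made-up sales records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales "
        "(seller TEXT, product TEXT, city TEXT, year INTEGER, units INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [
            ("Alice", "widgets", "New York", 1999, 120),
            ("Bob", "widgets", "New York", 1999, 95),
            ("Bob", "widgets", "New York", 1999, 40),
            ("Carol", "widgets", "Boston", 1999, 300),
        ],
    )
    return conn
```

With the sample rows, answer(demo_db(), "Who sold the most widgets in New York in 1999?") sums Bob's two sales and picks him over Alice; any question outside the template simply returns None.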

Email Response NLP comes in handy when dealing with floods of incoming email. An email-response system sifts through incoming messages and answers the ones that ask easy questions like "Where are your corporate offices?"

NAME BRANDS Brightware Automated Answer, eGain Mail, Inference k-Commerce E-Mail, Kana Response.

NUTS & BOLTS According to Brightware, Automated Answer ($175,000 for 10 seats, www.brightware.com) automatically classifies and responds to as much as 80 percent of incoming email. For the rest, it sends preformatted responses and alternatives to customer service agents so they can reply appropriately.
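The triage described above (answer the easy messages automatically, pass the rest to an agent along with suggested replies) can be sketched with simple keyword matching. This Python is a stand-in under stated assumptions, not Brightware's or anyone else's method: the CANNED topics, keywords, reply texts, and the triage function are all hypothetical, and counting keyword hits replaces real language analysis.

```python
import re

# Hypothetical topics: a keyword set plus a placeholder canned reply for each.
CANNED = {
    "office_location": (
        {"where", "office", "offices", "located", "address", "headquarters"},
        "Our corporate offices are listed on the Contact page.",
    ),
    "business_hours": (
        {"when", "hours", "open", "close"},
        "Our support hours are listed on the Contact page.",
    ),
}


def triage(message, min_hits=2):
    """Auto-reply when one topic clearly matches; otherwise route to an agent.

    Returns ("auto_reply", reply_text) or ("to_agent", suggested_replies).
    """
    words = set(re.findall(r"[a-z']+", message.lower()))
    scores = {topic: len(words & keywords) for topic, (keywords, _) in CANNED.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if scores[best] >= min_hits:
        return ("auto_reply", CANNED[best][1])
    # Not confident enough: hand off, with any partial matches as suggestions.
    suggestions = [CANNED[topic][1] for topic in ranked if scores[topic] > 0]
    return ("to_agent", suggestions)
```

"Where are your corporate offices?" hits two location keywords and gets an automatic reply; a complaint about a broken order matches nothing and goes to a human with an empty suggestion list.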

Commonsense Knowledge Base How can an NLP application know that the sentence "John walked down the street and turned into a store" doesn't mean that John experienced a personal transformation? It can't - unless it has a set of rules about how the world works. A commonsense knowledge base provides guidelines that help the system avoid silly mistakes due to ambiguities in grammar and vocabulary.

NAME BRAND Cycorp Cyc Knowledge Server.

NUTS & BOLTS Cyc Knowledge Server ($200,000 plus $40,000 annually, www.cyc.com) is the result of what may be the largest AI project ever: a 15-year effort to catalog everyday truths that people know but computers don't. According to Cycorp, Cyc can augment any system that communicates with people about real-world situations.
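The "turned into a store" example hints at how such rules get used: the system checks what kinds of things are involved before choosing a reading. The Python toy below illustrates only that idea and makes no claim about how Cyc represents knowledge. The IS_A facts, the PLAUSIBLE rules, and the read_turned_into function are all invented for the example.

```python
# Hypothetical facts about what kind of thing each word names.
IS_A = {
    "John": "person",
    "prince": "person",
    "frog": "animal",
    "store": "place",
    "street": "place",
}

# Hypothetical rules: the sensible reading of "X turned into Y"
# for a given (kind of X, kind of Y) pair.
PLAUSIBLE = {
    ("person", "place"): "entered",   # people walk into places
    ("animal", "person"): "became",   # fairy-tale transformation
}


def read_turned_into(subject, obj):
    """Pick a reading of 'subject turned into obj', or admit ambiguity."""
    kinds = (IS_A.get(subject), IS_A.get(obj))
    return PLAUSIBLE.get(kinds, "ambiguous")
```

Here read_turned_into("John", "store") yields "entered" and read_turned_into("frog", "prince") yields "became"; anything the toy rule base has never heard of comes back "ambiguous", which is the gap a large knowledge base is meant to fill.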