Voice Recognition is a 'Dragon'

The days of simply telling your computer what to do are approaching, but the world of Judy Jetson is still a daydream. The latest edition of Dragon's voice recognition software comes the closest yet. A Wired News product review by Jennifer Sullivan.

When can you expect your PC to obey when you speak to it casually -- say, while you're relaxing on the sofa and sipping a martini?

Soon, if Dragon Systems' NaturallySpeaking Preferred Edition software can improve on V4.0 -- a little here and drastically there.

It is the program that the world has been waiting for since Hanna-Barbera conceived The Jetsons. Too bad the fantasy remains a fantasy.

"It's the Star Trek phenomenon," said Jeffrey Tarter, publisher of the SoftLetter newsletter. "We've all grown up watching sci-fi voice recognition, where accents and the environment don't matter. It's like a generation of adolescent boys reading Playboy -- [it's not like] the real thing."

That is not to say Dragon's voice-recognition suite can't help people navigate their way around computers and the Internet. Other software makers offer similar products, such as IBM's ViaVoice and Lernout and Hauspie's Voice Xpress.

The current incarnation of voice recognition software is great for narrowly defined tasks, such as medical transcription for doctors. But there's a long way to go before users can expect their computers to respond to the sound of their first command.

Or second or third command, for that matter.

Simply put, users need time to master software such as Dragon's NaturallySpeaking Preferred Edition Version 4 (US$169). And the software needs time to master its user. The software has to learn voice patterns explicitly in order to perform satisfactorily.

As Tarter said, "This application is pushing the envelope on [existing] technology."

With that in mind -- and because I suffer from repetitive strain injury -- I tested Version 4 in an effort to use their scientists from typing and surfing the net all day.

[What, you didn't understand the last part of that sentence? The fact is, I used the Dragon software to write this article. I dictated "...in an effort to spare my wrists from typing," and it came out "...in an effort to use their scientists from typing."]

The installation and setup went smoothly, although I was embarrassed that I couldn't immediately determine what kind of sound card I use. I also couldn't immediately figure out where the second microphone plug goes (in the headphone jack).

And, ironically, the setup requires some typing.

To train the software, I read aloud a 30-minute selection from Charlie and the Chocolate Factory, although I doubted words like "scrumdiddlyumptious" would help me write tech stories for Wired News.

Then I fed in 20 of my old Wired News stories, so it would learn the kinds of words I'm likely to use. A quick tour of the software demonstrated the tone and speed in which I should dictate, which scroll down was surprisingly pretty natural-sounding.

[Yep, "scroll down" was dictated into the story. That should have been " ... which was surprisingly pretty natural-sounding"].

OK. Now, scroll down.

I was ready to go. I said, "Start Microsoft Word." The program booted up. Just as quickly, my computer -- an IBM ThinkPad 600 with well more than the required minimum of 32MB of RAM and a Pentium II processor -- crashed.

I didn't have enough available disk space. Dragon recommends at least 95MB.

I rebooted and opened Microsoft Word again. I read two long, complicated sentences aloud and Dragon got every single word right. "Holy shit, Marilynn, this rules!" I yelled to one of my editors. "Wholly shit Maryland casseroles," my Dragon dutifully transcribed.

Aside from the geographic food error, it also missed the comma and the exclamation point. Punctuation marks need to be explicitly dictated.

Then I read aloud the most boring tech article that I could find, and it got all but two complex sentences correct.

Dictating stories in Microsoft Word is what Dragon did best for me, especially when it was my only open application. Once Dragon is trained, it's impressively accurate and fast. And if you take the time to correct its mistakes by spelling out or selecting the words you meant to say, the software learns each time.

Going back and fixing mistakes can be pretty tedious. Correcting transcription errors and homonyms, or inserting words here and there, is much harder than dictating straight through, since Dragon is better at recognizing words in context.

Thus, in my deadline-driven newsroom, I still end up with enough mistakes to keep me from writing even half as fast as I used to. But, according to Dragon, the more you train, the better the recognition, and the faster you get.

I tried Dragon for email with Microsoft Outlook 98, and for Net surfing with Microsoft Internet Explorer 5. These are the optimal programs to use, according to Dragon.

Surfing the Web is slow, but promising. Users can say commands like "go to address" followed by "www-dot-wired-dot-com" to visit different sites.
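
For the curious, turning a spoken address into a typed one amounts to a simple text substitution. Here is a minimal Python sketch of that idea, not Dragon's actual code: the function name spoken_to_address is made up for illustration, and it assumes the recognizer hands back the phrase as words separated by hyphens or spaces.

```python
def spoken_to_address(phrase):
    """Turn a dictated address into a typed one.

    Hypothetical helper for illustration only; it assumes the spoken
    words arrive separated by hyphens or spaces, with "dot" spoken
    wherever a period belongs.
    """
    # Treat hyphens like spaces so both input styles split the same way.
    words = phrase.lower().replace("-", " ").split()
    # Swap each spoken "dot" for a period and glue the pieces together.
    return "".join("." if word == "dot" else word for word in words)


print(spoken_to_address("www-dot-wired-dot-com"))
```

A real recognizer would also have to cope with slashes, digits, and words that merely sound like "dot," which this sketch ignores.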

Text links are easy to jump to -- the user just says "click" and then the name of the link. But it's tougher to click on search buttons or check the boxes in user surveys. Users can say something like, "click image," to select the first image on the page, and then "next," to go to the next one.

That's a lot of "nexts" if you're shooting for the bottom of the page.

Not all Web pages are speech-enabled -- that is, built to conform to certain guidelines that make them work best with voice recognition software.

Dragon Systems' manager of technical support Kevin Gervais said it's difficult for the software to recognize things like a GIF posing as a search button.

The way to get around search buttons that won't click is Dragon's "MouseGrid" feature, which draws a grid with nine squares on the computer screen. Surfers say the number of the square containing the button they'd like to click. The grid keeps getting smaller and smaller until it's right over the button. Then you say "mouse click" to nail the sucker. It's accurate, if slow.
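
The narrowing is easy to picture in code. The Python sketch below is an illustration only, not Dragon's implementation: the function name mouse_grid is hypothetical, and it assumes the squares are numbered 1 through 9, left to right and top to bottom, which the product may or may not do.

```python
def mouse_grid(width, height, spoken_squares):
    """Shrink a screen region by repeatedly choosing one of nine squares.

    spoken_squares is the sequence of numbers (1-9) the user says.
    Assumed numbering: left to right, top to bottom.
    Returns the (x, y) center of the final region, where the click lands.
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for n in spoken_squares:
        if not 1 <= n <= 9:
            raise ValueError("square numbers run from 1 to 9")
        row, col = divmod(n - 1, 3)  # 0-based row and column of the square
        w /= 3                       # each pick cuts the region to a third
        h /= 3
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)


# Say "1", then "9": top-left square first, then its bottom-right corner.
print(mouse_grid(900, 900, [1, 9]))
```

Each spoken number shrinks the target to a ninth of its previous area, so even a small button takes only a handful of numbers to reach. That squares with my experience: accurate, but every step is another spoken command.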

That same feature is excruciatingly cumbersome to use in email -- the application where Dragon gave me the most trouble.

It was difficult clicking back and forth between the frame that displays the contents of my inbox and the frame that displays the email message. The best way to dictate email addresses is to devise shortcuts using Dragon's vocabulary builder feature.

Dragon users have created all kinds of Web sites, chat rooms, and resources as training aids. But perhaps the best resource is the program's "online help notes" feature. The software will even recognize a tired, frazzled-sounding, "Give me help."