Let’s say you’re talking to your brand-new Apple TV. You click the remote and rainbow-colored sine waves on your flatscreen indicate Siri is listening. It’s slick: You ask for a channel or to replay a scene and it responds smoothly. But then you say, “Show me some new comedies,” and maybe it suggests ... Pixels. Of course, Pixels is a wretched movie, so you look at the next suggestion, and oh, man: Hot Tub Time Machine 2. Now you’re cursing into the remote, just to see what happens.
The thing is, Apple TV doesn’t keep your foul language to itself. To understand your speech, it sends the audio to Siri’s cloud servers, where it’s processed—and archived, for up to two years.
Welcome to the latest, weirdest phase of our relationship with technology: machines that eavesdrop on us. It’s a side effect of the Internet of Things. As processors shrink, inventors have been stuffing digital smarts into everyday appliances. But since you can’t easily put a keyboard on a coffeepot, the easiest input method is voice. This has sparked an explosion of tools with ears: There’s the Nest webcam (which perks up when it detects activity in your house), Amazon’s cylinder-shaped Echo personal assistant, and the Hello Barbie doll—which, when your child pushes a button and talks to it, sends whatever they say into the ether.
It’s a now-classic example of a privacy trade-off. Ergonomically, voice control rocks. I use it all day long on my phone and would happily use it to control my house, car, and radio. But voice has unusual emotional freight. You wouldn’t like racy texts to your partner or angry emails to a business colleague leaked in a hack, would you? Now imagine the same stuff leaked, except it’s the audio of your voice speaking them aloud.
Marc Rotenberg, head of the Electronic Privacy Information Center, suspects most people don’t realize their audio goes online for processing. They think the device just “understands” their voice on its own. “I see a serious disconnect between how most consumers believe these devices operate and how they operate in fact,” he says. He’s likely right. Who expects a toaster to be tattling to Google?
To be fair, most of these devices record your voice only after you invoke their genielike wake-up phrase, as when you shout “Alexa” at Amazon’s Echo. The technique is called “phrase spotting.” These gadgets are not, the companies say, streaming your voice all day long. But it’d be good for the devices to provide very strong cues about when they are and aren’t recording. The voice-controlled robot Jibo, for example, has been designed to signal when it is actively listening: It slumps over when dormant and swivels its head when it’s alert. “We take these issues seriously,” Jibo CEO Steve Chambers says.
Will we just acclimatize to being overheard all day long? Chambers thinks so. The next generation has grown up trading information for convenience, he says. “I’m not sure younger people will have quite the issue I might.” He’s right, though as with our other shoulder-shrugging privacy accommodations—persistent GPS tracking, for example—I’m not sure that’s a good thing.
I’m hoping for a technological solution. Audio processing is getting cheaper and faster all the time. Our listening tools could soon be able to process speech right in our houses and pockets, without needing to squirt it across the country to a server. Having machines that hear what we say will be enormously useful. But it’d be nice if they kept our secrets too.
Email clive@clivethompson.net.