Feb 1, 2017 10:00 AM

MIT Made a Wearable That Knows How a Conversation's Going

Researchers from MIT CSAIL are using artificial intelligence to translate how people feel when they talk.

No matter how debonair you are at your best, conversation can be awkward for anyone. That’s especially true for those who struggle to pick up on social cues. To help navigate those rocky exchanges, MIT CSAIL researchers have created a wearable system that can tell whether the person you’re talking to is happy or sad. It’s a start.

The device takes an existing research-grade wearable---Samsung's Simband smartwatch, which can measure movement, heart rate, blood pressure, blood flow, and skin temperature---and pairs it with audio capture that can pick up signals like tone, pitch, energy, and word choice, and provide a transcript of the text. By weighing all of the incoming signals, algorithms can classify each five-second installment of conversation as either “positive” or “negative.”
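
To make the windowing idea concrete, here is a minimal Python sketch. It is not the CSAIL team's code. It assumes each reading arrives as a timestamp plus a dictionary of signal values already scaled to roughly 0 to 1, and it swaps in a plain weighted sum for whatever model the researchers actually trained. The signal names, the weights, and the function `classify_windows` are all hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 5.0  # segment length mentioned in the article

# Hypothetical weights, made up for illustration only.
WEIGHTS = {"pitch_variation": 1.0, "pause_length": -1.0}

def classify_windows(samples, weights=WEIGHTS, window=WINDOW_SECONDS):
    """Label each time window "positive" or "negative".

    samples: iterable of (seconds_from_start, {signal_name: value}).
    Returns {window_index: label}.
    """
    grouped = defaultdict(list)
    for t, signals in samples:
        grouped[int(t // window)].append(signals)

    labels = {}
    for index, readings in sorted(grouped.items()):
        score = 0.0
        for name, weight in weights.items():
            # Average each signal over the window; missing values count as 0.
            avg = sum(r.get(name, 0.0) for r in readings) / len(readings)
            score += weight * avg
        labels[index] = "positive" if score >= 0 else "negative"
    return labels
```

Each window gets exactly one of two labels, which is also why the buckets are so coarse: a score barely above zero and a score far above it both come out as "positive."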

“You have a GPS in your pocket, it’s very complicated technology,” says study co-author Tuka Alhanai. “But we don’t have a GPS for social interactions.”

Signals and Noise

The team started with over 500 signals that could potentially tip off how a conversation was going---ranging from movement, to speech patterns, to individual word choice---and let on-board artificial intelligence sort out which were most important, rather than letting preconceptions dictate.

“With very little structure the algorithm was able to arrive at what we intuitively thought,” says study co-author Mohammad Ghassemi. Long pauses and fidgeting, for instance, are more likely to come during sad stories. Subjects told happy tales with more varied speech patterns. Hearing someone isn’t as effective as being able to see them as well.
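
One simple way to let data, rather than preconceptions, rank candidate signals is sketched below. This is an assumption-laden stand-in, not the method the study used: it scores each signal by how far its average differs between segments labeled positive and segments labeled negative, and it assumes every signal has been scaled to a comparable range. The function `rank_signals` and the signal names are hypothetical.

```python
from statistics import mean

def rank_signals(rows, labels):
    """Order signal names from most to least separating.

    rows: list of {signal_name: value}, one dict per segment.
    labels: list of "positive"/"negative", same length as rows.
    Both labels must appear at least once.
    """
    scores = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == "positive"]
        neg = [r[name] for r, y in zip(rows, labels) if y == "negative"]
        # Larger gap between the two group averages = more informative signal.
        scores[name] = abs(mean(pos) - mean(neg))
    return sorted(scores, key=scores.get, reverse=True)
```

A signal that looks the same in happy and sad segments lands at the bottom of the list, whatever anyone expected of it beforehand.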

It may sound obvious to some, but these are also the types of social cues that can be difficult for those with anxiety, or for those on the autism spectrum. Having a device that can take the temperature of a room could go a long way to mitigate those issues.

That’s also what makes the CSAIL study’s form factor so critical. There are other systems designed to translate tone into intent, but Ghassemi notes they’re often useful only in tightly controlled lab settings, or they require specialized equipment to work. The tricked-out Simband looks like any other smartwatch. It’s just a little smarter, in this very specific way.

“If you’re aware of technology being there, it alters the interaction,” says Ghassemi. “If you want to capture natural interactions between people, and really quantify what a natural interaction looks like, the technological component should be as unobtrusive as possible.”

Safety First

Potential issues come to mind, of course, when thinking of a device that tells you how a conversation is going, the first of which is: What if it’s wrong?

And it is! The Simband can determine overall tone with 83 percent accuracy, which leaves plenty of room for misinterpretation, as does the fact that its descriptive buckets are so vast.

“If an interaction was good and you classify it as bad, that’s way worse, potentially, than classifying a bad conversation as good,” says Ghassemi. It’s sometimes better not to know than to make the wrong assumptions. But he and Alhanai have accounted for that as well; in their system, users can set their tolerance for mistakes in either direction. So if, for instance, you never want to misclassify a good conversation as a bad one, it won’t hazard a potentially misleading guess.
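
The tolerance idea can be sketched as a classifier that is allowed to stay silent. The Python below is an illustration under stated assumptions, not the researchers' implementation: it assumes the model outputs a probability that a segment is positive, and the function `decide` and its parameter names are hypothetical.

```python
def decide(p_positive, tol_good_as_bad=0.1, tol_bad_as_good=0.1):
    """Return "positive", "negative", or None (no guess).

    p_positive: assumed model probability that the segment is positive.
    tol_good_as_bad: accepted risk of calling a good exchange bad.
    tol_bad_as_good: accepted risk of calling a bad exchange good.
    """
    for tol in (tol_good_as_bad, tol_bad_as_good):
        if not 0.0 <= tol < 0.5:
            raise ValueError("tolerances must be in [0, 0.5)")

    p_negative = 1.0 - p_positive
    if p_negative <= tol_bad_as_good:
        return "positive"
    if p_positive <= tol_good_as_bad:
        return "negative"
    return None  # too uncertain for the user's tolerance: stay quiet
```

Setting `tol_good_as_bad` to 0 means the function says "negative" only when the model assigns no chance at all to the exchange being good; in every other case it declines to guess.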

There’s still plenty of room for improvement here. Eventually, the researchers hope to adapt the system to more common smartwatches, like Apple’s. They foresee additional features, like making the device buzz once if an exchange is going well, and twice if things are getting awkward, to keep its wearer pointed toward a conversational North Star. They also intend to add more granularity to the system; why stop at happy or sad when you can dive into boredom, tension, excitement, and all the other textures of human expression?

That may still be some time away. But Ghassemi and Alhanai are hopeful that it will all come together, in part because their AI system came so far on such a relatively small data set. The more conversations it can log, the more adept it will be at parsing them, and the more navigable the often murky waters of repartee will be.
