LONG BEACH, Calif. – What does it take to get a one-year-old child from the infant utterances of "gaga" to the articulate pronunciation of "water"?
In the case of Deb Roy's infant son, it took three caregivers and carefully modulated coaxing over about seven months.
We know this because Roy recorded the entire process on nearly a dozen cameras and microphones embedded in rooms throughout his house during the first three years of his son's life. He presented some of his findings at the Technology, Entertainment and Design (TED) conference last Wednesday, including a charming audio clip that tracked his son's blossoming journey from "gaga" to "water."
"He sure nailed it, didn’t he!" said Roy at the end of the clip, as the audience laughed.
Roy, a cognitive scientist and director of MIT Media Lab's Cognitive Machines Group, wanted to understand how children assimilate and learn language in order to build robots that can learn as children do. So in 2005, before his son was born, he and his wife wired their home with 11 cameras and 14 microphones to capture every word the infant and his caregivers spoke and record the environment and events around which these utterances occurred. Part of the goal was to determine how much influence place and context have on language acquisition.
With a number of privacy protections in place – including an "oops" button in each room that let family members turn off cameras and mics during personal moments – they recorded an average of 10 hours a day, amassing 90,000 hours of video, or 200 terabytes of data in all. The so-called Speechome project is the largest-ever study of child language development in a natural or clinical environment, or, as Roy calls it, "the largest home video collection ever made."
Since they stopped recording in 2008, Roy and his MIT team have transcribed more than 7 million words and created computer models to track the movements of his son and caregivers throughout the house over time and match them to language. The data is still being processed, but Roy provided a look at one surprise his team has discovered so far.
By collecting each instance in which his son heard a word and noting the context, they mapped all 530 words the boy learned by his second birthday. In doing so they uncovered a surprising pattern in which caregivers would suddenly slip into simple language, then slowly move back into more complex sentence structures.
This wasn't the pattern they had anticipated. Roy and his team had reasonably hypothesized that if caregivers were attuned to a child's language skills, they would begin communicating with the child in simple language that grew steadily more complicated as the child showed signs of comprehension.
"But when we plotted it, we didn't see it," Roy told Wired.com in an interview. "There was no correlation."
Instead, the caregivers actually used simpler language the closer the boy got to grasping a word. At the point they sensed he was on the cusp of getting it, all three primary caregivers – Roy, his wife and their nanny – simplified their language to guide him to the word, then gently led him back into more complex language once he was over the hump.
"For each of the primary caregivers we found the same trend," Roy said. "We're getting longer sentences when he doesn't know the word, and then they start getting shorter, and they're pretty much at their shortest as he starts to get the word.... Was I consciously doing that? I can't imagine anyone consciously doing that."
Roy says it's evidence of a "continuous feedback loop" that shows caregivers modifying language at a level never reported or suspected before. It's not just that his son was learning from his linguistic environment; the environment was learning from him, he told the TED audience.
The finding has changed his thinking about causality.
"I now think looking for linear cause effects —where the environment causes certain effects in my child — is a bad formulation," he says. "Because … as soon as you have feedback loops, it’s a chicken and egg kind of problem to say what was the original cause of something. What you're actually doing is studying a dynamical system."
Roy says he hasn't figured out how to apply the work with his son to his robots yet, but at least two developments have come from the study so far. His team is currently designing a system to monitor autistic children in a similar manner to see if they learn differently or need different kinds of cues from their environment to grasp language. The project is being funded by the National Institutes of Health and is currently looking for families to participate.
The other development is a startup called Bluefin Labs, which Roy co-founded based on the tools he and his team built to analyze the vast amounts of video and audio collected from his home. The company uses those tools to pair media content with public discussion of that content and uncover patterns.
Roy pulled up a series of graphs and animations to illustrate the tools, which, of course, quickly caught the attention of the media and ad executives in the audience.
About six months ago Bluefin began collecting real-time TV content – programs and advertising from about 30 channels – as well as comments from publicly available social media feeds. For the latter, they process about three billion comments a month from Twitter, public Facebook updates and blog scrapings to find links between what's on TV and what people are saying about it.
They can examine how people respond to the same ad in different viewing contexts to understand the effect context has on how an ad is perceived. They can also focus on one person in social media to see how an influential individual drives a conversation.
The data, of course, can apply as easily to selling soap as to selling a president. Bluefin looked at President Obama's State of the Union address earlier this year and mapped the online conversation around it, tracking the peaks and valleys sparked by specific points Obama made.
"You have this instantaneous social echo that you can quantify and understand how different sub networks different groups are resonating with different parts of his address and tie it to different networks and see what the reactions are and measurably compare them," Roy said. "You can literally see a nation’s reactions, conversations and dialogue that’s being spurred by this important piece of mass media."
Photo: Deb Roy speaking at TED2011, in Long Beach, California. Credit: James Duncan Davidson/TED