This article was taken from the November issue of Wired magazine. Be the first to read Wired's articles in print before they're posted online, and get your hands on loads of additional content by subscribing online
**With its Kinect entertainment platform, Microsoft plans to reinvent how we interact with computers. But first it had to solve a few impossible problems. Wired gains exclusive access to the secret "Project Natal".**
A jungle drumbeat stirs the air at Jefferson and Figueroa on a balmy June Sunday evening in downtown Los Angeles. The vast Galen Centre, normally a University of Southern California sports arena, has tonight been transformed into a tropical forest, where 76 Cirque du Soleil acrobats, dancers and actors cavort as clowns, cavemen and a lovesick gorilla tying flowers into the hair of unsuspecting guests.
Microsoft's Xbox team has packed the hall tonight with VIPs such as Billy Crystal, Michael Cera, Rosario Dawson and Christina Hendricks, but all have been told to wear a white polyester poncho and to enter the auditorium through what appears to be a giant TV into a living-room where actors play family members. As the hour-long show begins, these families are winched up on their sofas to remain high above the audience, cheering as eight-metre projection screens, in total longer than an American football field, flash into life. "Since the dawn of time, humanity's long journey has led us to countless discoveries," a solemn voice intones on the loudspeakers. "Yet with each leap forward for civilisation, more people have been left behind. But our quest has taken us to a completely new horizon." There is a pause. "History is about to be rewritten. This time human beings will be at the centre -- and the machines will be the ones that adapt. After five million years of evolution, is it possible that the future of humanity is humanity itself?"
A four-metre puppet elephant slowly carries a boy towards a hill of rocks, from which emerges a giant 3D black-and-green Xbox logo that raises him higher. Green lights suddenly shine out from the poncho shoulder pads. "Hi, Alex," the voice booms. "Welcome home."
The boy is lifted to a 12-metre-diameter rotating steel chamber bearing a screen on which he interacts with an avatar that copies his every movement. He then climbs into a rotating living-room where a family is enjoying a car-racing game they continue to play even when upside down.
As the screen reveals to Alex, this is "Kinect" -- Microsoft's new motion-sensing system for the Xbox 360 that does away with the game controller in favour of the player's own body. The games demonstrated tonight -- from bowling to yoga, river-rafting to road racing -- suggest only the beginnings of why Microsoft sees Kinect as its next multi-billion-dollar bet. It's not simply that it can track your body in real time, or recognise who in the room is playing, or respond to voice commands. It's that it can do all of these things at the same time -- heralding what its creators call the era of NUI: natural user interface.
At the global media briefing next morning, Don Mattrick, senior VP for interactive entertainment, explains why this is about much more than gaming. "Kinect brings your living-room alive in a social and accessible way," he tells the Wiltern Theatre. "Interactive entertainment is the greatest form of all entertainment, and it should be open and approachable to as many people as possible. It is this belief that led us to remove the last barrier between you and the entertainment you love. By making you the controller, we will transform how you and your friends experience games and entertainment."
So as well as demonstrations of Kinect Sports (from soccer to bowling), Dance Central (screen-based dance lessons) and Kinectimals (interaction with virtual pets), journalists are shown Laura, one of the Kinect engineers, videoconferencing headset-free with her twin sister Kristen in Texas -- and she merely says, "Xbox end chat," when it's time to disconnect. Ron, another engineer, signs in to Kinect simply by waving, and then similarly waves at the air to access Netflix, Facebook and Last.fm. "Xbox, play music," he says; then, "Xbox, pause." No remote control.
An hour later, Matt Barlow, marketing boss for Xbox 360 and Xbox Live, explains to Wired the vast business opportunity ahead. "Gaming is a $48 billion business," he says. "The challenge is how you continue to grow. We have the capability to turn your living room into a petting zoo, into a sports stadium -- so our customer base is all those who have rejected gaming either because of the content or the controller. If you could say doubling, tripling, quadrupling the audience -- I believe that. We're also talking about a hypersocial audience, predominantly female, who want to interact while playing."
As other executives point out, this is about far more than gaming to Microsoft. "We believe the platform will move beyond the console," says Ben Kilgore, general manager of the Xbox platform. "Every kind of major Microsoft group in the company is evaluating Kinect. They're trying to understand what it means for those experiences."
By mid-2007, Don Mattrick, who runs Microsoft's interactive-entertainment business division in Redmond, Washington state, was demanding a new direction for the Xbox 360. "There has to be a fundamental reimagining of the way we interact," explained Marc Whitten, Xbox Live VP, in a strategy meeting. To be fair, Whitten -- along with the rest of the senior executive team -- wasn't certain what this meant in terms of hardware. Still, they compiled a cursory list of desired features: motion-tracking controls, facial recognition, speech recognition and backwards compatibility.
The problem wasn't vision. It was the task's sheer impossibility. Finding cameras that could map a living-room in 3D was easy. Getting one reliably to decode the flailing limbs and shouts of 40 million Xbox users was a whole other dream. To pull this off, the hardware would require a software "brain" capable of interpreting what the team calculated was a crushing 10²³ spatial and aural variables at any given moment. And it would have to do this on the fly, with no perceptible on-screen lag.
Still, fragments of the solution did already seem in place.
Microsoft Research's Beijing bureau had collected tomes on the successes (and failings) of facial-recognition technology.
Redmond's speech-recognition software -- which now ships in Windows 7 and the Ford Focus -- had already been in development for decades. And Redmond's hardware engineers were busy creating exotic gyroscopic and accelerometer-based controller prototypes in anticipation of the coming shift. But it would take exhaustive research and testing before anyone could guess whether such a shift would be workable -- or profitable.
As 2008 began, Mattrick made it known that he wanted to transform and expand Xbox 360 and allow users to experience gaming and entertainment in a different, more social way. Could his teams take depth sensors, multi-array microphones and RGB cameras and turn them into a consumer experience? The magic box would need to track people at 30 frames per second, recognise them, understand how they move, and incorporate voice recognition -- and all in ways that could enhance the game experience.
There was just one problem: this hadn't yet been done anywhere in the world. "The challenge was simple: we needed to get rid of the barrier. We needed to actually take a look at all the different ways we could get rid of the controller."
Alex Kipman, 31, the man tasked with implementing Mattrick's vision, speaks at machine-gun pace and uses broad, sweeping hand gestures to illustrate his points. Kipman, from Natal in Brazil, has registered 16 patents since joining Microsoft in 2001.
Now, as incubation director for Xbox 360, he got to choose a code name for his new task: Project Natal.
In jeans and a brown T-shirt, he explains the incubation journey.
The first task, he says in a Redmond conference room, was to research probabilistic machine learning: ways to train a computer to guess what you are going to do next.
His team, around 15 people, had been picked for their non-traditional skill sets. "I have artists, musicians, painters, researchers, engineers, people from Hollywood, CGI/FX supervisors from The Matrix. The idea being, let's have that diversity in thought, in creativity." Their brief, he says, was to "fail fast" -- and also to be stupid. "Incubation is a business of stupidity. If we don't feel stupid, we're not pushing ourselves hard enough."
Some of the challenges involved were more straightforward than others. Speech recognition, for instance, had the benefit of years of Microsoft innovation. But recognising individuals from their faces and body shapes was harder. "It turns out that in the living-room, Darwin is against you," Kipman says. "People in the same household tend to be very alike. I call it identity recognition. Facial recognition doesn't work."
But adding body tracking to recognising humans and understanding their speech? Suddenly the machine had to cope with those 10²³ possibilities. That, Kipman knew, was way beyond standard programming abilities: "It's not going to happen in this lifetime that any number of infinite programmers in the world are going to sit in the room and code up all of these." So he needed to invent a new conceptual framework within Microsoft -- as he saw it, "to create this language where it's not about probabilities of what you know, but what you don't know".
Tamir Berliner met three of the four other PrimeSense founders in the same unit of the Israel Defence Forces. "The advantage of being in the Israeli army is you're trained at a very young age and they give you tasks that often seem unsolvable," Berliner, 29, explains. "You have to think creatively. So the fact that we faced 'unsolvable' problems at PrimeSense never scared us."
Launched in May 2005, the Tel Aviv-based startup employs 130 staff and has recently opened a further five offices in Asia. The problem it set out to solve was how to use a depth sensor to map a moving person. "Our starting point, as gamers, was that computer games were getting boring," Berliner says. "We wanted more immersion -- so you could actually throw a fireball in the game. We wanted to take it to full-body games."
Beside his desk sits the first iteration of the resulting product: a white plastic box, around 30cm long and 10cm wide, containing an RGB camera, an infrared sensor and a light source.
That was first demonstrated in March 2006; two months later, PrimeSense took an upgraded version to E3 in Los Angeles. When Kipman first saw the tech, his heart raced. "He totally got it and understood what could be done with the technology," says Berliner, responsible for "software and vision". "He's a visionary."
The Xbox team contracted PrimeSense, amid secrecy, to provide Kinect's depth sensor chip and reference design. (Microsoft acquired another Israeli startup, 3DV Systems, which made 3D digital cameras.) Now all that was needed was to shrink the camera sensor and slash the price. "The hardware has to disappear in a way that makes it almost invisible to the user," explains Raghu Murthi, who runs the natural user interface (NUI) hardware group for Xbox 360, "so that the consumer is always interacting with the system, but not really aware of how."
At the start of 2008, Kudo Tsunoda joined Microsoft Game Studios from Electronic Arts to run Gears of War 2. Early one afternoon, during induction meetings with Tsunoda's new Redmond colleagues, Alex Kipman came for a scheduled appointment carrying hardware based on the PrimeSense camera sensor.
Tsunoda had never been one for conventional offices: instead, he had hung a curtain to mark out a makeshift space in the corridor by the lifts. The tech that Kipman demonstrated behind that curtain was basic -- it showed a raw 3D view of the space, with no sound or RGB images -- but Tsunoda was transfixed. "There was such a burst of, 'Wow, here's 100 things we can do with this,'" he recalls. "Creatively, it was very exciting." The two hit it off so well that their meeting overran by four hours.
Tsunoda was especially taken by the lack of controller. "Growing up seeing the Star Trek holodeck scenario, where technology enables you to have a fully immersive experience -- we talked about this as the first step towards that. Going into a room and, wow, suddenly being in a totally different environment."
Yet his excitement was tempered by a quick reality check. "You leave the meeting and think, 'Yeah, we're not going to be able to do that,'" he recalls. "You could stand there, maybe move your arms and legs and see those types of movements on screen, but it wasn't able to track you with any velocity or variety of motion." The pair knew they would need to incorporate microphones for voice interaction, and an RGB camera so that players could see themselves clearly on screen.
But bigger challenges remained...
Like Kipman, Tsunoda had a small incubation team who quickly prototyped around 70 game experiences. Soon some "awesome cool stuff" emerged. When Kipman's team prototyped a wireframe skeleton that moved with a player's body, his and Tsunoda's teams were transfixed. "You'd have people sitting for 40 minutes playing with this super-engaging skeleton," Tsunoda says. Human recognition, however, was a huge hurdle. "People change -- I used to have a giant beard; body weight changes. There was no system set up for that," says Tsunoda. And voice recognition proved a problem amid ambient noise.
But if these challenges could be cracked, he saw huge social benefits. "Most technology divides human beings," he says. "At a dinner party, ten people are interacting with their phones. I'm very focused, creatively, on using technology to bring people together -- to have more meaningful human interactions without that technology layer."
Growing up outside New York City, Tsunoda, now 36, recalls enjoying freeform play experiences that he says have largely disappeared. "I remember just going down to the park, making up in your head the games you'll play based on what's around you in a very unstructured way -- sports games, strategy games in the woods. It's the stuff that's best trained me for today."
So what can we expect from Kinect in the future? "One of the super-interesting parts is being able to interact with a computer character in an interesting way. The human-recognition technology gives us that -- with Kinectimals, it's super-cool how that animal reacts to me differently from anyone else. Then there's voice. We have a good understanding of voice recognition, but a lot of communication has to do with intonation and body language. That's the stuff we're working on to make for better interaction with computer-generated characters."
More immediately, he is excited by a prototype that will let users "take a really complex physical object, hold it up to the sensor, and allow [them] to play with that object in a digital format. Which is super-cool. This is just wave one. And the full holodeck? Did we announce that yet? I think it's coming in 2012..."
Kipman, Tsunoda and their teams worked towards a deadline: a meeting of Microsoft executives on August 18 2008 when the project would be canned or greenlit for further development. "We brought into that meeting a set of sensors Scotch-taped together," Kipman recalls. "We didn't tell the execs anything about the technology. We didn't say anything about what had been done behind it. We just said, 'All you need is life experience. We're not going to tell you how this stuff works. Get in front of it and play it. It's a car. Drive!'"
Marc Whitten saw the meeting light up when the technology was demonstrated. "Everybody just smiled. The visceral reaction was incredibly powerful."
Kipman had bought time, but big questions remained. Straight away, Whitten recognised two things: "First, this was going to be really hard. We've talked about the nine miracles needed to make this happen." (He will not specify what they were.) "It was, 'Yay, we're down to eight.' And the second thing I thought was that this wasn't a game controller. It was much, much broader than that."
Thirty-two black pyramid cubes sit piled neatly on the window-ledge of Andrew Blake's office deep within Microsoft Research's Cambridge campus. The cubes -- awards for Microsoft patents, from "Gaze Manipulation" to "Patch-based Video Super-resolution" -- reflect the vast academic challenges overcome by Professor Blake, 54, since arriving here in 1999.
Blake, now deputy MD, is one of the world's foremost experts on computer vision, his research used in fields from heartbeat-tracking medical imaging to a "smart erase" background-removal tool for editing photos in Office 2010. Many of his breakthroughs use mathematical physics to map the probability that a particular pixel belongs to a heart as opposed to a kidney, or to a smudge that you want erased from a holiday photo. But training a computer to track a moving human body? That had proved an altogether more intractable problem.
In three decades Blake had helped design laser-rangefinders that could track military helicopters by watching the ground, and had helped an agricultural institute map cabbages on farmland for more accurate crop-spraying. But that was at a time when it could take an entire day for a computer to process a single image frame.
Representing a person's movements in real-time on a screen was a far-off dream. But in 2001, Blake published an academic paper with a fellow Microsoft researcher, Kentaro Toyama, that raised an intriguing new approach to following moving objects.
The paper, Probabilistic Tracking in a Metric Space, suggested that a ballet dancer's movements could be tracked by assigning a probabilistic likelihood that each frame would lead on to any other frame. By feeding a computer raw data of a ballerina's typical movements, each snapshot -- Blake called them "exemplars" -- could be automatically assigned its most likely next frame.
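The paper's mechanics can be illustrated, very roughly, like this: keep a "flipbook" of exemplar poses, learn from training footage how often each exemplar follows another, and at every new frame combine that transition model with how well each exemplar matches what the camera currently sees. The feature vectors, the distance-based likelihood and the function below are illustrative assumptions, not Blake and Toyama's actual implementation.

```python
# A minimal sketch of exemplar-based tracking in the spirit of the flipbook idea.
# transitions[i, j] is the learned probability that exemplar j follows exemplar i.
import numpy as np

def track(frames, exemplars, transitions):
    """Return the most likely exemplar index for each observed frame.

    frames      : (T, D) array of per-frame feature vectors (hypothetical features)
    exemplars   : (K, D) array of prototype-pose feature vectors
    transitions : (K, K) row-stochastic matrix learned from training sequences
    """
    def likelihood(frame):
        # Closer frame-to-exemplar distance means a higher observation score.
        return np.exp(-np.linalg.norm(exemplars - frame, axis=1))

    belief = likelihood(frames[0])
    belief /= belief.sum() + 1e-12
    path = [int(belief.argmax())]
    for frame in frames[1:]:
        # Predict with the flipbook's transition model, then weight by image evidence.
        belief = (belief @ transitions) * likelihood(frame)
        belief /= belief.sum() + 1e-12
        path.append(int(belief.argmax()))
    return path
```

The point of the transition matrix is the one Blake makes: the tracker never has to consider every conceivable pose, only the flipbook pages likely to follow the current one.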
The approach -- "like having a flipbook of cartoons with prototypical poses", as Blake explains it -- sent a wave of excitement through the world of machine learning, and the paper won the prestigious Marr Prize. It also influenced a former post-doc at Oxford, Andrew Fitzgibbon, who specialised in computer vision.
Fitzgibbon's work on automatic camera tracking had spun out into a commercial product, boujou, which helps 3D animators to impose special effects on a live-action background and has been used in The Lord of the Rings and the Harry Potter films.
In 2007, Fitzgibbon, now 42, co-authored a paper, The Joint Manifold Model for Semi-supervised Multi-valued Regression, which again examined how probability could be used to infer a human body's motion. Fitzgibbon, now based at Microsoft Research down the corridor from Blake, recalls feeling "pretty chuffed" with the results. But the approach had one huge limitation: it could work only on a narrow range of about 50 body movements. And image processing was extremely slow: it took around a minute to generate each video frame of movement.
At this point, Kipman's team found a paper from Toshiba, whose researchers were also trying to solve the problem of real-time motion capture -- not with a 3D sensor, but with a regular camera. "They'd taken the work that Toyama and I did and scaled it up," says Blake. "So my flipbook of around 100 pages was now much bigger and more comprehensive: their paper reported a flipbook of around 50,000 pages, looking for all the [movement] combinations of the human body."
Yet the Toshiba approach could never be comprehensive: as Blake recognised, the permutations of a body's movements were just too great to use probability mapping effectively. "If each body point was segmented into ten angles, there could be 10³⁰ possibilities of the next movement. That just wouldn't work."
The answer, by sheer chance, came from the PhD research of Jamie Shotton, a computer-vision researcher in Blake's department who was fresh out of grad school. Shotton had shown how to train a computer to differentiate cows, grass or an aircraft in a countryside photograph by studying it pixel by pixel. He had taught the system to recognise 21 categories of object.
By chance, Kipman had come across one of Shotton's earlier papers about tracking hands. "It was really well written, and the tone was optimistic," Kipman recalls. "I'm like, 'We've got to go talk to this guy!' Not long after we found out he worked for Microsoft." The call from Redmond came on August 18 2008. Could Shotton help in tracking a body in real time? He discussed the challenge with Fitzgibbon. "My immediate reaction," Fitzgibbon says, "was, 'No you can't.'" Kipman's team put their Scotch-taped hardware into a bag, packed a couple of changes of clothes, and flew to Cambridge.
They had hacked together a video that took advantage of the camera sensor's 3D capabilities.
The video showed a man's skeleton being tracked in the foreground; in the background, it showed what the depth camera was seeing. Once the system had recognised the man's body shape, it snapped on to his skeleton; on screen you could see an outline of his movements as he danced. They called this demonstration the "Bones" video. "This was better than anything out there, but 'Bones' has three main failings," Shotton says. "You have to start in a particular position for it to lock on to you. If it stops working, you have to go back into that position. And it only really works for one body size. All assumptions about size and body shape are hard-coded into the algorithm."
The problem with the "Bones" body tracking approach was that bodies do not always move predictably, and are not all alike. So the system made frequent errors in estimating where a limb or a head was moving next. But what if Shotton and Fitzgibbon went back to Andrew Blake's "exemplars" work on ballerinas, whereby the system made frame-by-frame calculations of each body part's probable next move? If each frame was treated in isolation, the computer could be trained to predict the next frame.
But the data needed to train it would be vast. "We thought we could get away with 10¹³ examples -- but that wouldn't fit in the Xbox's memory," says Fitzgibbon. "Let's say your left hand has 1,000 possible combinations of its next move. For both arms simultaneously, that would be a million combinations! It was an exponential complexity."
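To get a feel for the numbers Fitzgibbon quotes, multiply the possibilities: modelling limbs jointly means their configurations multiply, so the whole-body examples needed grow exponentially with the number of parts. The 1,000 figure comes from his quote; the rest of this toy calculation is purely illustrative.

```python
# Toy illustration of the combinatorial explosion Fitzgibbon describes.
CONFIGS_PER_PART = 1_000  # "your left hand has 1,000 possible combinations of its next move"

for parts in range(1, 6):
    joint_poses = CONFIGS_PER_PART ** parts
    print(f"{parts} part(s) modelled jointly -> {joint_poses:,} combinations")

# 1 part  ->                 1,000
# 2 parts ->             1,000,000   ("both arms simultaneously... a million")
# 5 parts -> 1,000,000,000,000,000   already past the ~10^13 examples that
#                                    wouldn't even fit in the Xbox's memory
```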
They brainstormed further about how to divide up the body.
Eventually Shotton suggested revisiting his PhD research -- involving cows, grass and aircraft -- and adapting the same pixel-by-pixel classification to the human body. This was the breakthrough moment: if the machine-learning system could be trained to recognise individual body parts -- and the depth sensor could help -- then the "Bones" system would initialise regardless of your pose.
Fitzgibbon plays a demonstration video in which each body part -- from the upper-left arm to the right foot -- has been colour coded, and the colours follow the body as it moves. They call this the "Harlequin" video: the human figure resembles a clown wearing a harlequin suit, with a different colour for each of 31 body parts plus one for the background. "The machine-learning algorithm has been told this pixel here is a shoulder pixel, while this pixel in another image is also a shoulder pixel despite looking quite different," Shotton says. "If you can get enough variation in the millions of poses it's been trained with, you can effectively teach the algorithm to ignore the fact that you've changed pose, or got a bit fatter, and keep the important information that this is a shoulder pixel." (Finger movements, for now, can't be resolved because of current limits of camera resolution.) "This was the missing piece," says Blake. "Seeing the harlequin figure moving on the screen, with the harlequin pattern sticking to the body, solved the two problems -- what happens when you come into the room, plus rapid movement."
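The article doesn't name the classifier behind the Harlequin demonstration, but the per-pixel idea can be sketched along these lines: every depth pixel carries one of 32 labels (31 body parts plus background), a standard classifier is trained on millions of labelled example pixels, and at run time each pixel is classified independently of the previous frame, so no starting pose is needed. The depth-comparison features and the off-the-shelf random forest below are assumptions made for illustration, not the shipped Kinect pipeline.

```python
# Illustrative per-pixel body-part classification on depth images -- a hedged
# stand-in for the training described in the article, not Microsoft's code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_CLASSES = 32  # 31 body parts plus one background class, as in the Harlequin video

def pixel_features(depth, y, x, offsets):
    """Depth-comparison features: depth at nearby offsets minus depth here.
    Offsets are scaled by the pixel's own depth so the feature is roughly
    invariant to how far the player stands from the sensor (an assumption)."""
    d = max(float(depth[y, x]), 1e-3)
    feats = []
    for dy, dx in offsets:
        yy = int(np.clip(y + dy / d, 0, depth.shape[0] - 1))
        xx = int(np.clip(x + dx / d, 0, depth.shape[1] - 1))
        feats.append(float(depth[yy, xx]) - d)
    return feats

def train_classifier(depth_images, label_images, offsets, pixels_per_image=2000):
    """Sample labelled pixels from (synthetic) training images and fit a forest."""
    rng = np.random.default_rng(0)
    X, y = [], []
    for depth, labels in zip(depth_images, label_images):
        ys = rng.integers(0, depth.shape[0], pixels_per_image)
        xs = rng.integers(0, depth.shape[1], pixels_per_image)
        for py, px in zip(ys, xs):
            X.append(pixel_features(depth, py, px, offsets))
            y.append(int(labels[py, px]))  # which of the 32 classes this pixel is
    clf = RandomForestClassifier(n_estimators=30, max_depth=20)
    clf.fit(np.array(X), np.array(y))
    return clf
```

Because each pixel is judged on its own, such a system can label a player from the very first frame, whatever pose they walk in with -- exactly the failing of the earlier "Bones" tracker.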
They named this the "Exemplar" system, and began to train the algorithm with 100 computer-generated images of people. By June 2009, it was ready for its first public demonstration at E3 in Los Angeles. "Up to that point, I hadn't told my family what we were working on," Blake recalls. "Not even my wife knew." The team's final handover to Redmond came six months later.
Shotton did not get to play any of the launch games until August 2010. "I was amazed at what they'd done with this," he says. So what does he see as potential future applications? "Anything where motion is important, such as physiotherapy, or bowling practice," he says. "And in massively multiplayer online games -- you can imagine your full body mapped out and interacting intuitively in the game." "Or imagine a surgeon with hands-free control of devices in an operating theatre," says Fitzgibbon. "Or clothing -- in a movie, [actors] will wear £50 Savile Row suits, as a 3D camera gives you a perfect fit."
For Blake, the bigger excitement lies in what this project means about the future of man-machine interaction. "It's as radical as the mouse or the multitouch screen," he says. "This is the no-touch screen."
Cambridge had given Kipman his new "language". The project sped ahead, moving from incubation to the software-platform group under Ben Kilgore. "We broke everything down into three or four core problems to solve," Kilgore says. "Within each were 20 subproblems. We had this mantra of, 'We must fail fast.'" For voice recognition alone, the tasks were vast. "We built an acoustic model for each country with speech support," Kilgore says. "You have to build a statistical sample of people from different regions and dialects. We went to the Deep South, and we'd have people with accents run through manuscripts. Then we'd go to New England, Boston, Minnesota."
Rigorous testing was vital -- of hardware, software and the games that were starting to emerge. Hardware testing involved gamers playing in a variety of temperatures, humidities and light conditions. "We have huge ovens where we keep the product at high and low temperatures to simulate different user conditions," Raghu Murthi says. "We have anechoic chambers, where there is no external noise coming in, and we test microphone arrays and audio systems."
In June 2009, when Project Natal was revealed at E3 in Los Angeles, Peter Molyneux of Lionhead Studios showed a video of an on-screen character called Milo talking back intelligently to Kate, the real-life woman chatting to him through the sensor unit; Milo even reads her facial expressions to detect her mood, although the demo prompted some online scepticism about the veracity of its real-time action. This July, Molyneux again presented Milo to the audience at TEDGlobal in Oxford as "a revolution in storytelling". "Entertainment, film, TV, even the hallowed book, are just rubbish," he declared. "Why? Because they don't involve me, the audience. I hate that. I love the future that Milo brings." No Milo game has yet been scheduled for release.
Deep within a landscaped campus set amid 36 hectares of Warwickshire parkland, past an unmarked security entrance off an unnoticed country lane, Kinect is being tested to its limits. It's the last days of bug-fixing for Rare's Kinect Sports, and in a series of custom-built development barns set around central offices, dozens of the company's 200 staff and contractors are sweating, quite literally, to ensure the ten featured sports, from bowling to boxing, are ready for show time on November 4 in the US and six days later in Europe.
In a downstairs test room, two women in full gym kit are playing table tennis by frantically waving their hands as another woman boxes energetically into the air. Upstairs, where developers sit with artists, coders and programme managers, the view down the corridor is surreal: as one man runs on the spot outside his office door, another shouts "Goal kick!" and a third is visible initially only as a protruding posterior until it becomes apparent that he has been throwing an imaginary javelin. "Here's where I suffered my first Kinect injury," says Nick Burton, Rare's development director for Kinect, as he points to a low beam over a doorway. His colleague George Andreas, Rare's creative director, explains that these barns contain the highest concentration of Kinect cameras in Britain.
Nearby, games are being tested on vinyl flooring, wood strips and carpet tiles (to emulate the varying reflectiveness of users' homes) and also, loudly, in Spanish. The office decorations serve as a reminder that Rare, bought by Microsoft Game Studios in 2002, has long been a giant of gaming: wall posters for Banjo-Kazooie and Conker, display cases of original Donkey Kong games for the Super NES and N64, large promotional statues for Star Fox Adventures.
Tsunoda approached Andreas in August 2008. "We'd heard on the corporate grapevine of a new motion-capture system that didn't require you to wear a suit with ping pong balls, which would work in your living room," says Burton. "I thought, 'Yeah, right.'"
Sceptical, they delayed a visit to Redmond until October. Even on the plane over, Andreas dismissed the idea as "pie in the sky". The four Rare executives on the initial trip met Kipman, Tsunoda and their teams. "Alex's office was a den of wires and boards, like a mad professor's lab, with one of the old PrimeSense units gaffer-taped to the TV," says Burton, 39. Then he saw a demonstration of the "Bones" skeleton on screen mimicking his movements, and he was awestruck. "That was my Natal moment. At the time the skeleton didn't automatically lock on to you -- you had to walk into it. But I was amazed. I sat in Kudo's office and refused to leave until he gave us some developers' kits."
They returned and began prototyping possible games. "Some of the concepts were really off the wall," says Andreas, 39, at Rare for 15 years. "[Such as,] 'Wouldn't it be great if the rug in your living-room suddenly sprang to life where you're riding a raft with sharks swimming around?'" Burton continues: "There was the mad dancing Saturday Night Fever idea, where different moves would pop a huge glitter ball out of your ceiling."
But with Andreas having once played semi-professionally for Crystal Palace, a football game was inevitable. "We knew we could kick off football in 3D," he says. "But one of the hardest things in real football is juggling a ball. I thought that would prove the technology. So we started working on a simple prototype -- and it worked."
Over Christmas 2008, they narrowed their focus to three ideas, sport being one. "We wanted to compete head-on with Wii Sports," Andreas admits. "We knew we could do so much more with Kinect than you could with Wii."
Because the launch titles were aimed at the family market, Rare chose the more popular sports. "We put a bowling prototype together in three days: can we significantly improve on Wii Sports' bowling? Could we allow people to run at the screen with the ball? We realised we could improve on it."
The team prototyped almost 20 sports, from horseracing to cycling, whittling these down to six (with five events within track and field). Football went through the most iterations: "I probably inhibited the creative process," Andreas admits. "We created real football, but, boy, was it hard to play. My mantra was that it had to be football for the masses."
Software drops were arriving daily from Redmond. There were glitches as the team moved from working on PCs to Xbox -- with football, leg tracking proved unreliable and latency increased.
That, Andreas says, was the low point. These issues took three months to resolve. "It's amazing," he reflects. "It's not even [been] two years, yet we've got new technology, new software, new hardware, new ways of talking about games."
What's next, then? "The initial wave of experiences are those people can relate to," Andreas says. "After that, that's when we can really go to town. You can create any experience on Kinect -- shooting games, fighting games, adventure games; it's how you tackle that subject. A lot more can be done with face recognition, with voice recognition." "The clever thing about Kinect is they're all married together," Burton says. "From an input point of view, it becomes more than the sum of its parts. If we wanted speech recognition to know it was me speaking -- that's hard. But if Kinect is tracking me, looking at the skeleton, it can see if that movement is where the sound is coming from. It can then look at the RGB feed to see if the RGB is moving around that person's mouth." "It knows Nick is speaking with just five lines of code," Andreas says.
What of the expectation, raised in the Milo demos, that players might have full conversations with on-screen characters? "Realistically, if you want a one-to-one conversation with an AI-driven character, that's at least five years away,"
Andreas says. "But thinking about where that tech is going," Burton adds, "if it's cloud-based, passing the Turing test becomes much easier. You just need a huge internet-like database, then work out how to data mine that. "On November 4, that's not where it stops. That's just the beginning."
David Rowan is the editor of Wired. He wrote about the app explosion in 02.10. Kinect launches on November 10 in Europe.
Additional reporting: Terrence Russell
This article was originally published by WIRED UK