Robot See, Robot Kill

Scientists are working on a camera that automatically tracks people as they move and focuses on the loudest person in a group. The research is funded by the military, which wants to develop robot sentinels capable of automatically returning fire when attacked. By Jenn Shreve.

Every second of every day, your brain evaluates raw information from your five senses and causes you to react, often involuntarily.

A self-aiming camera being developed by scientists at the University of Illinois at Urbana-Champaign is learning to respond to audio-visual stimulation in the same way.

The camera can detect movement and sound, compute the probability that what it's sensing is worth responding to, and then turn (or not turn) toward the stimulus accordingly.

"It does a very good job of picking out targets that are interesting," said Dr. Tom Anastasio, a neuroscientist at the University of Illinois and director of the self-aiming camera project.

If, for example, three people are standing in front of it and two of them are silently shaking their heads while the third is shaking his head and saying something, the camera will focus on the person who is both moving and making noise.

The camera was originally developed to auto-focus on speakers during a video conference call or a college lecture. Instead of hiring a camera operator to zoom in on different speakers, the camera would be able to do the job automatically.

The research is funded by the Office of Naval Research, which is interested in developing "robotic sentinels," as Dr. Joel Davis, program officer at the ONR, put it.

In defense scenarios, a battery of cameras could be used to detect suspicious activities around ships and military bases. They may even be attached to guns that would automatically return fire if attacked.

"The camera could pick up a muzzle flash and a sound of a gun firing, and it would autonomously direct counter-fire," Davis said.

The self-aiming camera is based on a neural network, a complex computer program that simulates a biological nervous system.

The neural net mimics an area of the brain called the Superior Colliculus. Located in the midbrain of mammals, the Superior Colliculus is evolutionarily ancient and present in one form or another in all vertebrates, from humans down to fish.

Davis described the Superior Colliculus as the place "where information from the eyes and ears comes together for the first time as it goes to the brain."

Neurons in the Superior Colliculus receive sensory input -- a sound in the bushes, an unusual smell or a rapidly approaching car -- and initiate physical movement in the direction of the sensation.

The researchers built a model of attention based on studies of the Superior Colliculus. Sensory inputs are scored according to their strength, and the system computes, or "decides," how strong a response is needed. A weak sound may not attract the camera's attention, but a weak sound paired with a slight movement might, Anastasio said.

"A loud sound might be enough to make you turn," Anastasio explained. "A soft sound might not. But what if you paired a soft sound with some visual movement? That might be enough to make you turn."

The camera's neural net was trained with a variety of objects that moved or made sound. Researchers placed a moving, noise-making object in front of the camera, which is equipped with microphones, and told the computer its exact location. Once it had learned to follow objects, the computer was trained to choose between stimuli.
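In that setup the network learns a supervised mapping: sensory activity in, known stimulus location out. A toy version of the idea, assuming Gaussian "activity bumps" over azimuth and a simple linear readout (none of which comes from the project itself), might look like this:

```python
# Toy version of the supervised training described above: the "camera" gets an
# auditory map and a visual-motion map over azimuth bins plus the true bearing
# of the stimulus, and learns a readout that points toward it. The Gaussian
# activity bumps and linear readout are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
N_BINS = 21                                   # azimuth bins, -60 to +60 degrees
AZIMUTHS = np.linspace(-60, 60, N_BINS)

def sensory_maps(bearing: float) -> np.ndarray:
    """Noisy activity bumps centered on the stimulus bearing, one per modality."""
    audio = np.exp(-0.5 * ((AZIMUTHS - bearing) / 15.0) ** 2)
    video = np.exp(-0.5 * ((AZIMUTHS - bearing) / 10.0) ** 2)
    return np.concatenate([audio, video]) + 0.05 * rng.standard_normal(2 * N_BINS)

# Training data: (sensory maps, known bearing) pairs, as in the supervised setup.
bearings = rng.uniform(-60, 60, size=2000)
X = np.stack([sensory_maps(b) for b in bearings])
Xb = np.hstack([X, np.ones((len(X), 1))])      # add a bias column

# Fit a linear readout by ordinary least squares.
w, *_ = np.linalg.lstsq(Xb, bearings, rcond=None)

# A stimulus at a bearing the model was never shown; the readout should point
# close to the true direction.
test = np.append(sensory_maps(25.0), 1.0)
print(f"true bearing: 25.0, predicted: {test @ w:.1f}")
```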

Today, if several people were to have an argument in front of the self-aiming camera, it would focus on the person with the loudest voice and most boisterous gestures, Anastasio said.

Anastasio said his team is now looking into incorporating other kinds of sensory input -- radar, infrared, heat or sonar -- into its decision-making process. Ultimately, Anastasio hopes the camera will be able to learn on its own.

"Nobody taught you to look at those noises and conjunctions of stimuli in the environment," he said. "It should be possible to get the camera to do that, too. Then we could put it where a person can't go and can't pre-specify what a camera should look at, such as inside a volcano. It will learn by itself where the richest sensory information sources are and look there itself."

Similar work is being carried out at MIT's Artificial Intelligence Laboratory.