In an age when images of SCUD missiles screaming toward oblivion have become our core cultural texts, it's clear that the archiving system of Dr. Melvil Dewey just won't cut it. Though the student librarian's decimal system, invented in 1873, proved functional for literary work, it collapses before the task of classifying the evening news. How do you shelve a copy of the Simpson verdict?
In response, research scientists from universities, IBM, and Xerox came together Thursday at the Digital Libraries '97 conference to showcase methods of extracting critical metadata from video archives - everything from featured guest stars to camera angles.
In one of the most ambitious projects, researchers at Carnegie Mellon University's Informedia Digital Media Library Project believe they have found one shortcut by compacting hour-long videos into MTV-like "skims." Using algorithms to identify info-rich images and audio, their system creates a flashy, searchable abbreviation of video footage.
"A movie studio gives you a one-minute trailer, but they're not trying to tell you the story," says CMU researcher Michael Christel, who presented his work Thursday. "We'd like to come up with a 10-minute video for 100 minutes of footage - not just a marketing preview but a 'skim' for information."
The "Informedia" project is just one of six seed projects funded three years ago by the National Science Foundation, NASA, and DARPA in their "Digital Libraries Initiative." While other universities like Stanford and Berkeley work to develop geological archives and environmental data, the CMU team toils to automate the "skimming" process using pop culture footage: close to 500 hours of video from CNN News, PBS documentaries, and the British Open University course catalog (a free-to-use video-correspondence school).
To create the skims, users first choose the degree of distillation of the video - the "compaction." CMU researcher Michael Smith says the Informedia system can compact video 20 to 1 (a 60-minute video becomes a 3-minute skim), but at that level, the clip is no longer coherent. "At a certain ... empirical cutoff, you lose too much," says Smith. "Even a professional producer couldn't go through the video ... and convey the content."
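The compaction arithmetic is easy to check. Here is a minimal sketch; the function name and its parameters are illustrative, not part of the Informedia system.

```python
def skim_length_minutes(source_minutes, compaction_ratio):
    """Length of a skim, given the source length and an N-to-1 compaction ratio.

    Both names are hypothetical; the article only gives the ratios themselves.
    """
    if compaction_ratio <= 0:
        raise ValueError("compaction ratio must be positive")
    return source_minutes / compaction_ratio


# Smith's 20-to-1 example: a 60-minute video becomes a 3-minute skim.
twenty_to_one = skim_length_minutes(60, 20)

# Christel's target: a 10-minute skim for 100 minutes of footage is 10 to 1.
ten_to_one = skim_length_minutes(100, 10)
```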
The trick, says Smith, was learning to identify subtle filmmaking conventions that signal relevant information. The group discovered that video producers often use camera motion simply to blend into something important. "When the camera pans across a polar bear, it stops on the polar bear head," notes Christel. The team then developed an algorithm (in conjunction with the University Robotics Lab) to pinpoint the changes in camera position - a process called "optical flow analysis" - which allowed them to isolate important images.
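The article does not spell out the team's algorithm, so what follows is only a toy sketch of the idea. It assumes an earlier optical-flow step has already reduced each frame to a single camera-motion magnitude, and it flags the frames where the camera settles after a sustained pan. The function name, thresholds, and sample numbers are all made up for illustration.

```python
def pan_end_frames(motion, moving=1.0, still=0.2, min_pan=3):
    """Return indices of frames where the camera settles after a sustained pan.

    motion:  per-frame camera-motion magnitudes (assumed to come from an
             optical-flow step that is not shown here).
    moving:  magnitude at or above which a frame counts as panning.
    still:   magnitude at or below which a frame counts as stationary.
    min_pan: minimum number of panning frames before a stop is of interest.
    """
    ends = []
    run = 0  # panning frames seen since the last stationary frame
    for i, magnitude in enumerate(motion):
        if magnitude >= moving:
            run += 1
        elif magnitude <= still:
            if run >= min_pan:
                ends.append(i)
            run = 0
        # Magnitudes between the two thresholds neither extend nor reset the pan.
    return ends


# A four-frame pan that stops at frame 6, then a one-frame jolt that is ignored.
sample_motion = [0.1, 0.1, 2.0, 2.5, 2.2, 1.8, 0.1, 0.0, 0.1, 3.0, 0.1]
candidate_frames = pan_end_frames(sample_motion)
```

In the polar bear example, the flagged frame would be the one where the pan comes to rest on the bear's head.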
The system then scans the audio track for information-rich words using a technique called TF-IDF weighting (Term Frequency-Inverse Document Frequency). TF-IDF compares how often a word appears in the video with how often it appears in a standard reference list. Words that are common everywhere ("the," "and") score low and are ignored, while terms that are unusually frequent in the video score high and are flagged as highly relevant. In a clip about an earthquake, Smith explains, the system would tag "tremor," "geology," and "earthquake." The dense video sequences and audio track are then strung together in a makeshift montage.
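For readers who want to see the weighting at work, here is a minimal sketch of one common textbook form of TF-IDF. It is not Informedia's code: the function names, the tiny background collection standing in for the "standard list," and the sample transcript are all assumptions. In this usual formulation, the terms worth keeping are the ones with the highest scores, and words found in every document score zero.

```python
import math
from collections import Counter


def tf_idf_scores(transcript_words, background_docs):
    """Score each word of one transcript against a background collection.

    transcript_words: list of lowercase words from a single video transcript.
    background_docs:  list of word lists, a stand-in for the reference corpus.
    """
    counts = Counter(transcript_words)
    total = len(transcript_words)
    background_sets = [set(doc) for doc in background_docs]
    n_docs = len(background_sets) + 1  # the transcript counts as a document too
    scores = {}
    for word, count in counts.items():
        tf = count / total  # term frequency within this transcript
        # Document frequency: the transcript itself plus any background matches.
        df = 1 + sum(1 for doc in background_sets if word in doc)
        idf = math.log(n_docs / df)  # zero when the word is in every document
        scores[word] = tf * idf
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring words, best first."""
    return sorted(scores, key=lambda word: -scores[word])[:k]


transcript = ("the earthquake shook the city and the earthquake "
              "tremor damaged the bridge").split()
background = ["the weather and the news".split(),
              "the game and the score".split()]

scores = tf_idf_scores(transcript, background)
best = top_keywords(scores, 1)
```

With this sample data, "the" and "and" appear in every document and so score zero, while "earthquake," which appears twice in the transcript and nowhere else, ranks first.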
But the application has some serious drawbacks. Because of inconsistencies on audio tracks, the system depends on closed-captioning text or a perfect digital transcript for the TF-IDF formula to work. Additionally, the system can't make simple connections between voices to identify who is speaking. While humans make quick work of matching voices to names, says Smith, that kind of complexity boggles the Informedia system.
While companies like Perspecta and Thinking Pictures have worked to develop metainformation systems for film companies, the technology is still in the development stage, says Thinking Pictures CEO Gordon Gould. The Informedia project leaders, meanwhile, aren't expecting to take their work public. "We're not looking at being a service provider," says Christel. "We just do the research ... [and] we're crossing our fingers."
From the Wired News New York Bureau at FEED magazine.