Software Scours Holocaust Records

Researchers are developing speech-recognition technology to sift through thousands of hours of personal video accounts from Holocaust survivors and witnesses in different languages. Dialects and accents pose a major challenge. By Kendra Mayfield.

The voices, muddled by thick accents and heavy with emotion, reveal both tragic and heroic eyewitness accounts from thousands of Holocaust survivors and witnesses.

But while these videotaped recordings are imperfect, they are invaluable to historians and generations to come.

Researchers from Johns Hopkins University, IBM and the University of Maryland are developing speech recognition software to allow historians and scholars to search through more than 51,000 video interviews from Holocaust survivors, witnesses and liberators.

The interviews are drawn from the Survivors of the Shoah Visual History Foundation, which holds the world's largest coherent archive of videotaped oral histories: 116,000 hours of digitized interviews in 32 languages from 57 countries.

The National Science Foundation recently awarded $7.5 million, distributed over five years, to help fund the project and develop a new system that's capable of recognizing key words and phrases in new languages.

"This is one of the few projects tackling so many things at once on such a scale," said Bill Byrne, associate research professor at the Center for Language and Speech Processing at Johns Hopkins. "This is a real application we're trying to solve."

"Our original mission to collect 50,000 testimonies is now complete," said Doug Greenberg, president and CEO of the Shoah Foundation. "Our mission now is to use the archive in educational settings to overcome prejudice and bigotry."

The Sept. 11 attacks make the Shoah Foundation's mission to unlock its archive even more essential, Greenberg said.

"Sept. 11 was about a lot of things but it was also about hatred," Greenberg said. "We have 50,000 educators whose testimony really speaks to the evil that hatred in the world makes."

Researchers have already begun manually reviewing English language tapes and indexing them according to times, places and incidents described in each interview.

But the time it takes to manually index, summarize, research and review a collection this size is daunting: A single testimony consumes an average of 35 hours. So far, it has cost the Shoah Foundation approximately $8 million to catalog just 4,000 of these interviews.

Armed with the NSF grant, researchers hope to advance speech recognition technology and create an "audio search engine" that will lower costs and speed the cataloging process.

The research team is initially building speech recognition systems to process interviews conducted in Czech. They will then explore opportunities to develop systems in other Central European languages.

One of the trickiest challenges is getting a computer to understand different dialects and languages, said Sam Gustman, executive director of technology for the Shoah Foundation.

Fully automatic technology for accessing archives is currently inadequate, researchers say.

Most commercial speech recognition systems are designed for broadcast news. The technology is most reliable when it is asked to recognize a limited number of words or phrases that are spoken slowly and clearly.

But the Shoah collection poses unique challenges. Many of the interviews are conducted in Central and Eastern European languages that don't have speech recognition systems. Often a speaker will switch between different languages, alternating from English to Yiddish in mid-sentence.

"As you get into other languages, the technology isn't there yet," Byrne said.

"We don't have the technology to do large-scale search yet (on recorded conversations)," said Douglas Oard, an assistant professor at the University of Maryland. "If we had the capability to search, it would change the way we do things."

Many of the interviews are difficult to understand because speakers have heavy accents or are highly emotional when they recount their experiences.

"These people are speaking about things that had great impact in their lives," Oard said. "This makes speech recognition that's designed for broadcast news fail."

Gustman agreed: "We're not able to process this stuff in English, let alone other languages."

Rather than transcribing interviews word-for-word, the software will identify key search terms and phrases.
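In rough outline, that approach amounts to scanning a recognizer's time-aligned word hypotheses for a short list of query terms, rather than producing a full transcript. The sketch below is purely illustrative (the project's actual system is not public); the data format and confidence threshold are assumptions.

```python
# Illustrative sketch only: assumes a speech recognizer emits
# time-aligned word hypotheses as (word, start_seconds, confidence).

def find_terms(hypotheses, query_terms, min_confidence=0.5):
    """Return (term, start_seconds) hits for the requested query terms,
    skipping low-confidence recognizer output."""
    wanted = {t.lower() for t in query_terms}
    hits = []
    for word, start, conf in hypotheses:
        if word.lower() in wanted and conf >= min_confidence:
            hits.append((word.lower(), start))
    return hits

# Example: search a short stretch of hypothetical recognizer output.
output = [("we", 12.0, 0.9), ("arrived", 12.3, 0.8),
          ("in", 12.9, 0.4), ("Warsaw", 13.1, 0.7)]
print(find_terms(output, ["Warsaw", "arrived"]))
# [('arrived', 12.3), ('warsaw', 13.1)]
```

The payoff of this style of search is that a historian can jump straight to the moment in a multi-hour interview where a place or event is mentioned, without anyone having transcribed the tape.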

"We don't edit any of these interviews," Greenberg said. "It's completely raw footage taken directly from interviews with survivors. It will be broadly accessible, but it won't be edited."

"It isn't as good as a human cataloging, but it's $100 million cheaper," Oard said. "We're going to drive costs of doing this down to a point where applications are possible that are now infeasible."

The potential applications for a Web search engine based on conversational speech are endless. The technology could be used to scour oral histories for projects that might otherwise be financially infeasible, from civil rights to the space program.

"There's a lot more oral history than anybody even knows about," Oard said.

The technology could eventually be applied to other recorded conversations, such as speech-enabled cell phones.

"When you develop this type of technology, you open a lot of doors," Oard said.