Read All About It

Readers can now sift through back issues of the venerable New York Times thanks to an ambitious project to digitize all the news that's fit to print. By Kendra Mayfield.

"Titanic Sinks Four Hours After Hitting Iceberg."

With that banner headline on April 16, 1912, The New York Times informed the world about the 20th century's most epic disaster.

Now, the venerable Gray Lady has digitally preserved its account of the Titanic sinking for future generations to see, exactly as it appeared to readers some 90 years ago.

The Times has preserved everything else, too.

A company called ProQuest has digitized every back issue of the Times, from cover to cover. Every news article, editorial, photograph, cartoon and advertisement is included, and using a fully searchable file, readers can see articles as they originally appeared in print.

From the attack on Fort Sumter to Nixon's resignation, readers can trace watershed historical events from 1851 to 1999.

The Times is the first paper to be fully digitized in ProQuest's Historical Newspapers project, which will electronically convert complete back runs of leading newspapers, including The Wall Street Journal, The Washington Post and The Christian Science Monitor.

With over 3 million pages, over 25 million articles covering 148 years of history, and 4 terabytes of data, the Times conversion effort is unprecedented.

"It's extremely important," said Carolyn Dyer, senior vice president for ProQuest Information and Learning. "(The New York Times) really is the newspaper of record nationally. It's one of the most-used news resources in any college or university library."

"Our primary goal has more to do with accessing the historical record than with maintaining it," said Peter Simmons, director of The New York Times Agency.

"Historical Newspapers enhances access through the ease of digital delivery as well as through sophisticated searching of the ASCII text that resides in the background of the database. This combination enables a broad spectrum of users to explore the seemingly limitless resource of The Times' historical files on their own terms."

Articles are preserved in full-page format so that readers can see where an article or ad was placed on a page, either above or below the fold.

This format "lets a user interpret the article objectively, against relevant news of the day," Simmons said.

With multiple articles of varying lengths and page jumps, newspapers pose unique challenges for electronic conversion.

ProQuest used microfilm as the source for its digital images. Some of that film was originally mastered more than 50 years ago, when microfilm technology was still immature, so image quality varies.

Newspaper formats also evolved over 150 years. Some of the earliest editions, among the most difficult to digitize, featured small, almost illegible type, with columns that nearly ran together.

Early Times editions weren't microfilmed in 1852, when they were "hot off the presses," Dyer said. Instead, these issues were microfilmed in the 1930s and 1940s -- more than half a century after publication -- so pulp had already begun to deteriorate by the time archivists captured images.

ProQuest used new digitization, zoning and image-enhancement techniques to revitalize these old editions. The company developed software to facilitate zoning and the editing of optical character recognition text, to reach 99.5 percent accuracy in the headline, byline and first paragraph.

Archive specialists manually checked every page image when scanning the microfilm, using image-cleansing software to create the highest possible quality.

ProQuest also developed software for search and retrieval of full-page images and individual article images. This software enables users to search by keyword or browse a full page, select an article and click through to obtain a printable image.

"This opens up these newspapers to information that (readers) couldn't look at by looking at The New York Times index," Dyer said.

"The images are awesome, definitely better than microfilm, which is the only alternative for full text access to the older Times articles," said Mary Ellen Bates, an information services consultant. "Browsing is easy, and the user interface is good."

Digitized editions of copyrighted materials will be available to library and education customers on a subscription basis, through ProQuest's website. Out-of-copyright materials will be available to customers for purchase.

"You can't get the ads or graphics from any of the online services or on the Web, and that information can tell you volumes about social, cultural and political trends," Bates said. "That's the point of using these value-added online services: They have deep, rich content that the publisher is not giving away for free on the Web."

However, copyright disputes have raised thorny issues over electronic archives.

Recently, the Supreme Court ruled that publishers violated freelance authors' copyrights by putting their articles in electronic databases without their permission or further compensation.

The Tasini et al. v. The New York Times et al. decision forced news librarians to delete thousands of freelance articles that did not comply with the ruling.

To comply with copyright requirements, ProQuest's system does not permit freelance articles to be viewed individually, apart from the publication in which they appeared.

ProQuest's directors insist that this mechanism won't poke holes in the historical record, since readers can still view full-page versions of complete issues without omissions, as they originally appeared.

"It's still preserved in the historical record in the full-page version," Dyer said.

It's unclear whether digital archiving is the best way to preserve the nation's historical records, since the technology is costly, relatively new and untested.

Although microfilm is one of the few preservation media proven to be effective for extended periods, advocates are still betting that digital preservation will withstand the test of time.

"What becomes obsolete is the storage medium or the access software," Dyer said. "As long as a responsible organization is in charge of the content, it can always be upgraded to the next medium of preference."

"The dual approach of digitization and microfilm is optimal right now," Simmons said. "Microfilm still has value and use, while digitization provides wide access via the newer platform of the Internet.

"Rather than a future-proof solution, we prefer to remain platform-agnostic and continue to extend our brand past, present and future, across each new technology that is appropriate and efficient."