Google Begins Indexing Scanned Documents

Google has begun using Optical Character Recognition (OCR) technology to index documents posted online that contain images of text, it announced yesterday on its blog.

Previously, only documents converted to PDFs containing actual text were indexed and included in results. Since scanned documents are only pictures of text, they are typically more difficult to interpret, and the pages can include wrinkles, smudges or stains.

This advance opens up a whole new collection of information, including many government and academic documents once hidden from public searches.

The news comes a few days after Google settled its book-scan suit, giving it the go-ahead to continue its book search project.