Crucial Tech: Character Recognition

Armed with a preclassification system based on an algorithm akin to natural selection, Silicon Biology believes it holds the key to better optical character recognition of handwriting and other written forms.

Humans have a hard enough time deciphering their own handwriting. Imagine trying to make software smart enough to comprehend the penmanship of every sloppy writer on the planet, and you see the challenge optical character-recognition software developers have faced for the past 30 years.

But Silicon Biology, a company based in suburban Minneapolis, believes that it has a far more accurate OCR program than its competitors, which rely on technology the firm considers fundamentally flawed. Dubbed Fermat, Silicon Biology's program uses a preclassification system based on a genetic algorithm akin to natural selection. In contrast, other OCR programs use a neural network based on the theories of the late Russian mathematician Andrey Kolmogorov. The neural model studies the shape and slope of handwriting to determine content, while Fermat assesses the approximately 20,000 ways a human could write a letter of the alphabet or a number.
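
For readers unfamiliar with the term, a genetic algorithm can be sketched in a few lines. The toy below is not Fermat's method, whose internals are not described here. It only illustrates the general idea of selection, crossover, and mutation by evolving random 5x5 bitmaps toward a made-up template of the letter "T". Every name in it (TARGET, fitness, crossover, mutate, evolve) and every parameter value is an assumption chosen for illustration.

```python
import random

# Hypothetical 5x5 bitmap of the letter "T", flattened row by row.
TARGET = [
    1, 1, 1, 1, 1,
    0, 0, 1, 0, 0,
    0, 0, 1, 0, 0,
    0, 0, 1, 0, 0,
    0, 0, 1, 0, 0,
]


def fitness(candidate, target=TARGET):
    """Count the pixels on which the candidate agrees with the target."""
    return sum(1 for c, t in zip(candidate, target) if c == t)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(candidate, rng, rate=0.02):
    """Flip each pixel independently with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in candidate]


def evolve(generations=60, pop_size=40, seed=0):
    """Return the fittest bitmap found and the best score per generation."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        # Reproduction: refill the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best score per generation:", history)
    for row in range(5):
        print("".join("#" if bit else "." for bit in best[row * 5: row * 5 + 5]))
```

Because the better half of each generation is carried over unchanged, the best score in this sketch can never fall from one generation to the next. A real recognizer would need a far richer representation of handwriting than a single fixed bitmap.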

But does Fermat really have other OCR programs beat? Yes, says Tony McKinley, a consultant with Pennsylvania-based Intelligent Imaging, who tested Fermat against 50 competitors. "It's not 100 percent accurate, but it outperformed other OCR systems by a factor of 50 percent or better."

After a six-year struggle to get the firm off the ground, Silicon Biology founder Eric Anderholm and his staff of 30 have begun to carve out a slice of the US$15 billion form-processing industry, attracting a handful of clients, HMOs and insurance companies among them. But data forms may not be the only area to which the company applies its expertise. CEO Doug Johnson says that the technology can also be applied to classifying spoken words, Asian-language characters, and white blood cells (a process now performed by eye with a microscope).