Ivo Welch, June 2008

A Short Test Review of Omnipage 16 Pro

I needed an OCR engine to process a whole lot of pages. My plan was to make one computer a "backend" processor for a variety of scanned documents. I would send a scanned document to this computer, and I would receive a text document back. This would then be fed to other programs.
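
A minimal sketch of the backend idea described above, assuming scans arrive as PDF files in one folder and text should land in another. The names `process_inbox`, `inbox`, `outbox`, and `ocr` are hypothetical; `ocr` stands in for whatever engine is eventually used and is not an Omnipage API.

```python
from pathlib import Path
from typing import Callable, List


def process_inbox(inbox: Path, outbox: Path, ocr: Callable[[Path], str]) -> List[Path]:
    """Run `ocr` on every PDF in `inbox` and write one .txt file per scan to `outbox`.

    `ocr` is a placeholder (an assumption) for the real recognition step.
    Returns the list of text files written during this pass.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for scan in sorted(inbox.glob("*.pdf")):
        target = outbox / (scan.stem + ".txt")
        if target.exists():
            continue  # already processed on an earlier pass
        target.write_text(ocr(scan), encoding="utf-8")
        written.append(target)
    return written
```

Passing the engine in as a function keeps the folder handling separate from the OCR product, so the same loop works whichever package does the recognition.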

The main choices in this market are Omnipage, ABBYY FineReader, Readiris, and Tesseract (a free Google program that is unfortunately not yet ready for prime time). Nuance, the manufacturer of Omnipage, charges $600 for Omnipage 16 Pro and $200 for an update. Although I owned version 15 of Omnipage, full legal copies of Omnipage 16 Pro sell on eBay for about the same price as the update. So I decided to just buy another eBay copy rather than update my own Omnipage 15 Pro copy.

Now, I know from prior versions and a little experimentation that the Omnipage recognition engine is highly accurate. Just as important to me, Omnipage has a very nice XML text output format that makes it easy for later computer programs to process what Omnipage has delivered. (The XML format itself is undocumented, but it is great.)

I had good reasons to be excited about this update—compared to Omnipage 15 Pro, Omnipage 16 Pro was supposed to be both more accurate and faster. I installed my brand new copy on an AMD64x2 machine with 3GB of RAM and 200GB of free disk space. The installation was painless: I first uninstalled the 15 Pro version, and then installed the 16 Pro version.

The first sign of a problem started when Omnipage did: it took 5 minutes to start up (and not just on its first launch). Literally 5 minutes. I would sit there for 300 seconds and wait for Omnipage to get beyond its splash screen. If I closed the program and immediately reran it (so that it was still sitting in RAM), this time went down to about 1 minute. Naturally, I wondered whether I had run into an installation error.

I headed over to the Nuance web site. Unfortunately, there is no forum (they call it a community bulletin board) for OmniPage. This is bad, because the technical documentation for Omnipage has always been very poor. (Nuance supplies a couple of end-user documentation pages, but nothing that seems particularly useful.) There are no power-user sources of information, other than their paid tech support. Fortunately, this support is fairly affordable and paid per incident.

So I paid my dues, filed a Nuance incident report (with some additional information, explained below), and waited for 2 days. The tech rep was nice and friendly. However, he told me that a 5-minute startup is normal for Omnipage 16 Pro.

Because I was already filing a tech support question, I figured I might as well report a second problem in the same incident: on my first trial document (a 200-page scanned PDF file), I received an "out of thread memory" error after a few minutes of processing.

To my surprise, Nuance tech support informed me that "The Omnipage software may not work on that heavy load of about 100 pages or more." I found this difficult to understand: a $600 professional OCR program cannot reliably process documents of 100 pages or more. This was not in their advertising brochure.
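
One conceivable workaround, which Nuance did not suggest and which I have not verified against Omnipage, would be to split long documents into page ranges below the reported limit before OCR. The sketch below only computes the ranges; the function name `page_chunks` and the default of 99 pages are assumptions based on the "about 100 pages" remark.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, max_pages: int = 99) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges.

    Each range holds at most `max_pages` pages. The default of 99 is an
    assumption taken from the "about 100 pages" limit quoted by tech support.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


# The 200-page trial document would be cut into three pieces.
print(page_chunks(200))  # [(1, 99), (100, 198), (199, 200)]
```

Each range would still have to be extracted into its own file and the recognized text stitched back together afterwards, which is exactly the kind of bookkeeping a professional package should not push onto the user.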

The last straw came when I realized that Omnipage is not smart enough to notice when a page is scanned sideways or upside down. Let me repeat: Omnipage is an OCR package in 2008 that can presumably partition pages, but that is not smart enough to recognize upside-down pages. For me, despite its Pro designation, this means it cannot automate OCR processing unless all pages are scanned the right way up (and mine are not always)!

I must mention another shortcoming: if you have a quad-core system and want to process multiple documents simultaneously, Omnipage will not let you, because it can run only one batch manager per machine at a time. So, despite touting multi-core support as an important upgrade, Nuance has made automated use of more than two cores impossible. Even if you own a brand-new four-core or eight-core Dell system, it won't help you.

Nuance has been the market leader in OCR for a long time. However, they seem to have lost their bearings. Other products are more user-friendly, start up within 10 seconds, can reliably process documents that are more than 100 pages long, can auto-rotate pages, and can use more than two processor cores. (ABBYY FineReader can even distribute tasks across a network!)

Out of curiosity, I decided to look up Nuance's annual financial statement. They describe themselves as a maker of speech recognition software. I guess this explains why their OCR software has basically just become a cash cow to milk the ignorant. A pity...

(If you know someone working for Nuance's corporate department, please relate my experiences to them. I wonder if they are even aware of where this OCR business of theirs is heading.)

(PS: The only ray of light was that I stumbled upon a Nuance developer, who was nice enough and cared enough to help me with some questions along the way.)


Omnipage 16 Pro Wrapup

Plus: Good Recognition Engine. Very useful undocumented .xml output file format. Good right-click implementation in Windows Explorer.

Minus: Clumsy full interface. Not able to process >100 pages reliably. Slow startup time. Not able to auto-rotate pages. No tech documentation. No community support forum.

2 stars out of 5.