OCR is much more than it used to be. It has evolved to the point where some software can pluck key indicators out of just about any invoice, in just about any language.
By Tom Seibold, Senior Content Strategist, Perceptive Software
Today, software for optical character recognition (OCR) is commonly included with low-cost scanners and multifunction devices, making this once-rare technology for converting printed letterforms to characters a computer can process seem almost pedestrian.
But beyond the inherently impressive feat of ordinary computers reading printed text lies neural network-based software that can actually apply logic and meaning to the text it reads.
Such advanced OCR software today can intelligently extract data from scanned pages, picking out—for example—purchase order numbers from dozens of different invoices that make their way into an AP processing center.
In the Beginning
The highly evolved OCR of today builds on many previous generations of the technology.
During the half century or so that OCR has been put to work in the back office, there have been distinct limitations on the technology’s capabilities.
In the early days of OCR, human assistance was always required to assign meaning to the characters the technology read, except in the most highly structured scanning situations. But even though OCR did little more than straight conversion, the technology was a boon for certain business automation tasks.
OCR undoubtedly dazzled early adopters and technology observers with the promise it held for automating an even wider variety of paperwork-processing tasks. But in the era of TWA and NASA, computers and scanners were costly devices and even the most basic OCR was still limited to big corporations or military organizations.
AP by the Foot
If there has ever been a time-consuming, error-prone paperwork chore that affects businesses of all sizes and seems to consume all the resources thrown at it, it’s paying bills.
In his history of Ford Motor Company, Wheels for the World, Douglas G. Brinkley describes the chaos of Ford’s accounts payable office during the company’s explosive growth in the 1920s.
Though Henry Ford perfected the assembly line that cranked out twenty Model Ts per hour, the brilliant industrialist was helpless to shrink the mountains of paperwork required to keep his factories running. Brinkley says that Ford accountants were so desperate to get a handle on cash flow that they estimated the value of the stacked-up towers of invoices around them by calculating an average dollar amount per vertical foot!
In the modern era, computers have become a valuable tool for generating business-to-business payments. But the diversity and complexity of invoice formats for companies with many (and constantly changing) suppliers have long stymied efforts to automate the accounts payable process. Even the introduction of computerized scanning, while reducing the swelling tide of physical paper, did not reduce the need for human reading, interpretation, and keying of values on a page before payment could be issued.
So while automation had infused many office processes with Jetsons-like speed and efficiency, accounts payable was for a long time seemingly abandoned on the information superhighway, trundling down the road like one of Ford’s original Tin Lizzies.
Reading Skills
In the 1970s, with the advent of relatively inexpensive personal computers and the invention of the charge-coupled device (CCD) for use in affordable scanners, the stage was set for the next OCR revolution. It came in the form of software, created by author, scientist, inventor, futurist, and artificial intelligence prodigy Ray Kurzweil, that could for the first time recognize any font. While the innovation was originally combined with text-to-speech synthesis that enabled computers to read printed matter to the blind, the “omni-font” capabilities of this technology soon made possible the widespread conversion of printed pages to computer text.
This advancement allowed the extraction of raw character or ASCII data from a page, but the ability to turn that raw text into something meaningful or useful for business processes was still in its infancy.
Baby Steps
Software that could meaningfully read forms using omni-font OCR technology began to toddle around in the early 1980s, according to Charles Kaplan, vice president of marketing at Ashburn, Va.-based Brainware, Inc., a maker of “intelligent data capture” (IDC) software that is being acquired by Perceptive Software.
Such first-generation OCR-driven data capture programs were template-based, meaning that data elements had to be contained in specific locations on paper forms to successfully be read, saved, and associated with tags such as name or employee ID number in a database.
Some first-generation programs could even read a designated area on each page for a form number, allowing the application to distinguish between, say, a 1040 and a 1040EZ tax form and apply the appropriate template to each.
While a useful advance over full-page text capture, template-based recognition still faced obstacles. First, a template had to be manually developed for every variation of every business form. Second, the inherently imperfect nature of optical scanning meant that glitches in the size or positioning of forms, or of the data in specific fields, could cause read errors or simply kick such reads out to an exception queue for time-consuming manual processing.
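The positional weakness described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Word record, the TEMPLATE coordinates, and the field names are hypothetical and do not come from any particular first-generation product. It shows a template reading fields correctly from an aligned page, then misreading them when the same page is scanned 25 units lower.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and the position of its top-left corner (hypothetical units)."""
    text: str
    x: float
    y: float


# Hypothetical template: field name -> (x_min, y_min, x_max, y_max)
TEMPLATE = {
    "name": (100, 50, 300, 70),
    "employee_id": (100, 80, 300, 100),
}


def extract_with_template(words, template):
    """Return the text found inside each field's fixed rectangle, or None."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: w.x)  # left-to-right reading order
        result[field] = " ".join(w.text for w in inside) or None
    return result


def shift(words, dx, dy):
    """Simulate a page that was fed into the scanner slightly out of position."""
    return [Word(w.text, w.x + dx, w.y + dy) for w in words]


page = [Word("Jane", 110, 60), Word("Doe", 160, 60), Word("48213", 110, 90)]

aligned = extract_with_template(page, TEMPLATE)
skewed = extract_with_template(shift(page, 0, 25), TEMPLATE)

print(aligned)  # both fields land inside their rectangles
print(skewed)   # the name slides into the employee_id rectangle
```

On the shifted page the name field comes back empty and the employee ID field picks up the name, which is the kind of read that would either be wrong or be kicked to an exception queue.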
“First-generation OCR data capture was a useful technology in settings with highly structured form types and ideal scanning conditions,” says Kaplan.
It was not a bad start—but it was still a long way from the holy grail of intelligently reading and interpreting business forms in a wide variety of formats without human assistance.
Free-Form Living
As the 1980s progressed, “free-form recognition” products hit the market, overcoming some of the limitations of template-based techniques.
Rather than using “X-Y” coordinates and requiring text data in fixed positions, free-form products collected all the text on a page, as pre-template products did. After reading and interpreting the letterforms of all the text strings found on a page, a free-form product would analyze that text against “anchor words,” such as “Invoice number” or just “Inv.” Although such a product could store manually added phrases in its library of recognized terms, it would handle additional languages only if anchor words for those languages were added.
“It’s very rules-based,” says Kaplan, “and anything that doesn’t fit a rule doesn’t get saved.” He adds that OCR errors readily disrupt the interpretation process. For example, should it read the “I” in “Invoice” as an upper-case “i”, a lower-case “L”, or the numeral “1”?
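A rough sketch of the anchor-word idea, and of why the “I”, “l”, “1” confusion matters, might look like the following. The anchor list, the confusable-character table, and the function names are all hypothetical, and the sketch assumes plain ASCII input with each anchor at the start of a line. It folds commonly confused glyphs into one class before comparing, which is only one of several ways a rules-based product could cope with such misreads.

```python
import re

# Hypothetical library of anchor words; real products shipped their own lists.
ANCHORS = ["invoice number", "invoice no", "inv"]

# Fold glyphs that OCR commonly confuses into one class before comparing.
# Each mapping is one character to one character, so string lengths are preserved.
CONFUSABLE = str.maketrans({"1": "i", "l": "i", "|": "i", "0": "o"})


def fold(text):
    """Lower-case the text and collapse look-alike characters (ASCII assumed)."""
    return text.lower().translate(CONFUSABLE)


def find_invoice_number(lines, anchors=ANCHORS):
    """Return the first token that follows an anchor word, or None if no rule fits."""
    folded_anchors = sorted((fold(a) for a in anchors), key=len, reverse=True)
    for line in lines:
        folded = fold(line)
        for anchor in folded_anchors:  # longest anchor first
            if folded.startswith(anchor):
                rest = line[len(anchor):]  # slice the unfolded line to keep real digits
                match = re.search(r"[A-Za-z0-9][A-Za-z0-9-]*", rest)
                if match:
                    return match.group(0)
    return None


misread_capital_i = find_invoice_number(["lnvoice Number: 10-4471"])
misread_as_digit = find_invoice_number(["1nv. 77821"])
no_german_anchor = find_invoice_number(["Rechnungsnummer: 5531"])

print(misread_capital_i, misread_as_digit, no_german_anchor)
```

The German line returns nothing because no German anchor has been added, which mirrors the language limitation described above. A prefix as short as “inv” would also match unrelated words such as “Inventory,” one reason rule sets of this kind needed constant tuning.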
More advanced second-generation products combined keyword methods with the ability to process the same form several times, allowing a program to “learn” and build its own library of templates stored in an internal knowledge base, speeding future recognition. But even with these “Memory Recognition”-class products, processing a stack of varying documents and intelligently extracting the key elements needed for applications like accounts payable with a high degree of accuracy was still out of reach.
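One plausible reading of that “learning” behavior is a keyword search whose result is remembered per layout, so that the next document with the same layout skips the search. The sketch below is an assumption about the general approach, not a description of any vendor's product; the class name, the layout key (here simply a supplier label), and the sample lines are hypothetical.

```python
class LearnedTemplates:
    """Keyword search that remembers where the anchor was found for each layout."""

    def __init__(self, anchor="invoice number"):
        self.anchor = anchor
        self.known = {}            # layout key -> line index where the anchor was found
        self.keyword_searches = 0  # how often the slow full-page search ran

    def _value_after_anchor(self, line):
        """Return the text following the anchor on this line, or None."""
        if line.lower().startswith(self.anchor):
            return line[len(self.anchor):].strip(" :.#")
        return None

    def extract(self, layout_key, lines):
        # Fast path: try the position learned from an earlier document.
        idx = self.known.get(layout_key)
        if idx is not None and idx < len(lines):
            value = self._value_after_anchor(lines[idx])
            if value:
                return value
        # Slow path: scan every line, then remember where the anchor turned up.
        self.keyword_searches += 1
        for i, line in enumerate(lines):
            value = self._value_after_anchor(line)
            if value:
                self.known[layout_key] = i
                return value
        return None


library = LearnedTemplates()
first = library.extract("acme", ["Acme Corp", "Terms: Net 30", "Invoice Number: A-100"])
second = library.extract("acme", ["Acme Corp", "Terms: Net 30", "Invoice Number: A-101"])

print(first, second, library.keyword_searches)
```

The second invoice is read from the remembered position without another full search. The value is still tied to a rule and a position, though, so a supplier that changes its layout sends the program back to the slow path.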
A Clean Slate for Data Capture
According to Kaplan, the company that became Brainware began when several OCR/data capture industry veterans recognized the deficiencies of traditional OCR-based data extraction methods and decided to start over with a clean slate.
Drawing on their industry experience, this group of experts decided to abandon first- and second-generation approaches and create their own form of intelligent data capture whose core technology or “engine” could process any type of document in any language. They envisioned a solution that could sort scanned and electronic documents based on their content and meaning, extract critical data regardless of layout, and validate it whenever possible. This system would be capable of “learning” document types and how to process them much as a human does, without time-consuming manual template creation.
Tapping a diverse group of scientists, researchers, and developers that currently includes a physicist, mathematician, biologist, chaos theorist, and guidance systems designer, Brainware has developed patented data classification, extraction, and associative memory (search) techniques inspired by the neural networks of the human brain.
Under the hood, the application bristles with pattern-recognition techniques and mathematical models optimized for reading and interpreting not just text in any language, but hand-written characters and bar codes, all without predefined dictionaries, taxonomies, or indexing. Understanding what it does at a high level is rather straightforward, though it would be almost impossible to summarize and explain how it works without first detailing many advanced math and physics models, Kaplan says, even if it weren’t proprietary.
Boosting the “No Touch” Rate
There’s no mystery about what users and technology partners think of the product, however.
“Brainware’s unique approach tears off the straitjacket of template creation and maintenance that less-advanced technologies require,” says Susie Walker, senior solution manager at Perceptive Software. The maker of ECM and BPM software has integrated Brainware’s technology into its own AP processing product.
According to Kaplan, Brainware boasts an unmatched invoice-reading success rate from the moment of installation.
One prospective customer asked Brainware and another OCR data capture vendor to spend eight hours trying to extract the data from 3,028 invoices. Despite a wide range of formats from different vendors, in multiple languages, and occasionally invalid or missing data, 35% of the documents went straight through Brainware’s system with full data and line item recognition. The competition’s “no touch” rate with the same stack of invoices? Zero.
The company bought the software and reduced headcount by 75% on the first day. It is now seeing a “scan-to-post” success rate of over 50% for first-time invoice formats. And accounting managers estimate that the application’s continuous self-learning will eventually provide an additional 20- to 30-point boost to that number.
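The figures quoted above translate into document counts as follows. This is only a back-of-the-envelope sketch: the variable names are hypothetical, the 35% trial rate is applied to the 3,028-invoice stack, and the reported “over 50%” is treated as a floor of exactly 50 for the projection.

```python
# Trial figures as reported in the text.
trial_invoices = 3028
no_touch_percent = 35

# Invoices that passed straight through versus those still needing a person.
no_touch_count = round(trial_invoices * no_touch_percent / 100)
manual_count = trial_invoices - no_touch_count

# Production rate for first-time formats, plus the estimated 20 to 30 point gain.
current_floor = 50
projected_range = (current_floor + 20, current_floor + 30)

print(no_touch_count, manual_count, projected_range)
```

Roughly 1,060 of the trial invoices needed no handling at all, and the managers’ estimate implies an eventual scan-to-post rate of at least 70% to 80% for first-time formats.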
“The high data capture rates on invoices that Brainware enables vastly decreases the amount of manual data entry required to process both PO and non-PO invoices,” says Matt Cramer, product manager for Perceptive Software.
Line Items and the Bottom Line
The cost of intelligent data capture technology is higher than that of first- and second-generation OCR and data capture solutions. But, as with any leading-edge technology, its cost must be weighed against its ability to reduce the cost of doing things the conventional way.
Besides freeing staff for other tasks, speeding up the pay cycle with intelligent data capture allows companies to gain early-payment discounts.
“Add up the elimination of templates for both initial setup and ongoing maintenance, the reduction of human interaction in the AP process, and integration with existing ECM, BPM, and ERP workflows,” says Cramer, “and it’s obvious that intelligent data capture technology is not just an ROI story but a clear winner for total cost of ownership.”