Documents, such as invoices, personal ID cards, or other standardized forms, contain important information that is essential for the company’s smooth operation and growth. Therefore, fast extraction is very convenient and, in most cases, provides a serious competitive advantage. Yet, it has been a big challenge for years as the questions of accuracy, structure, and of the entire process infrastructure itself were not sufficiently answered. We have implemented an innovative solution using the Microsoft Azure Form Recognizer, an automated machine learning solution for text recognition, and we want to share the exciting insights based on a proof-of-concept (POC) analysis.
Invoice processing case
An office equipment producer offers benefits to its clients based on the amount and types of products that they have purchased (office chairs and desks, for example). The sales pipeline is indirect: the equipment is sold by various resellers, and the end clients must provide the invoices directly to the producer as proof of their purchases, in this case after registering for the loyalty program. The invoices are sent in paper form by post, or are scanned and sent by email. Thus, the producer receives hundreds of invoices that need to be processed to determine the benefits for each client. Processing all of the invoices manually is time-consuming and prone to error. The invoices often need to be checked more than once, further increasing the already high labor costs. The benefit calculation is done twice a year, and the producer spends on average about 14 work-days just processing the invoices, sometimes hiring extra help solely for this purpose.
Reforming established methods and habits is not easy, yet it can transform daily tasks and improve processes. Before the change, the invoices for the benefits were handled as follows:
Customers sent their invoices via email or mail to the producer before the set deadline.
Once the invoices were collected, an employee started processing them. Processing one invoice went as follows:
- Open an invoice.
- Copy or type the seller company name and ID, customer company name and ID, product names, quantity, and prices into an MS Excel file.
- Check if all of the information is correct.
- Mark the invoice as processed by saving it to the "Done" folder.
The process takes 4-7 minutes per invoice, depending on the number of products. During the process the employees make mistakes, which prolongs the entire process.
Invoices contain diverse information that needs to be extracted, mainly to assign the invoice to the correct client, extract information about the seller, and find all of the products that the company has produced (an illustrative invoice with labeled details is provided below). There can be a varying number of products; some invoices even include several page-long lists. Moreover, different sellers use various accounting systems to create the invoices, resulting in many differences between the invoice styles and the locations where a certain piece of information may be found. Therefore, this case required a more detailed analysis of the possible approaches.
We have created a custom solution for invoice processing that contains user and administration interfaces. The core of the solution is the Azure Form Recognizer. The application selection process involved broad research and testing that led us to the optimal solution (read more about the technical part of the solution below).
In practice, the process changed markedly for the people involved. Customers can now register and upload their invoices via a user-friendly web application. The processors see all of the new invoices in the administration portal. When a new invoice is uploaded, the employee sees the tag "New" and is prompted to check it. Their role is now different: the main task is to review and correct the information rather than to type it in. Once they open the new invoice, they see the information already filled in the relevant fields. Some fields may show a warning to double-check that the data was inserted correctly. This is easy to do, as the original invoice is already uploaded and can be opened instantly.
With this approach we were not only able to speed up the processing, but also make it easy, simple and clear for both the administrators and the customers.
The average invoice processing time was 4-7 minutes; with the document recognizer, we were able to cut it by half, down to 2 minutes or less. Even when a human check is still needed, the solution substantially reduces the time and resources needed, decreases human errors, and standardizes the process, all at the same time.
Another advantage is the change to continuous activity. Shifting from processing a large volume of documents twice a year to quick and easy processing provides an opportunity to process invoices continually, for example once a week or once a month. Customers can now receive benefits all year long.
Form Recognizer overview
After evaluating the available options, we selected the Azure Form Recognizer, which is part of Azure Cognitive Services. It is a service for information extraction from scanned documents. As it is relatively new on the market, Microsoft provides upgrades for the product often, increasing precision and broadening its applicability by introducing new features and approaches.
The Azure Form Recognizer offers several options to extract values from a document, based on a highly trained AI solution. These can be divided into pre-trained models, key-value pairs extraction, and custom models. The pre-trained models are highly effective, but only for the specific groups of documents they were trained on. While there is a prepared model for invoices, it is mostly trained on US data. Therefore, it is not applicable in our scenario due to significant differences in structure compared to the invoices from Central Europe. The key-value extraction is best suited for extracting data from simple, well-structured office forms such as application forms. As such, we ended up with the choice that offered us the highest flexibility, the custom models.
Custom models allow us to train the AI solution based on existing invoices to form multiple models for different invoice issuers. They require a rich set of training data, detailed preparation, and thoroughness of marking the values needed on the training invoices. You can see the illustration of a labeled invoice in the Form Recognizer Labelling Tool (this tool needs to be used for the preparation of the training invoices and for marking the information that is required for extraction).
The outcome depends on the quality and variety of the included invoices for training the models and their labeling quality.
Extracted values from the Azure Form Recognizer may require further processing. For example, the quantities are partially extracted with the units (e.g., “2 pcs”, “1 set”), and these units need to be deleted to be able to use the quantity as a number. Further processing is mentioned in the precision evaluation analysis.
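The unit clean-up described above can be sketched in a few lines. This is a minimal illustration, not the code used in the project: the function name `parse_quantity` is hypothetical, and it assumes that the number always comes first, that a comma is a decimal separator, and that no thousands separators appear.

```python
import re
from typing import Optional

# A leading integer or decimal number; anything after it (e.g. "pcs") is ignored.
# Assumption: a comma is a decimal separator, never a thousands separator.
_QUANTITY_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)")


def parse_quantity(raw: str) -> Optional[float]:
    """Return the numeric part of an extracted quantity, or None if there is none."""
    match = _QUANTITY_RE.match(raw)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))
```

A value for which no number can be found returns None, so the corresponding field can be flagged for a human check instead of being silently filled with a wrong quantity.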
"At Cross Masters, using cutting-edge AI technologies is not only a passion; it is an essential part of our work culture that requires continuous innovation. One of our latest success stories is the automation of the manual paperwork that is needed to process thousands of invoices. Thanks to Microsoft Form Recognizer’s AI engine, we were able to develop a unique customized solution for our client's invoice recognition tasks. What we find most convenient is the constant extraction quality improvement and the introduction of new features in the Form Recognizer - such as model composing or table labelling. This assures our clients competitive advantage in the market and helps elevate our product to the level of best-in-class solution." Jan Hornych, Head of Automation.
We tested the extraction of the values by the Azure Form Recognizer, in combination with subsequent automatic processing of the values, and compared it to human extraction in a proof-of-concept analysis. We found two ways to optimize the results: 1) comparing the extracted values to lists of values or value combinations (for example, the list of sellers’ names and corresponding IDs), and 2) cross-referencing the total price against the sum of all of the product prices.
1. Comparison of extracted values to lists of values
One of the critical pieces of information on the invoice is the identification of which client should receive the benefits based on a particular invoice. To maximize the precision of this identification, we extract several separate characteristics of the client: not only the name, but also the company registration ID for a legal entity, or the personal ID for an individual client. We then match the extracted values with an existing client database, where the clients need to be registered to receive the benefits. If the extracted values are not matched to the same client, the application shows a warning, and the invoice needs to be checked by a human. This automated process resulted in a higher precision for the client assignment than what was achieved by a human only, even though the human assignment process was checked afterward. In the graph below, you can see the comparison. Although both approaches have very high precision, the AI is more successful when combined with lists of value combinations.
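A minimal sketch of this double lookup follows. The `Client` record, the `match_client` function, and the normalization rule are assumptions made for illustration; the real client database and matching rules are not described in this article.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Client:
    # Hypothetical shape of a registered client record.
    client_id: int
    name: str
    registration_id: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def match_client(
    extracted_name: str, extracted_reg_id: str, clients: Sequence[Client]
) -> Tuple[Optional[Client], bool]:
    """Return (best_guess, needs_review) for the extracted name and ID."""
    name_key = normalize(extracted_name)
    id_key = extracted_reg_id.strip()
    by_name = next((c for c in clients if normalize(c.name) == name_key), None)
    by_id = next((c for c in clients if c.registration_id == id_key), None)
    if by_name is not None and by_name == by_id:
        return by_name, False  # both characteristics point to the same client
    # Mismatch or missing value: keep a best guess, but ask for a human check.
    return by_id or by_name, True
```

The invoice is assigned automatically only when both characteristics agree; in every other case the best guess is pre-filled and the warning is raised.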
2. Check of total price against the sum of all product prices
The extraction of the product information varies in precision. In general, the Azure Form Recognizer performs better on a shorter list of products with properly spaced text. The procedure can be again improved with the help of a list of possible product names. You can match the extracted products with their correct names and exclude any incorrectly extracted values.
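One possible way to match extracted product names against a list of known names is approximate string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; the function name `match_product`, the 0.8 similarity cutoff, and the catalog entries in the usage example are illustrative assumptions, not values from the POC.

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def match_product(
    extracted: str, catalog: Sequence[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the catalog name closest to the extracted text, or None.

    The cutoff (0.8 here, an assumed value) controls how much OCR noise
    is tolerated before the extracted value is rejected.
    """
    lowered = {name.lower(): name for name in catalog}
    hits = get_close_matches(
        extracted.lower().strip(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Usage with a made-up catalog: a single misread character is still matched.
catalog = ["Office Chair Ergo 200", "Standing Desk Pro"]
best = match_product("Office Chair Erg0 200", catalog)
```

Rows that match no catalog entry (a shipping line, for instance) return None and can be excluded from the list of products.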
To further verify that the products were extracted correctly, a check is introduced that compares the total price extracted from the invoice with the sum, over all products, of the price multiplied by the quantity (hereafter CheckSum). The outcomes based on the POC are presented in the graph below. When the CheckSum is satisfied (in 44.7% of the cases), it is almost certain that the products were extracted correctly (with a 96% probability, which is higher than the average precision of human extraction). When the CheckSum is not satisfied (in the remaining 55.3% of the cases), the extracted products need to be checked by a human to verify whether they were extracted correctly.
The CheckSum may fail even when the products are extracted correctly. Such an invoice still counts as correctly extracted for our purposes, because we only need the product names and quantities; the prices are available from the database.
In more than half of the cases that need to be checked by a human (precisely in 32.5% of all cases), the products and their quantities were indeed extracted correctly, and the check failed due to the wrong extraction of the total price, individual product prices, or their discounts. In the rest of the cases (22.8% of all cases), the extracted product information is not completely correct (e.g., a product is missing or some quantity is incorrect), and the values need to be corrected or inserted by a human. Still, the people correcting the values do not need to re-type the whole invoice. They have all the values pre-filled and only compare and correct what is necessary.
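The CheckSum itself is simple arithmetic and can be sketched as follows. The `LineItem` type, the function name, and the rounding tolerance of 0.01 are assumptions for illustration; how taxes and discounts enter the total depends on the invoice layout and is left out of this sketch.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class LineItem:
    # Hypothetical shape of one extracted product row.
    name: str
    quantity: Decimal
    unit_price: Decimal


def checksum_satisfied(
    items: Iterable[LineItem],
    extracted_total: Decimal,
    tolerance: Decimal = Decimal("0.01"),  # assumed allowance for rounding
) -> bool:
    """True when the sum of quantity * unit price matches the extracted total."""
    computed = sum((i.quantity * i.unit_price for i in items), Decimal("0"))
    return abs(computed - extracted_total) <= tolerance
```

Decimal is used instead of float so that currency amounts such as 149.90 are compared exactly rather than through binary rounding.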
The precision of the extraction greatly depends on using a model trained on data from the same seller. While it may be impractical to provide a model for every seller, the best results can be achieved by training models for each of the largest sellers to cover the largest share of invoices with minimal cost. In the POC, 89% of the invoices were issued by a seller for whom there is a specifically trained model.
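To decide which sellers deserve their own model, one simple approach is to rank sellers by invoice count and add them until a target share of invoices is covered. The sketch below is only an illustration of that idea; the function name and the way the target share is chosen are assumptions, not the procedure used in the POC.

```python
from collections import Counter
from typing import Iterable, List


def sellers_to_cover(invoice_sellers: Iterable[str], target_share: float) -> List[str]:
    """Pick the largest sellers until at least target_share of invoices is covered.

    invoice_sellers holds one seller name per historical invoice.
    """
    counts = Counter(invoice_sellers)
    total = sum(counts.values())
    selected: List[str] = []
    covered = 0
    for seller, n_invoices in counts.most_common():
        if covered / total >= target_share:
            break
        selected.append(seller)
        covered += n_invoices
    return selected
```

Each selected seller corresponds to one custom model to train, so the length of the returned list gives a rough idea of the training effort for a given coverage.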
Estimation of time saving using AI
As mentioned in the precision evaluation, the values extracted from the Azure Form Recognizer need to be partially verified. As described above, we implemented the AI document processing solution in combination with a user-friendly application that is tailored for the purpose of verifying and correcting the extracted values when necessary. Here, we demonstrate an estimation for time saving for the combination of Azure Form Recognizer and our application.
Different checks can be introduced for most of the fields. The person checking the invoices then does not need to spend time on checking all of the values and can focus only on the fields with no extracted information or with checks that are not satisfied, which saves time on processing the documents. For example, invoices whose product extraction is verified by the CheckSum (44.7%) require no further human check of the list of products and their quantities. More than half of the remaining invoices (32.5% of all cases) only require a check of the values with no further correction, which can take up to 20 seconds on average. The last part of the invoices (22.8%) requires checking and correcting at least some of the product names or quantities, which can take around 1.5 minutes. Without the Document Recognizer, it can take around 4-7 minutes to set up an invoice in the database and fill in all of the necessary values. The Document Recognizer prepares the invoice, pre-fills all of the known information, and marks the values that require verification or correction. Using the distribution of missing values and non-satisfied checks for all of the fields from our POC, the integration of the Document Recognizer can save half of the time spent per invoice.
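As a worked example, the figures above for the product list can be combined into an expected human effort per invoice. The sketch covers only the check of the products and quantities, not the remaining fields, and it takes the stated times at face value (20 seconds and 1.5 minutes); the names used are illustrative.

```python
# (share of invoices, seconds of human effort on the product list)
SCENARIOS = [
    (0.447, 0.0),   # CheckSum satisfied: no product check needed
    (0.325, 20.0),  # values only checked, no correction
    (0.228, 90.0),  # values checked and corrected (about 1.5 minutes)
]


def expected_seconds(scenarios):
    """Weighted average of the effort over all scenarios."""
    return sum(share * seconds for share, seconds in scenarios)


average = expected_seconds(SCENARIOS)  # 0.325 * 20 + 0.228 * 90 = 27.02 seconds
```

Under these assumptions, the product list costs about 27 seconds of human time per invoice on average; the rest of the roughly 2 minutes per invoice goes to the other fields and to handling the invoice itself.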
There are variable costs connected to using the Azure Form Recognizer service and fixed costs for the application interface (for the human check of the extracted values) and time spent for the training of models. The pricing of the Azure Form Recognizer depends on the selected approach and the number of invoices. The entire process of training the models can take up to 4 hours (0.5 work-days) per model, as it requires selecting representative invoices for training, labeling them using Microsoft Labelling Tool, running the training, and checking the extracted results for an invoice example to verify the results. The training takes longer for invoices with poorly spaced text or varying product information (e.g., partial inclusion of discount information) because it requires more training invoices and repeated checking of the results.
The operation of the whole solution is cheap: if we only count the operating costs, it pays off even for volumes in the low hundreds of invoices per year. The biggest investment is in the initial integration and training of the models, and it depends on the complexity of a particular case. While this investment is too costly for a very small number of documents, it can make a dramatic difference for a larger volume. In our experience, once the initial investment is included, the solution pays off from 2,000 invoices a year. In that case, the return on investment is 100% within two years.
To summarize, the Azure Form Recognizer is a valuable innovative tool that allows companies to automatically extract information from scanned or electronic documents. Its precision can be higher than human extraction when accompanied by additional verifications and lists of values. To maximize the amount of time saved by the AI implementation, the solution needs to be accompanied by additional automatic processing of the values and an application tailored for checking and correcting the extracted values in the case of unsatisfied checks.