We have already witnessed the power of APIs to harness the capabilities of machine learning for face detection. Now it’s time for yet another interesting application of machine learning. In this post, we are going to cover the OCR Text Extractor API which can recognize and extract characters and words present in a printed document.
The OCR Text Extractor API is integrated with Rakuten RapidAPI. You can open the API console and follow along with this tutorial to explore the features and give it a try.
If you don’t have an account on Rakuten RapidAPI, sign up now and get your universal API key to access the OCR Text Extractor API and thousands of other APIs hosted on Rakuten RapidAPI.
Use Cases of OCR (Optical Character Recognition)
OCR finds a lot of applications where data records have to be converted from physical form to digital storage. Almost all businesses across the world rely on some form of pen-and-paper data entry for recording their processes. OCR technology spares them the manual effort of transforming paper records into digital ledgers.
Here are some broader use cases of OCR that can be applied to any business.
Automation
Organizations are increasingly looking to automate all aspects of their business processes. The biggest hurdle in automating a process is figuring out how to perform an activity without human intervention. OCR is one way to mimic the manual data entry (text extraction) performed by a human operator. It works much faster than a typical human user and, with enough constraints in place, the system can eliminate the data entry errors that humans are prone to making.
Data Archiving
Data archiving is closely related to automation as there is a recurring need to digitize a huge pile of paper invoices, receipts and other documents for future reference. This is also a tedious chore for human operators. It consumes time and money in equal proportions and hence there is a lot of room for OCR API based automation to expedite the archival process.
Security
OCR is also used for enforcement in specific applications such as license plate recognition. Images from security cameras are processed with OCR to recognize the license plate numbers of vehicles. This is a great add-on feature to help security teams track down unknown vehicles entering the premises.
API Overview
The OCR Text Extractor API can recognize characters from images of printed documents such as invoices and receipts. Although it can also recognize handwritten text, it is best suited for structured documents.
Take a look at the API console.
API Endpoints
The endpoints supported by this API are categorized into “Reference Information” & “OCR and Text Extraction”.
GET List Ocr Engine Options
The “GET List Ocr Engine Options” endpoint is for informational purposes. It returns a few different versions of the OCR engine supported by this API. These versions primarily differ in terms of speed and accuracy of the result.
GET List Language Options
The “GET List Language Options” endpoint returns a list of languages supported by this API. The OCR Text Extractor API claims to support over 20 different languages. Each supported language is identified by its three-letter ISO 639-2 code.
POST Extract Text From Image URI
The “POST Extract Text From Image URI” endpoint performs OCR extraction on an image pointed to by a URI. Currently it supports the JPG, PNG and GIF image formats and file sizes up to 5 MB.
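Since the endpoint only accepts certain formats and sizes, it can save an API call to check an image before submitting it. The following is a minimal sketch, not part of the API itself: the function name `check_image` is hypothetical, treating `.jpeg` as equivalent to JPG is an assumption, and so is reading “5 MB” as 5 × 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Limits taken from the endpoint description above.
# Assumption: ".jpeg" is accepted alongside ".jpg".
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
# Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 5 * 1024 * 1024


def check_image(uri, size_bytes):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    # Look only at the path part of the URI so query strings are ignored.
    extension = PurePosixPath(urlparse(uri).path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

A call such as `check_image("https://example.com/bill.png", 120000)` returns an empty list, while an unsupported extension or an oversized file produces a human-readable message for each problem.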
POST Extract Text From Image File
The “POST Extract Text From Image File” endpoint performs OCR extraction on an image submitted as base64-encoded binary data. All other constraints are the same as for the “POST Extract Text From Image URI” endpoint.
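Producing the base64 form of a local image takes only the Python standard library. This sketch covers just the encoding step; the helper names are hypothetical, and the name of the request field that carries the encoded string is not covered here, so check the API console for it.

```python
import base64


def encode_image(image_bytes):
    """Encode raw image bytes as a base64 ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")


def encode_image_file(path):
    """Read an image file from disk and return its base64 representation."""
    with open(path, "rb") as image_file:
        return encode_image(image_file.read())
```

For example, `encode_image_file("bill.jpg")` would return a string that can be placed in the request body, assuming a file named `bill.jpg` exists in the working directory.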
Pricing
The OCR Text Extractor API is available under four subscription tiers.
You can choose the BASIC option, which gives you 50 free API calls per month.
Subscribe to the API now, and we are all set to explore the OCR Text Extractor API in the next section.
Time for Some OCR Magic
To witness this OCR magic, we need an image of a printed bill.
This image is available on the Internet.
Imagine that you are the owner of this EYE OF THAI-GER restaurant. You have a traditional billing printer that generates hundreds of these bills every week, and you need a way to feed them into your cloud-based accounting system. Sounds like a tedious job!
With the OCR Text Extractor API you can extract the clustered characters representing the items and their prices. Let’s see how the API interprets this bill.
Select the “POST Extract Text From Image URI” endpoint in the API console and feed in the JSON input as shown below.
The JSON Body string contains all the input parameters for the API. We keep every parameter at its default except “Uri”, which we replace with the URI of the bill image from the Internet, and then trigger the API.
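The same call can be prepared outside the console. The sketch below only builds the request and does not send it. The host name and endpoint path are hypothetical placeholders to be replaced with the values shown in the API console, and the body carries only the “Uri” parameter named above; any other default parameters from the console should be copied over unchanged. The `X-RapidAPI-Key` and `X-RapidAPI-Host` headers follow the usual RapidAPI convention, but confirm them against the console's generated snippet.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real host and path from the API console.
API_HOST = "ocr-text-extractor.example.com"
ENDPOINT_PATH = "/extract-text-from-image-uri"


def build_request(image_uri, api_key):
    """Build (but do not send) a POST request for the image-URI endpoint."""
    # Only "Uri" is set here; other console defaults are omitted in this sketch.
    body = json.dumps({"Uri": image_uri}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{API_HOST}{ENDPOINT_PATH}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": API_HOST,
        },
    )
```

Once the placeholders are replaced, the request could be sent with `urllib.request.urlopen(build_request(uri, key))` and the response decoded with `json.load`.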
If you look at the value of “parseText” in the API response, you will notice that the data is parsed column by column and stored in a single long sequence. In this way you can identify the items in the DESC column and their prices in the AMT column on the bill.
However, a closer look at the API response will reveal an anomaly.
The amount “$13.00” ended up in the middle of the sequence that contains the DESC column values. This happened because of its close proximity to the text of the adjacent AMT column.
The API did a decent job of extracting the items and their prices printed on the bill, except for that one glitch. This problem can be solved by clearly separating the columns in the printed text. That’s why effective use of OCR requires a well-defined structure in the printed document. Otherwise the parsing and interpretation of the data goes haywire.
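One way to cope with a stray amount like this is to pull every price-like token out of the text regardless of where it appears. The sketch below assumes prices are printed as a dollar sign followed by digits and two decimals; the function name `split_prices` is hypothetical, and the input shown in the usage note is made up for illustration, not the actual API response.

```python
import re

# Assumption: prices look like "$13.00" (dollar sign, digits, two decimals).
PRICE_PATTERN = re.compile(r"\$\d+\.\d{2}")


def split_prices(parse_text):
    """Separate price tokens from the remaining description text."""
    # Collect every price in the order it appears.
    prices = PRICE_PATTERN.findall(parse_text)
    # Blank out the prices, then collapse the leftover whitespace.
    remainder = PRICE_PATTERN.sub(" ", parse_text)
    description_text = " ".join(remainder.split())
    return description_text, prices
```

With an invented input such as `"Pad Thai $13.00 Green Curry $11.50 $9.00"`, the function returns the description text with the amounts removed and a separate list of the three prices. It does not tell you which price belongs to which item; that still depends on the layout of the printed document.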
You now have the basic idea of OCR-based data extraction. You can run through all your bills and extract the data in a matter of minutes.
Power Up Your OCR Use Cases
As a next-level challenge, you can write a script that parses the API response, performs some ETL (Extract, Transform and Load) operations and makes the data available in a database. You are likely to face more obstacles because the API returns data in a linear sequence with no delimiters. Therefore, to build a real-world application, you will have to design the structure of the printed document in a way that makes parsing the API response easier.
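A minimal ETL sketch along these lines is shown below. It assumes a redesigned bill layout in which every item name is immediately followed by its price, which is not how the sample bill above is parsed. The table name `bill_items`, its columns and both function names are hypothetical, and SQLite stands in for whatever database the accounting system actually uses.

```python
import re
import sqlite3

# Assumption: each item name is directly followed by a price such as "$13.00".
ITEM_PATTERN = re.compile(r"(.+?)\s*\$(\d+)\.(\d{2})")


def extract_items(parse_text):
    """Extract and transform: return (name, amount_in_cents) pairs."""
    items = []
    for match in ITEM_PATTERN.finditer(parse_text):
        name = match.group(1).strip()
        # Store money as integer cents to avoid floating-point rounding.
        cents = int(match.group(2)) * 100 + int(match.group(3))
        items.append((name, cents))
    return items


def load_items(connection, items):
    """Load: insert the extracted pairs into a (hypothetical) bill_items table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS bill_items "
        "(name TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    connection.executemany(
        "INSERT INTO bill_items (name, amount_cents) VALUES (?, ?)", items
    )
    connection.commit()
```

To try it out, open a throwaway database with `sqlite3.connect(":memory:")`, pass the “parseText” value to `extract_items`, and hand the result to `load_items`. The stricter the printed layout, the simpler the pattern can stay.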
In case you want to explore similar APIs, take a look at our APIs in the Visual Recognition category. You can also check out our Machine Learning API Collection to know more about the other APIs that support broader ML applications, including image analysis.