Verifying Text from an Image Using an Optical Character Recognition Tool

Modified on Tue, 12 Nov, 2024 at 3:19 PM

TABLE OF CONTENTS

1. Optical Character Recognition Overview
2. Prerequisites
3. Testing Data

Available for Python Selenium, Java WinAppDriver and Java Selenium frameworks.

1. Optical Character Recognition Overview

Optical Character Recognition (OCR) is the process that converts an image of text into a machine-readable text format. For example, if you scan a receipt, your computer saves the scanned receipt as an image file. You cannot use a text editor to edit, search, or count the words in the image file. However, you can use OCR to convert the image into a text document with its contents stored as text data.

In this topic, the Python-tesseract tool is used to convert an image from a report generated during web automation into text.
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it recognizes and reads the text embedded in images.

For the Java WinAppDriver and Java Selenium frameworks, Tess4J is required. Install Tess4J by including it as a dependency in your Java project. Tess4J can be obtained from the official Tess4J GitHub repository. Follow the instructions provided in the README to add Tess4J to your project.

2. Prerequisites

Python 3.11.x must be installed. You can download this version from the official Python website.
Selenium must be installed. If it is not, run the following command in the command prompt.
```
pip install selenium
```
To install the Tesseract tool, download tesseract-ocr-w64-setup-5.3.3.20231005.exe and follow the on-screen prompts to complete the installation.
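After the installer finishes, pytesseract still needs to find the Tesseract executable. The sketch below is one possible way to do that, not a required step: it checks the system PATH first and then a fallback location. The path `C:\Program Files\Tesseract-OCR\tesseract.exe` is an assumption about the installer's default folder, and the helper names `locate_tesseract` and `configure_pytesseract` are hypothetical. The attribute `pytesseract.pytesseract.tesseract_cmd` is the setting pytesseract reads for the executable location.

```python
import os
import shutil

# Assumed default install folder of the Windows installer; adjust if you
# chose a different folder during setup.
ASSUMED_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def locate_tesseract(extra_candidates=(ASSUMED_WINDOWS_PATH,)):
    """Return the path to the Tesseract executable, or None if not found."""
    # Prefer whatever is reachable through the PATH environment variable.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try each fallback location in order.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable and return its path."""
    # Imported here so locate_tesseract() works before pytesseract is installed.
    import pytesseract

    path = locate_tesseract()
    if path is None:
        raise FileNotFoundError("Tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the start of a test script is usually enough. If Tesseract is already on the PATH, the fallback location is never consulted.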

3. Testing Data

Perform the following steps:

Run the following commands in the command prompt to install pytesseract and opencv-python:

pip install pytesseract
pip install opencv-python

pip install pytesseract: This command uses the pip package manager to install the pytesseract package, which is a Python wrapper for Google's Tesseract-OCR Engine. This package allows Python to interact with Tesseract-OCR, enabling text recognition from images.
pip install opencv-python: This command uses pip to install the opencv-python package, which is a Python library for computer vision and image processing. OpenCV (Open Source Computer Vision Library) provides tools and functions to manipulate images and perform various computer vision tasks.
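With both packages installed, the two can be combined to read text from an image and check it against an expected phrase. The following is a minimal sketch under stated assumptions rather than the exact script used for this topic: the function names (`normalize`, `text_matches`, `extract_text`, `verify_image`) are hypothetical, and the grayscale plus Otsu threshold step is one common preprocessing choice, not a requirement.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse each run of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_matches(ocr_text: str, expected: str) -> bool:
    """Return True when the expected phrase appears in the OCR output."""
    return normalize(expected) in normalize(ocr_text)


def extract_text(image_path: str) -> str:
    """Load an image with OpenCV, binarize it, and run Tesseract on it."""
    # Imported here so the helpers above work without the OCR packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising an error
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method chooses the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def verify_image(image_path: str, expected: str) -> bool:
    """Extract the text from the image and check for the expected phrase."""
    return text_matches(extract_text(image_path), expected)
```

For example, `verify_image("report.png", "Total Amount")` would return `True` if the phrase is recognized in the image; both the file name and the phrase are placeholders. Normalizing whitespace and case before comparing helps because OCR output often contains extra line breaks and spacing.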

To verify the installation, run the following command to list all the installed packages.
```
pip list
```
The output file contains the verified text from an image and displays the image content.
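As an alternative to scanning the `pip list` output by eye, the installation check can be done from Python. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` and the `REQUIRED` tuple are illustrative, and the package names are the ones installed in this topic.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages installed in this topic.
REQUIRED = ("selenium", "pytesseract", "opencv-python")

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed with the corresponding `pip install` command.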