Image Recognition

Modified on Thu, 20 Mar at 12:57 PM

Available as a Custom Action.

TABLE OF CONTENTS

1. Overview

2. Using Image Recognition Capabilities

2.1. Detect coordinates of text in an image

2.2. Detect coordinates of a target image

2.3. Detect coordinates of all text elements

2.4. Provide coordinates using the UI element's bounds

1. Overview

Image recognition is a machine learning (ML) capability that enables software to identify UI elements. It helps with automation when conventional approaches for identifying UI elements are not feasible. The solution can be integrated with automation scripts using custom actions. The image recognition capabilities are explained in the following section.

Note: Contact the Customer Success Team to obtain the downloadable exe file.

2. Using Image Recognition Capabilities

The following capabilities are available in the downloaded exe:

2.1. Detect coordinates of text in an image

The algoQADetectElements.exe command runs in the background alongside the algoQA-generated scripts, without requiring user interaction. It processes the image and identifies the coordinates of the target text. When the task completes, the results are saved in a JSON file.

For example, if the source image is located at D:\Ping_identity\img\connect.png and the target text is "Save," the command algoQADetectElements.exe text "D:\Ping_identity\img\connect.png" "Save" scans the image at that path and detects the position of the "Save" text. The coordinates of the text within the image are saved in the JSON file.

Base command: algoQADetectElements.exe text "path of source image" "target text"
Additional parameters are passed as --name value pairs (for example, --x_scale_factor 1.2) and are described in the following table:

Parameter	Type	Default	Description
image_scale_factor	float (a number with decimal points)	1.0	Adjusts the scale of the image to compensate for screen size or resolution differences. A value of 1.0 means no scaling is applied; values less than 1.0 shrink the image and values greater than 1.0 enlarge it.
image_zoom_level	float	1.0	Controls the zoom level of the image. Similar to image_scale_factor, it adjusts how far the image is zoomed in or out. A value of 1.0 represents the normal zoom level.
x_scale_factor	float	1.0	Adjusts the scaling of the image along the x-axis (horizontal). Modify this value to stretch or shrink the image horizontally.
y_scale_factor	float	1.0	Adjusts the scaling of the image along the y-axis (vertical). Modify this value to stretch or shrink the image vertically.
x_correction	integer (whole number)	0	Provides a pixel-level adjustment along the x-axis. Use it to correct for shifts in the image caused by UI elements or other factors (such as misalignment or a status bar on the screen).
y_correction	integer	0	Similar to x_correction, adjusts the image's position along the y-axis to correct for elements such as status bars or other visual components.
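The base command and its optional parameters can be assembled from a Python automation script. The sketch below assumes the optional parameters are passed as --name value pairs, following the flag style shown in sections 2.3 and 2.4. The helper names build_text_command and run_text_detection are hypothetical, and the sketch does not read the JSON output because its location and format are not described here.

```python
import subprocess

EXE = "algoQADetectElements.exe"


def build_text_command(source_image, target_text, **params):
    """Build the argument list for the `text` mode.

    Optional parameters from the table (for example x_correction=30)
    are appended as `--name value` pairs. This flag style is an
    assumption based on the examples in sections 2.3 and 2.4.
    """
    cmd = [EXE, "text", source_image, target_text]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_text_detection(source_image, target_text, **params):
    """Run the exe (hypothetical wrapper).

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    cmd = build_text_command(source_image, target_text, **params)
    return subprocess.run(cmd, check=True)
```

For example, build_text_command(r"D:\Ping_identity\img\connect.png", "Save", x_correction=30) returns the base command followed by --x_correction 30. Passing a list rather than a single string lets subprocess handle quoting for paths that contain spaces.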

2.2. Detect coordinates of a target image

This command detects the coordinates of a target image (a smaller image or element) within a source image: algoQADetectElements.exe image "path of source image" "path of target image". The output is written to a JSON file that contains the coordinates where the target image is found.

For example, consider using the RedBus app to identify available seats.

Running the target image detection command against the RedBus app's dynamic seat booking screen lets you identify the position of elements such as available seats. This is particularly useful for testing, validation, or automation tasks where you need to ensure that UI elements are rendered correctly and are interactive on the screen.

In this example, you can:

Capture a full-screen image of the RedBus seat map (source image path: D:\RedBusApp\images\redbus_seat_map.png).
Capture a small image of an available seat (the target image) and save it as a separate file from the source image.
Run the command to detect the coordinates (x, y) where the target image appears in the full seat map image. The coordinates are written to a JSON file.
For example, with the source image D:\Ping_identity\img\connect.png and the target image D:\Ping_identity\img\t.png:
```
algoQADetectElements.exe image "D:\Ping_identity\img\connect.png" "D:\Ping_identity\img\t.png"
```
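The documentation does not describe how the exe locates the target image internally; the overview describes it as an ML capability. As a generic illustration of the idea of finding a small image inside a larger one, the sketch below uses plain template matching, which is a different and much simpler technique. It slides the target over the source and keeps the top-left position with the smallest sum of squared differences. The function name find_template and the representation of images as grids of grayscale numbers are hypothetical.

```python
def find_template(source, target):
    """Return (x, y) of the top-left corner where `target` best matches `source`.

    Both images are 2D lists of grayscale values (lists of pixel rows).
    The score is the sum of squared differences; 0 means an exact match.
    On ties, the first position scanned (top to bottom, left to right) wins.
    """
    src_h, src_w = len(source), len(source[0])
    tgt_h, tgt_w = len(target), len(target[0])
    if tgt_h > src_h or tgt_w > src_w:
        raise ValueError("target image is larger than source image")

    best_score, best_pos = None, None
    for y in range(src_h - tgt_h + 1):
        for x in range(src_w - tgt_w + 1):
            score = sum(
                (source[y + dy][x + dx] - target[dy][dx]) ** 2
                for dy in range(tgt_h)
                for dx in range(tgt_w)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos
```

Real screenshots are far larger, and matching can be affected by scaling and zoom differences, which is what the scale and zoom parameters in section 2.1 are intended to compensate for.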

2.3. Detect coordinates of all text elements

This command detects all text elements in an image and gets the coordinates of each element's position. The detected text and its coordinates are saved in an Excel spreadsheet (ocr.xlsx).
Example:

Suppose the source image is located at "D:\Ping_identity\img\connect.png", it contains some text, and you want to detect the position (coordinates) of that text in the image.
The command processes the image, detects all text elements within it, and writes each detected text and its coordinates to a spreadsheet (ocr.xlsx).

Base command: algoQADetectElements.exe ocr "path of source image"

For this example: algoQADetectElements.exe ocr "D:\Ping_identity\img\connect.png"

The additional parameters x_scale_factor, y_scale_factor, image_scale_factor, and image_zoom_level are passed as --name value pairs (for example, --x_scale_factor 1.2). Refer to the table in section 2.1 for details.
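Once ocr.xlsx has been generated, a script can look up the coordinates of a particular string. The column layout of the spreadsheet is not documented here, so this sketch assumes a header row followed by text, x, and y columns. The helper names find_text and load_rows are hypothetical, and load_rows requires the third-party openpyxl package.

```python
def find_text(rows, wanted):
    """Return the (x, y) of every OCR row whose text matches `wanted`.

    Each row is assumed to be (text, x, y); the real column layout of
    ocr.xlsx may differ. Matching ignores case and surrounding spaces.
    """
    wanted = wanted.strip().lower()
    return [(x, y) for text, x, y in rows if str(text).strip().lower() == wanted]


def load_rows(path="ocr.xlsx"):
    """Read (text, x, y) rows from the spreadsheet (hypothetical layout).

    Requires openpyxl. Skips the first row, assumed to be a header.
    """
    from openpyxl import load_workbook

    sheet = load_workbook(path, read_only=True).active
    return [row[:3] for row in sheet.iter_rows(min_row=2, values_only=True)]
```

Case-insensitive matching is a design choice for the sketch, since OCR output can differ from the expected label in capitalization or trailing spaces; use an exact comparison if your checks need it.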

2.4. Provide coordinates using the UI element's bounds

This command calculates the coordinates of a desired location (for example, a corner) from the bounds you provide. When the task completes, the results are saved in a JSON file.

Example:

Consider an example where the top-left corner coordinates are (44, 347) and the bottom-right corner coordinates are (1036, 619), and the desired location is the top-right corner (tr).

The command calculates the coordinates of the top-right corner and saves the results in a JSON file named detect_image_bounds.json.

Note: Bounds describe the edges or corners of an area you are working with, such as the limits of a region in an image or the coordinates that define where something starts and ends.
For example, if you are working with an image and want to detect text or elements within a specific area, you can specify the bounds as coordinates, such as the top-left and bottom-right corners. This defines the region of the image that the command should focus on.

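Base command: algoQADetectElements.exe bounds --location tr
Additional parameters: --top_left 44 347 --bottom_right 1036 619 --x_correction 30 --y_correction 5. Refer to the details listed in the additional parameters table in section 2.1.
For this example, the full command is:
algoQADetectElements.exe bounds --location tr --top_left 44 347 --bottom_right 1036 619 --x_correction 30 --y_correction 5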
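The arithmetic behind the example is simple: for a box with top-left corner (44, 347) and bottom-right corner (1036, 619), the top-right corner takes the right x value and the top y value, giving (1036, 347). The sketch below assumes screen coordinates, where y increases downward, and assumes that x_correction and y_correction are simply added to the result. Under that assumption, the x_correction 30 and y_correction 5 values in the example above would give (1066, 352). The tool's actual handling of corrections and any location codes other than tr are not documented here. The function name corner is hypothetical.

```python
def corner(top_left, bottom_right, location, x_correction=0, y_correction=0):
    """Return the (x, y) of a corner of the box defined by its bounds.

    Only "tr" is documented; "tl", "bl", and "br" are assumed by analogy.
    Corrections are assumed to be added to the corner coordinates.
    """
    (left, top), (right, bottom) = top_left, bottom_right
    corners = {
        "tl": (left, top),
        "tr": (right, top),
        "bl": (left, bottom),
        "br": (right, bottom),
    }
    if location not in corners:
        raise ValueError(f"unsupported location: {location!r}")
    x, y = corners[location]
    return x + x_correction, y + y_correction
```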