Unlocking the Power of PaddleOCR

An Introduction to Text Detection and Recognition

Vinod Baste
8 min read · Sep 20, 2023

Optical Character Recognition (OCR) is a powerful technology that enables machines to recognize and extract text from images or scanned documents. OCR finds applications in various fields, including document digitization, text extraction from images, and text-based data analysis. In this article, we will explore how to use PaddleOCR, an advanced OCR toolkit based on deep learning, for text detection and recognition tasks. We will walk through a code snippet that demonstrates the process step-by-step.

Table of Contents:

  1. Prerequisites
  2. Setting up PaddleOCR
  3. Step-by-Step Implementation
  4. Text Detection
  5. Text Recognition

Prerequisites

Before we dive into the code, let’s ensure we have everything set up to run the PaddleOCR library. Make sure you have the following prerequisites installed on your machine:

  1. Python (3.6 or higher)
  2. PaddleOCR library
  3. Other necessary dependencies (e.g., NumPy and pandas)

You can install PaddleOCR (along with the PaddlePaddle framework it depends on) using the following pip command:

pip install paddlepaddle paddleocr

Setting up PaddleOCR

Once you have Python and the required libraries installed, let’s set up PaddleOCR. You can use PaddleOCR’s pre-trained models, which are available for text detection and recognition.

Code Overview

The code snippet for text detection and recognition using PaddleOCR consists of the following main components:

  1. Image Preprocessing: Load the input image and perform any necessary preprocessing steps, such as resizing or normalization.
  2. Text Detection: Utilize the PaddleOCR text detection model to locate bounding boxes around the text regions in the input image.
  3. Text Recognition: For each detected bounding box, use the PaddleOCR text recognition model to extract the corresponding text.
  4. Post-processing: Organize the detected text and recognition results for further analysis or display.

Step-by-Step Implementation

Let’s break down the code snippet and explain each step in detail:

  1. Text Detection

The code is a part of a class named DecMain, which is designed for Optical Character Recognition (OCR) evaluation using ground truth data. It uses PaddleOCR to extract text from images and then calculates metrics like precision, recall, and Character Error Rate (CER) to evaluate the performance of the OCR system.
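Since Character Error Rate (CER) comes up throughout the evaluation, here is a brief refresher: CER is the edit distance between the predicted string and the ground-truth string, divided by the length of the ground truth. The sketch below illustrates the idea in plain Python; it is not the repository's CalculateMetrics implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(prediction: str, ground_truth: str) -> float:
    """Character Error Rate: edit distance normalised by ground-truth length."""
    if not ground_truth:
        return float(len(prediction) > 0)
    return levenshtein(prediction, ground_truth) / len(ground_truth)
```

A CER of 0.0 means a perfect transcription; misreading one character of a six-character ground truth gives a CER of 1/6.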

class DecMain:
    def __init__(self, image_folder_path, label_file_path, output_file):
        self.image_folder_path = image_folder_path
        self.label_file_path = label_file_path
        self.output_file = output_file

    def run_dec(self):
        # Check and update the ground truth file
        CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file()

        df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path,
                     det=True, rec=True, cls=False).ocr_to_df()

        ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file()

        # Get the extracted text as a list of dictionaries (representing the OCR results)
        ocr_results = df.to_dict(orient="records")

        # Calculate precision, recall, and CER
        precision, recall, total_samples = CalculateMetrics(ground_truth_data,
                                                            ocr_results).calculate_precision_recall()

        CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples,
                    file_name=self.output_file).create_sheet()

Let's break down the code and explain each part:

class DecMain:
    def __init__(self, image_folder_path, label_file_path, output_file):
        self.image_folder_path = image_folder_path
        self.label_file_path = label_file_path
        self.output_file = output_file
  • The DecMain class has an __init__ method that initializes the object with the following parameters:
  • image_folder_path: The path to the folder containing the input images for OCR.
  • label_file_path: The path to the ground truth label file that contains the actual text content of the images.
  • output_file: The filename of the output file where the evaluation results will be saved.
def run_dec(self):
    # Check and update the ground truth file
    CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file()
  • The run_dec method is responsible for running the OCR evaluation process. It first checks and updates the ground truth file using the CheckAndUpdateGroundTruth class.
df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path,
             det=True, rec=True, cls=False).ocr_to_df()
  • The OcrToDf class is used to convert the OCR results into a pandas DataFrame (df). It takes the following parameters:
  • image_folder: The path to the folder containing the input images for OCR.
  • label_file: The path to the ground truth label file.
  • The parameters det=True and rec=True indicate that both text detection and recognition results will be included in the DataFrame, while cls=False disables text angle classification.
ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file()
  • The ReadGroundTruthFile class is used to read the ground truth label file and load its contents into the ground_truth_data variable.
# Get the extracted text as a list of dictionaries (representing the OCR results)
ocr_results = df.to_dict(orient="records")
  • The OCR results obtained in DataFrame df are converted to a list of dictionaries (ocr_results), with each dictionary representing the OCR result for a single image.
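For illustration, to_dict(orient="records") turns each DataFrame row into its own dictionary keyed by column name. The columns below are hypothetical; the real layout of df is determined by the repository's OcrToDf class.

```python
import pandas as pd

# Hypothetical columns; the actual DataFrame layout comes from OcrToDf.
df = pd.DataFrame({
    "image": ["a.jpg", "b.jpg"],
    "transcription": ["215mm 18", "XZE SA"],
})

# One dict per row, e.g. {"image": "a.jpg", "transcription": "215mm 18"}
ocr_results = df.to_dict(orient="records")
```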
# Calculate precision, recall, and CER
precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall()
  • The CalculateMetrics class is used to calculate the OCR evaluation metrics: precision, recall, and the total number of samples evaluated. The class takes the ground truth data and OCR results as inputs.
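For detection, precision and recall are typically computed by matching predicted boxes to ground-truth boxes, for example by Intersection-over-Union (IoU). The following is a simplified sketch using axis-aligned boxes and greedy matching; the repository's CalculateMetrics class may match boxes differently.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions against ground truth."""
    matched = set()
    tp = 0
    for p in pred_boxes:
        for i, g in enumerate(gt_boxes):
            if i not in matched and iou(p, g) >= iou_threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```

A prediction counts as a true positive when it overlaps an unmatched ground-truth box above the IoU threshold; precision divides true positives by predictions, recall by ground-truth boxes.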
CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples,
            file_name=self.output_file).create_sheet()
  • The CreateSheet class is responsible for creating an output sheet (e.g., Excel or CSV) with the evaluation metrics and OCR results. It takes the DataFrame df, precision, recall, total samples, and the output filename as inputs.

Overall, the DecMain class provides a structured way to evaluate the OCR performance using ground truth data and PaddleOCR's text detection and recognition capabilities. It calculates important evaluation metrics and stores the results in a specified output file for further analysis.

Note: Format of the Ground Truth Label File

To perform OCR evaluation using the DecMain class and the provided code, it's crucial to format the ground truth label file correctly. Each line pairs an image filename with a JSON array of annotations, as shown below:

image_name.jpg	[{"transcription": "215mm 18", "points": [[199, 6], [357, 6], [357, 33], [199, 33]], "difficult": false, "key_cls": "digits"}, {"transcription": "XZE SA", "points": [[15, 6], [140, 6], [140, 36], [15, 36]], "difficult": false, "key_cls": "text"}]

Each line of the file represents one image's OCR ground truth: the filename of the image, followed by a tab character (\t), and then a JSON array of annotation objects for that image.

Each JSON object in the array should have the following keys:

"transcription": The ground truth text transcription of the image.

"points": A list of four points representing the bounding box coordinates of the text region in the image.

"difficult": A boolean value indicating whether the text region is difficult to recognize.

"key_cls": The class label of the OCR result, e.g., "digits" or "text".

Make sure to follow this format while creating the ground truth label file for accurate OCR evaluation.
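To illustrate the format, a line of this file can be parsed by splitting on the tab character and decoding the JSON array (note that valid JSON uses lowercase false, not Python's False):

```python
import json

# One line of the ground truth label file: filename, tab, JSON annotations.
line = ('image_name.jpg\t'
        '[{"transcription": "215mm 18", '
        '"points": [[199, 6], [357, 6], [357, 33], [199, 33]], '
        '"difficult": false, "key_cls": "digits"}]')

image_name, annotations_json = line.split('\t', 1)
annotations = json.loads(annotations_json)
```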

If you’re eager to explore the full implementation of the OCR evaluation using PaddleOCR, you’re in luck! I’ve made the entire code available on my public Git repository. You can access it here. The repository contains the DecMain class along with other necessary classes that enable you to perform OCR, calculate evaluation metrics, and generate output sheets. Feel free to clone the repository, try out the code with your own data, and even contribute to its improvement!

2. Text Recognition

The code defines a class named RecMain, which is designed to run text recognition (OCR) using a pre-trained OCR model on a folder of images and generate an evaluation Excel sheet.

class RecMain:
    def __init__(self, image_folder, rec_file, output_file):
        self.image_folder = image_folder
        self.rec_file = rec_file
        self.output_file = output_file

    def run_rec(self):
        image_paths = GetImagePathsFromFolder(self.image_folder,
                                              self.rec_file).get_image_paths_from_folder()

        ocr_model = LoadRecModel().load_model()

        results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images()

        ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict()

        (model_predictions, ground_truth_texts, image_names, precision, recall,
         overall_model_precision, overall_model_recall,
         cer_data_list) = EvaluateRecModel(results, ground_truth_data).evaluate_model()

        # Create Excel sheet
        CreateMetricExcel(image_names, model_predictions, ground_truth_texts,
                          precision, recall, cer_data_list, overall_model_precision,
                          overall_model_recall, self.output_file).create_excel_sheet()

Let's break down the code and explain each part:

class RecMain:
    def __init__(self, image_folder, rec_file, output_file):
        self.image_folder = image_folder
        self.rec_file = rec_file
        self.output_file = output_file
  • The RecMain class has an __init__ method that initializes the object with the following parameters:
  • image_folder: The path to the folder containing the input images for text recognition.
  • rec_file: The path to the ground truth label file that contains the actual text content of the images.
  • output_file: The filename of the output Excel sheet where the evaluation results will be saved.
def run_rec(self):
    image_paths = GetImagePathsFromFolder(self.image_folder,
                                          self.rec_file).get_image_paths_from_folder()
  • The run_rec method is responsible for running the text recognition process. It first uses the GetImagePathsFromFolder class to get a list of image paths within the specified image_folder. This step ensures that the OCR model will process all images within the given directory.
ocr_model = LoadRecModel().load_model()
  • The LoadRecModel class is used to load the pre-trained OCR model for text recognition. It may utilize PaddleOCR or any other OCR library to load the model.
results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images()
  • The ProcessImages class is responsible for processing the images using the loaded OCR model. It takes the OCR model (ocr_model) and the list of image paths (image_paths) as inputs.
ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict()
  • The ConvertTextToDict class is used to read the ground truth label file and convert it into a dictionary format (ground_truth_data). This conversion prepares the ground truth data for comparison with the OCR model predictions.
(model_predictions, ground_truth_texts, image_names, precision, recall,
 overall_model_precision, overall_model_recall,
 cer_data_list) = EvaluateRecModel(results, ground_truth_data).evaluate_model()
  • The EvaluateRecModel class is responsible for comparing the OCR model predictions with the ground truth data and calculating evaluation metrics such as precision, recall, and Character Error Rate (CER). It takes the OCR model predictions (results) and the ground truth data (ground_truth_data) as inputs.
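One simple way to score a recognized string against its ground truth is word-level precision and recall over the overlapping tokens. This sketch is illustrative only; the repository's EvaluateRecModel may define these metrics differently.

```python
from collections import Counter

def word_precision_recall(prediction: str, ground_truth: str):
    """Word-level precision/recall via multiset overlap of whitespace-split tokens."""
    pred_words = Counter(prediction.split())
    gt_words = Counter(ground_truth.split())
    overlap = sum((pred_words & gt_words).values())  # words present in both
    precision = overlap / max(sum(pred_words.values()), 1)
    recall = overlap / max(sum(gt_words.values()), 1)
    return precision, recall
```

For example, predicting only "215mm" when the ground truth is "215mm 18" yields a precision of 1.0 (everything predicted was correct) but a recall of 0.5 (half the ground-truth words were missed).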
# Create Excel sheet
CreateMetricExcel(image_names, model_predictions, ground_truth_texts,
                  precision, recall, cer_data_list, overall_model_precision,
                  overall_model_recall, self.output_file).create_excel_sheet()
  • The CreateMetricExcel class is responsible for creating an output Excel sheet with the evaluation metrics and OCR results. It takes various input data, including image names, model predictions, ground truth texts, evaluation metrics, and the output filename (self.output_file).

Overall, the RecMain class orchestrates the entire text recognition process, from loading the OCR model to generating the evaluation Excel sheet with detailed metrics. It provides an organized and reusable way to evaluate the performance of an OCR model on a given set of images with ground truth data.

Note: Format of the Ground Truth Text File

To perform OCR evaluation using the RecMain class and the provided code, it's essential to format the ground truth (GT) text file correctly. The GT text file should be in the following format:

image_name.jpg text

Each line of the file represents an image’s GT text.

Each line contains the filename of the image, followed by a tab character (\t), and then the GT text for that image.

Ensure that the GT text file contains GT text entries for all the images present in the image folder specified in the RecMain class. The GT text should match the actual text content present in the images. This format is necessary for accurate evaluation of the OCR model's performance.
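Reading such a tab-separated file into an {image_name: text} dictionary is straightforward. The following is a sketch of what ConvertTextToDict might do, not the repository's exact code; parse_rec_ground_truth is a hypothetical helper name.

```python
def parse_rec_ground_truth(lines) -> dict:
    """Parse an iterable of 'image_name<TAB>text' lines into {image_name: text}."""
    ground_truth = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        image_name, text = line.split("\t", 1)
        ground_truth[image_name] = text
    return ground_truth

gt = parse_rec_ground_truth(["img_001.jpg\thello world\n",
                             "img_002.jpg\t215mm 18\n"])
```

Splitting with maxsplit=1 keeps any further tabs inside the GT text itself intact.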

If you’re eager to explore the full implementation of the OCR evaluation using PaddleOCR, you’re in luck! I’ve made the entire code available on my public Git repository. You can access it here. The repository contains the RecMain class along with other necessary classes that enable you to perform OCR, calculate evaluation metrics, and generate output sheets. Feel free to clone the repository, try out the code with your own data, and even contribute to its improvement!

Conclusion

In this article, we explored the process of text detection and recognition using PaddleOCR, an advanced OCR toolkit based on deep learning. We walked through a code snippet that demonstrates the step-by-step implementation of text detection and recognition. With PaddleOCR’s powerful pre-trained models and easy-to-use API, performing OCR on images has never been easier.

Now it’s your turn to try out the code snippet and experiment with different images or text recognition scenarios.

You can find the source code here.

Thank you for taking the time to read this article. If you found this post to be useful and interesting, please clap and recommend it.

If I got something wrong, please mention it in the comments; I would love to improve it.

Connect with me on GitHub and LinkedIn.
