A Beginner’s Guide to Barcode Extraction

By Isaac Maw | 2025 Jan 23

Summary: Looking to add barcode extraction capability to your application? Here’s an overview of the what, why and how of extracting barcodes from digital files such as PDF documents or scanned images, including key considerations for developers and IT managers.

Barcodes are an important way to move information quickly from the physical world to the digital one, logging data from shipping labels, medical wristbands, and product UPC codes. But when barcodes and QR codes appear inside a digital document such as an image or PDF, you need software that can look inside the file and extract the data encoded in the barcode.

For scanned images of labels, digitized patient records, and PDF invoices and receipts, a barcode SDK is essential. This guide provides an overview of what barcode extraction is and how to implement it in your application.

Barcodes in PDFs and Digital Files

One of the first commercial uses of barcodes was identifying railroad cars. Today, barcodes are used primarily to enter data into a computer quickly, avoiding manual data entry and the errors that come with it. However, because documents now move between physical and digital formats as they pass through automated processes and workflows, barcodes must also be read from digital files such as images and PDFs.

Use cases for barcode extraction include:

  • Retail: managing scanned receipts, inventory labels, and invoices
  • Healthcare: extracting information from patient records or medical images
  • Logistics: managing shipping labels for traceability and automation

Whether barcode data is captured by a scanner or extracted from documents, accuracy and automation are critical.

What is Barcode Extraction?

Barcode extraction is software that detects barcodes in digital files such as images or PDF documents and converts the encoded data into usable information.

Barcode types include 1D formats such as EAN-13, UPC, and Code 128, and 2D formats such as QR codes and Data Matrix. 2D formats are usually required where more data must be stored. For example, a UPC-A code stores 12 digits (0-9), the last of which is a check digit, whereas the largest QR code version can store nearly 3 KB of binary data, or over 4,000 alphanumeric characters. Beyond data capacity, barcode formats are also chosen based on conventions and compliance requirements. For example, U.S. military labeling and marking standards require specific barcode formats in certain instances.
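
The UPC-A check digit mentioned above is computed from the first 11 digits with a simple weighted sum, which lets a reader catch most single-digit errors. The sketch below shows the standard calculation; the function names are illustrative and not part of any Apryse API.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Return the check digit for the first 11 digits of a UPC-A code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(c) for c in first_eleven]
    # Digits in odd positions (1st, 3rd, ..., 11th) are weighted by 3,
    # digits in even positions by 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Return True if a 12-digit UPC-A code has a correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])
```

For example, is_valid_upc_a("036000291452") returns True, while changing the last digit to 3 makes it return False.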

How Barcode Extraction Works

The Barcode Extraction module for the Apryse Server SDK works as follows:

  • It detects barcodes in the document.
  • It pre-processes the image, despeckling and deskewing it to improve accuracy.
  • It extracts the data embedded in each barcode.
  • It outputs the data in JSON format for later use.
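
To make the extraction step more concrete, here is a minimal, generic sketch of decoding one 1D symbology (UPC-A). It is not how the Apryse module works internally. It assumes the barcode has already been detected, deskewed, and binarized into exactly 95 modules, written as a string where "1" is a dark bar and "0" is a space; a real decoder must also estimate module widths from pixel runs and handle symbols scanned upside down. The function names are hypothetical.

```python
# Left-half (L) patterns for digits 0-9; right-half (R) patterns are
# their bitwise complements.
L_CODES = {
    "0001101": 0, "0011001": 1, "0010011": 2, "0111101": 3, "0100011": 4,
    "0110001": 5, "0101111": 6, "0111011": 7, "0110111": 8, "0001011": 9,
}


def _invert(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)


R_CODES = {_invert(pattern): digit for pattern, digit in L_CODES.items()}


def decode_upc_a(modules: str) -> str:
    """Decode a 95-module UPC-A bit string into its 12 digits."""
    if len(modules) != 95:
        raise ValueError("a UPC-A symbol has exactly 95 modules")
    # Guard patterns: start (101), center (01010), end (101).
    if modules[:3] != "101" or modules[45:50] != "01010" or modules[92:] != "101":
        raise ValueError("guard patterns not found")
    digits = []
    for half, table in ((modules[3:45], L_CODES), (modules[50:92], R_CODES)):
        for i in range(0, 42, 7):  # six 7-module digits per half
            pattern = half[i:i + 7]
            if pattern not in table:
                raise ValueError(f"unrecognized digit pattern {pattern}")
            digits.append(table[pattern])
    # Verify the check digit (odd positions weighted by 3).
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    if (10 - total % 10) % 10 != digits[11]:
        raise ValueError("check digit mismatch")
    return "".join(str(d) for d in digits)


def encode_upc_a(code: str) -> str:
    """Inverse of decode_upc_a, used here only to build sample input."""
    to_pattern = {digit: pattern for pattern, digit in L_CODES.items()}
    left = "".join(to_pattern[int(c)] for c in code[:6])
    right = "".join(_invert(to_pattern[int(c)]) for c in code[6:])
    return "101" + left + "01010" + right + "101"
```

Round-tripping a known code, decode_upc_a(encode_upc_a("036000291452")) returns "036000291452"; a corrupted check digit raises ValueError instead of returning bad data.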

To try the Apryse Barcode Extraction module for yourself, you can view the installation and usage instructions in our documentation guide. 

Apryse Barcode Extraction supports over 100 barcode types, ensuring compatibility across industries including retail, healthcare, and logistics. The module runs server-side only, which supports scalability and performance in enterprise deployments. And because barcode extraction is just one of the document processing capabilities Apryse offers, it is easy to combine with others such as OCR, template extraction, or PDF viewing in a unified workflow.

Key Considerations

Integration Flexibility

Because barcode extraction is usually one step in a larger process, such as inventory management or cataloguing scanned images, a barcode extraction SDK must integrate flexibly into existing workflows.

Our barcode extraction module is designed with developer flexibility in mind, allowing for custom read options and seamless integration into existing workflows. This empowers users to optimize the barcode scanning process based on specific document needs, improving operational efficiency.

Another key aspect of integration flexibility is the format of the extracted data, which may be stored in a database or passed to other automated workflows via an API, for example. By default, our module decodes barcodes where possible. If decoding succeeds, the data is stored in the "text" field of the output JSON. If the barcode data cannot be decoded, it is stored as a Base64-encoded string in the "data" field. If you prefer a consistent output, or want to decode the data yourself, you can force the module to return the encoded binary data (again in the "data" field) by specifying e_binary as the output format.
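
A downstream consumer therefore needs to handle both fields. The sketch below assumes only what is described above: each barcode result is a JSON object carrying either a "text" string or a Base64 "data" string (it also assumes the e_binary output uses Base64, since raw bytes cannot be stored in JSON). The surrounding nesting of the real output is not assumed, so the code walks the whole document; the "pages" and "barcodes" keys in the sample are hypothetical.

```python
import base64
import json
from typing import Any, Iterator, List, Union


def iter_barcode_records(node: Any) -> Iterator[dict]:
    """Yield every JSON object that carries a "text" or "data" field.

    The exact nesting of the module's output is not assumed; the walk
    visits every object and list in document order.
    """
    if isinstance(node, dict):
        if "text" in node or "data" in node:
            yield node
        for value in node.values():
            yield from iter_barcode_records(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_barcode_records(item)


def payload(record: dict) -> Union[str, bytes]:
    """Prefer decoded text; otherwise return the Base64-decoded bytes."""
    if "text" in record:
        return record["text"]
    return base64.b64decode(record["data"])


def load_payloads(json_text: str) -> List[Union[str, bytes]]:
    """Parse extraction output and return one payload per barcode."""
    return [payload(r) for r in iter_barcode_records(json.loads(json_text))]
```

With the hypothetical input '{"pages": [{"barcodes": [{"text": "036000291452"}, {"data": "AAEC"}]}]}', load_payloads returns ["036000291452", b"\x00\x01\x02"]. Returning str for text and bytes for data keeps the two cases distinguishable for later processing.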

Advanced Image Processing

Scanned documents or images containing barcodes may be captured in low light or at an angle, or the barcode itself may be damaged or distorted. Pre-processing features such as deskewing and despeckling, available through separate image processing calls, help ensure that even distorted or damaged barcodes are captured accurately. This reduces errors and makes data extraction from documents, images, and labels more reliable.
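
As an illustration of what despeckling does, a common generic technique is a median filter, which replaces each pixel with the median of its neighborhood so that isolated dots disappear while larger shapes survive. This is a sketch of that technique on a small grayscale image stored as nested lists, not a description of the Apryse implementation; deskewing is not shown.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 = black, 255 = white


def median_despeckle(image: Image) -> Image:
    """Replace each pixel with the median of its 3x3 neighborhood.

    Windows are clipped at the image border, so edge pixels use fewer
    neighbors. median_low keeps every result an existing pixel value.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = median_low(window)
    return result
```

A single dark speck in a white 5x5 image is removed entirely, while a bar three pixels wide keeps its center column black. The window size is a trade-off: with a 3x3 window, bars only one pixel wide are erased along with the noise, so the window must stay small relative to the barcode's module width.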

Real-Time Barcode Extraction

Extraction from records and other documents is valuable, but real-time barcode extraction enables a whole range of additional use cases. Although our solution is optimized for server-side use, it supports high-speed barcode reading, making it suitable for real-time applications such as inventory scanning and logistics tracking.

Wrapping Up

If you’re looking for a barcode extraction solution which has the scalability, accuracy and flexibility to handle large-scale enterprise workflows, consider the Apryse barcode extraction module. With advanced image processing, support for over 100 formats, and seamless integration with a full suite of document processing SDKs, it could be the best solution for your needs.

To learn more, try it now or contact us. 

Isaac Maw

Technical Content Creator
