By Isaac Maw | 2025 Mar 05
3 min
Summary: PDF/A is the ISO 19005 standard for digital document archiving. With the PDF/A conversion SDK, you can batch process documents and convert 20+ file types to PDF/A in Python.
Maintaining a secure, reliable archive of documents is part of business operations in a wide variety of industries, from legal and insurance to healthcare, engineering, and finance. Software developers are tasked with finding the solutions that make digital archiving work. With PDF/A conversion in Python, this archival file format standard can be easy to use with your existing systems.
Historically, paper documents have been stored in filing rooms and storage facilities, with complex categorization, filing, indexing, and retention systems. Of course, digital documents have long since eclipsed paper for many business documents. Digital document archiving saves space and cost, reduces organization and maintenance effort, and makes retrieval easier.
As our platforms, standards, and operating systems continually evolve and change, digital files that aren’t designed for archiving can be unreliable over time. Other file formats, fonts and storage media can become obsolete, leaving important documents corrupted or unusable.
For example, using standard PDF files or .DOCX files for archiving may result in challenges such as:
The PDF/A standard is designed to solve these issues. The ‘A’ in PDF/A stands for ‘archival,’ and the format has features designed to preserve documents, including formatting and fonts as well as raster and vector graphics, for long-term storage.
PDF/A is specified by ISO 19005, the standard for an electronic document file format for long-term preservation. By embedding all necessary elements, PDF/A ensures that documents are consistently rendered and reliable over time.
To convert your documents into the PDF/A archival format, our cross-platform PDF/A SDK converts 20+ file formats, including PDF, JPG, HTML, Word, and TIFF, into ISO-compliant PDF/A files that pass VeraPDF validation. The SDK can also repair non-compliant PDF/A files, and it supports all PDF/A versions and conformance levels: PDF/A-1A, PDF/A-1B, PDF/A-2A, PDF/A-2B, PDF/A-2U, PDF/A-3A, PDF/A-3B, PDF/A-3U, PDF/A-4, PDF/A-4E, PDF/A-4F.
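Each label in that list combines a PDF/A version (the part number, 1 to 4) with an optional conformance-level letter. As a small illustration, the sketch below splits a label into those two pieces and rejects combinations that do not appear in the list above. The function name parse_pdfa_label and the lookup table are hypothetical helpers written for this post; they are not part of the SDK.

```python
import re
from typing import Tuple

# Combinations taken from the list of supported labels in this post.
# An empty string means "no conformance letter" (plain PDF/A-4).
_ALLOWED = {
    1: {"A", "B"},
    2: {"A", "B", "U"},
    3: {"A", "B", "U"},
    4: {"", "E", "F"},
}

_LABEL = re.compile(r"^PDF/A-(\d)([A-Z]?)$", re.IGNORECASE)


def parse_pdfa_label(label: str) -> Tuple[int, str]:
    """Split a label such as 'PDF/A-2B' into (part, level).

    Raises ValueError for labels that are malformed or not in the list above.
    """
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError("not a PDF/A label: " + repr(label))
    part = int(match.group(1))
    level = match.group(2).upper()
    if level not in _ALLOWED.get(part, set()):
        raise ValueError("unsupported combination: " + repr(label))
    return part, level
```

A helper like this can be useful for validating a configuration value before it is mapped to an SDK constant such as the PDFACompliance.e_Level2B used in the sample later in this post.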
The SDK supports high-volume PDF to PDF/A batch conversion, either from the command line or as a development library integrated into an automated document workflow.
The process analyzes the content of existing PDF files and performs a sequence of modifications in order to produce a PDF/A compliant document. Features that are not suitable for long-term archiving (such as encryption, obsolete compression schemes, missing fonts, or device-dependent color) are replaced with their PDF/A compliant equivalents. Because the conversion process applies only necessary changes to the source file, the information loss is minimal. Also, because the converter provides a detailed report for each change, it is simple to inspect changes and to determine whether the conversion loss is acceptable.
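For the library-based batch scenario described above, a driver loop might look like the following minimal sketch. The names batch_convert and apryse_convert are hypothetical and written for this post. The converter is passed in as a plain callable so the loop can be exercised without the SDK installed, and apryse_convert simply reuses the PDFACompliance constructor and SaveAs calls from the sample later in this post. It assumes the SDK has already been initialized.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def batch_convert(
    sources: Iterable[Path],
    out_dir: Path,
    convert: Callable[[Path, Path], None],
) -> Dict[Path, str]:
    """Run `convert` on every source file and record the outcome per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[Path, str] = {}
    for src in sources:
        dst = out_dir / (src.stem + "_pdfa.pdf")
        try:
            convert(src, dst)
            results[src] = "ok"
        except Exception as exc:
            # Keep going so that one bad file does not stop the whole batch.
            results[src] = "failed: " + str(exc)
    return results


def apryse_convert(src: Path, dst: Path) -> None:
    """Convert one PDF to PDF/A-2B with the calls used in this post's sample.

    Assumption: the SDK was initialized by the caller before this runs.
    """
    # Imported lazily so the batch driver itself has no SDK dependency.
    from PDFNetPython import PDFACompliance

    pdf_a = PDFACompliance(True, str(src), None, PDFACompliance.e_Level2B, 0, 0, 10)
    pdf_a.SaveAs(str(dst), False)
```

A call such as batch_convert(sorted(Path("in").glob("*.pdf")), Path("out"), apryse_convert) would then return a per-file status map. Recording failures instead of raising is a design choice that suits unattended batch runs; a stricter workflow might prefer to stop at the first error.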
Check out our documentation guide for all the details on using the PDF/A SDK. We also provide sample code for using the PDF/A conversion SDK in Python.
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
import site
import sys
from PDFNetPython import *
from LicenseKey import *

# The following sample illustrates how to parse and check if a PDF document meets the
# PDFA standard, using the PDFACompliance class object.

def PrintResults(pdf_a, filename):
    err_cnt = pdf_a.GetErrorCount()
    if err_cnt == 0:
        print(filename + ": OK.")
    else:
        print(filename + " is NOT a valid PDFA.")
        i = 0
        while i < err_cnt:
            c = pdf_a.GetError(i)
            str1 = " - e_PDFA " + str(c) + ": " + PDFACompliance.GetPDFAErrorMessage(c) + "."
            # List the object numbers that triggered this error, if any were collected.
            num_refs = pdf_a.GetRefObjCount(c)
            if num_refs > 0:
                str1 = str1 + "\n Objects: "
                j = 0
                while j < num_refs:
                    str1 = str1 + str(pdf_a.GetRefObj(c, j))
                    if j < num_refs-1:
                        str1 = str1 + ", "
                    j = j + 1
            print(str1)
            i = i + 1

def main():
    # Relative path to the folder containing the test files.
    input_path = "../../TestFiles/"
    output_path = "../../TestFiles/Output/"

    # LICENSE_KEY is expected to be defined in LicenseKey.py (imported above).
    PDFNet.Initialize(LICENSE_KEY)
    PDFNet.SetColorManagement() # Enable color management (required for PDFA validation).

    # Example 1: PDF/A Validation
    filename = "newsletter.pdf"
    # The max_ref_objs parameter to the PDFACompliance constructor controls the maximum number
    # of object numbers that are collected for particular error codes. The default value is 10
    # in order to prevent spam. If you need all the object numbers, pass 0 for max_ref_objs.
    pdf_a = PDFACompliance(False, input_path+filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    PrintResults(pdf_a, filename)

    # Example 2: PDF/A Conversion
    filename = "fish.pdf"
    pdf_a = PDFACompliance(True, input_path + filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    filename = "pdfa.pdf"
    pdf_a.SaveAs(output_path + filename, False)

    # Re-validate the document after the conversion...
    pdf_a = PDFACompliance(False, output_path + filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    PrintResults(pdf_a, filename)

    print("PDFACompliance test completed.")

if __name__ == '__main__':
    main()
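The sample prints its findings straight to the console. If you would rather log the results or store them next to the archived file, one option is to gather them into plain Python data first. The sketch below is a hypothetical helper (collect_errors is not an SDK function) that relies only on the accessor methods already used in the sample: GetErrorCount, GetError, GetRefObjCount, and GetRefObj. The message lookup is passed in as a callable, for which PDFACompliance.GetPDFAErrorMessage is assumed to be a suitable argument.

```python
from typing import Any, Callable, Dict, List


def collect_errors(pdf_a: Any, describe: Callable[[int], str]) -> List[Dict[str, Any]]:
    """Return one dict per reported error: its code, message, and object numbers.

    `pdf_a` is any object exposing the accessors used in the sample above.
    `describe` maps an error code to a human-readable message.
    """
    errors: List[Dict[str, Any]] = []
    for i in range(pdf_a.GetErrorCount()):
        code = pdf_a.GetError(i)
        # Object numbers collected for this particular error code.
        refs = [pdf_a.GetRefObj(code, j) for j in range(pdf_a.GetRefObjCount(code))]
        errors.append({"code": code, "message": describe(code), "objects": refs})
    return errors
```

With the SDK available, a call might look like collect_errors(pdf_a, PDFACompliance.GetPDFAErrorMessage). An empty list then corresponds to the "OK" branch of PrintResults.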
To get started, try the SDK now or contact our team.
Isaac Maw
Technical Content Creator