Converting PDF to PDF/A Using Python

By Isaac Maw | 2025 Mar 05


What is PDF/A?


The PDF/A standard is designed to keep documents readable and visually unchanged over time. The ‘A’ in PDF/A stands for ‘archival,’ and the format has features designed to preserve documents, including formatting and fonts as well as raster and vector graphics, for long-term storage.

PDF/A is defined by ISO 19005, the standard for an electronic document file format for long-term preservation. Because all necessary elements are embedded in the file, a PDF/A document renders consistently and reliably over time.

PDF to PDF/A Conversion


Our cross-platform PDF/A SDK converts your documents into the PDF/A archival format, turning 20+ file formats into ISO-compliant PDF/A files that pass veraPDF validation.

Check out the demo

The PDF/A SDK accepts 20+ input formats, including PDF, JPG, HTML, Word, and TIFF, and produces ISO-compliant PDF/A files that pass veraPDF validation. It can also repair non-compliant PDF/A files. All PDF/A versions and conformance levels are supported: PDF/A-1A, PDF/A-1B, PDF/A-2A, PDF/A-2B, PDF/A-2U, PDF/A-3A, PDF/A-3B, PDF/A-3U, PDF/A-4, PDF/A-4E, PDF/A-4F.
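The level names listed above map naturally onto enum constants. The sample code later in this article uses PDFACompliance.e_Level2B; the sketch below assumes the other levels follow the same e_Level<part><letter> naming pattern, which should be checked against the SDK reference for your version. The helper level_constant_name is hypothetical and is not part of the SDK.

```python
import re

# Matches labels such as "PDF/A-2B" or "PDF/A-4" (the letter is optional for part 4).
_LEVEL_PATTERN = re.compile(r"^PDF/A-([1-4])([ABUEF]?)$", re.IGNORECASE)

# Valid (part, letter) combinations, taken from the list in this article.
_VALID = {
    "1": {"A", "B"},
    "2": {"A", "B", "U"},
    "3": {"A", "B", "U"},
    "4": {"", "E", "F"},
}


def level_constant_name(label):
    """Turn a label such as 'PDF/A-2B' into an assumed enum name such as 'e_Level2B'."""
    match = _LEVEL_PATTERN.match(label.strip())
    if not match:
        raise ValueError("Not a PDF/A level label: " + repr(label))
    part, letter = match.group(1), match.group(2).upper()
    if letter not in _VALID[part]:
        raise ValueError("Unsupported PDF/A level: " + repr(label))
    # Assumption: every constant follows the e_Level2B pattern seen in the sample.
    return "e_Level" + part + letter


print(level_constant_name("PDF/A-2B"))
```

With the SDK installed, getattr(PDFACompliance, level_constant_name("PDF/A-2B")) would look up the constant, and would raise AttributeError if the installed version does not define that level.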

The SDK supports high-volume PDF to PDF/A batch conversion from the command line, or it can be integrated as a development library into an automated document workflow.
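As a rough illustration of the library route, a batch run can be a loop over a folder. This is a minimal sketch, not the SDK's own batch tool: convert_folder and sdk_convert are hypothetical helper names, the PDFNetPython module name and the PDFACompliance calls are copied from the sample later in this article, and the _pdfa.pdf output suffix is an arbitrary choice.

```python
from pathlib import Path


def sdk_convert(src, dst):
    """Convert one PDF to PDF/A-2B using the calls shown in the SDK sample.

    Assumes PDFNet.Initialize(...) has already been called by the caller.
    """
    # Imported lazily so the batch loop can be tested without the SDK installed.
    from PDFNetPython import PDFACompliance

    pdf_a = PDFACompliance(True, str(src), None, PDFACompliance.e_Level2B, 0, 0, 10)
    try:
        pdf_a.SaveAs(str(dst), False)
    finally:
        pdf_a.Destroy()


def convert_folder(input_dir, output_dir, convert=sdk_convert):
    """Convert every *.pdf in input_dir and return (converted, failed) lists."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for src in sorted(input_dir.glob("*.pdf")):
        dst = output_dir / (src.stem + "_pdfa.pdf")
        try:
            convert(src, dst)
            converted.append(dst)
        except Exception as exc:
            # Record the failure and keep going so one bad file does not stop the batch.
            failed.append((src, str(exc)))
    return converted, failed
```

Passing the converter in as a parameter keeps the folder loop independent of the SDK, so it can be exercised with a stand-in function before a license key is available.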

The conversion process analyzes the content of an existing PDF file and performs a sequence of modifications to produce a PDF/A-compliant document. Features that are not suitable for long-term archiving (such as encryption, obsolete compression schemes, missing fonts, or device-dependent color) are replaced with their PDF/A-compliant equivalents. Because the conversion applies only the necessary changes to the source file, information loss is minimal. The converter also provides a detailed report of each change, so it is simple to inspect the changes and determine whether any conversion loss is acceptable.

How to Use the SDK in Python


Check out our documentation guide for all the details on using the PDF/A SDK. We also provide sample code for using the PDF/A conversion SDK in Python.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------
import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *
sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *
#---------------------------------------------------------------------------------------
# The following sample illustrates how to parse and check if a PDF document meets the
#    PDFA standard, using the PDFACompliance class object. 
#---------------------------------------------------------------------------------------
def PrintResults(pdf_a, filename):
    # Print every PDF/A violation found, followed by the object numbers that triggered it.
    err_cnt = pdf_a.GetErrorCount()
    if err_cnt == 0:
        print(filename + ": OK.")
        return
    print(filename + " is NOT a valid PDFA.")
    for i in range(err_cnt):
        c = pdf_a.GetError(i)
        str1 = " - e_PDFA " + str(c) + ": " + PDFACompliance.GetPDFAErrorMessage(c) + "."
        num_refs = pdf_a.GetRefObjCount(c)
        if num_refs > 0:
            objs = ", ".join(str(pdf_a.GetRefObj(c, j)) for j in range(num_refs))
            str1 = str1 + "\n   Objects: " + objs
        print(str1)
    print('')

def main():
    # Relative path to the folder containing the test files.
    input_path = "../../TestFiles/"
    output_path = "../../TestFiles/Output/"
    
    PDFNet.Initialize(LicenseKey)
    PDFNet.SetColorManagement()     # Enable color management (required for PDFA validation).
    
    #-----------------------------------------------------------
    # Example 1: PDF/A Validation
    #-----------------------------------------------------------
    filename = "newsletter.pdf"
    # The max_ref_objs parameter to the PDFACompliance constructor controls the maximum number 
    # of object numbers that are collected for particular error codes. The default value is 10 
    # in order to prevent spam. If you need all the object numbers, pass 0 for max_ref_objs.
    pdf_a = PDFACompliance(False, input_path+filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    PrintResults(pdf_a, filename)
    pdf_a.Destroy()
    
    #-----------------------------------------------------------
    # Example 2: PDF/A Conversion
    #-----------------------------------------------------------
    filename = "fish.pdf"
    pdf_a = PDFACompliance(True, input_path + filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    filename = "pdfa.pdf"
    pdf_a.SaveAs(output_path + filename, False)
    pdf_a.Destroy()
    
    # Re-validate the document after the conversion...
    pdf_a = PDFACompliance(False, output_path + filename, None, PDFACompliance.e_Level2B, 0, 0, 10)
    PrintResults(pdf_a, filename)
    pdf_a.Destroy()

    PDFNet.Terminate()
    print("PDFACompliance test completed.")
if __name__ == '__main__':
    main()
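PrintResults in the sample writes its findings straight to the console. When the results need to be logged or compared across files, it can be handy to gather them into plain data first. The following is a minimal sketch that uses only the accessor methods already called in the sample; collect_errors is a hypothetical helper, not an SDK function, and the message lookup is passed in as a callable so the function does not depend on the SDK import.

```python
def collect_errors(pdf_a, message_for):
    """Return the validation errors as a list of dicts instead of printing them.

    pdf_a is a PDFACompliance object; message_for maps an error code to text,
    for example PDFACompliance.GetPDFAErrorMessage.
    """
    errors = []
    for i in range(pdf_a.GetErrorCount()):
        code = pdf_a.GetError(i)
        # Object numbers associated with this error code (may be empty).
        objects = [pdf_a.GetRefObj(code, j) for j in range(pdf_a.GetRefObjCount(code))]
        errors.append({"code": code, "message": message_for(code), "objects": objects})
    return errors
```

Inside main(), a call such as collect_errors(pdf_a, PDFACompliance.GetPDFAErrorMessage) would return an empty list for a compliant file, which makes the pass/fail check a simple truth test.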

To get started, try the SDK now or contact our team.
