
What Are the Different Versions of PDF/A?

By Apryse | 2024 Feb 28

Organizations prefer PDF/A for its industry acceptance and for its ability, compared with other archiving formats, to preserve text, vector graphics, raster images, and related metadata. Nevertheless, with several PDF/A standards and conformance levels (presently eleven possible combinations), it’s easy to get a little lost.

If you’re interested in brushing up on your PDF/A taxonomy, read on. In this article, we cover the different PDF/A standards and conformance levels, explain their significance, and show how Apryse can help you use them.

What Is PDF/A?

First, what exactly is PDF/A? It is a special variant of PDF designed specifically for long-term document preservation (the “A” stands for archive). Originally released in 2005, it was created to be a format that renders reliably in exactly the same way regardless of system or software. Normal PDF documents do not meet this criterion; instead, they often contain elements such as fonts or colors that change based on the viewer, host operating system, or state of the PDF itself. PDF/A solves this problem by embedding all the information necessary for displaying the document in the PDF itself.

Why Is PDF/A Important?

There are many reasons to use PDF/A for archiving purposes, but the two main reasons are its advantages over other electronic formats and its industry acceptance.

The most widely used alternative for digital archiving is TIFF, a raster image format that promises the same guaranteed visual appearance of the document that PDF/A does. However, TIFF does not include vector content like shapes, gradients, or vector fonts. Not only does vector content more accurately describe the original document, it often takes up less disk space, which is a consideration when archiving. PDF/A has other advantages over TIFF, such as Unicode text support, which makes text extractable and searchable. It also supports digital signatures, allowing users to verify that the PDF was not altered.

PDF/A also maintains a high level of industry acceptance. When the format was published in 2005, a group of European companies formed the PDF/A Competence Center to raise the format’s profile and promote its benefits to industry and government. Since then, many institutions, especially in Europe, have mandated PDF/A as the required file format for archiving. Various U.S. agencies, such as NARA and PACER, also accept PDF/A as a format. And, of course, since a PDF/A file is a PDF file, free viewers are widely available on virtually all computing devices.

What Are the Different PDF/A Versions and Conformance Levels?

PDF/A comes in many possible variants, created by combining different PDF/A standards and conformance levels. Each PDF/A standard defines the array of available features and image compression technologies that help preserve the content of a file. In turn, each PDF/A standard supports different conformance levels (a & b for PDF/A-1; a, b & u for PDF/A-2 and PDF/A-3; and e & f for PDF/A-4). These conformance levels control the “accessibility” requirements of a file, which affect the ability of machines and people to understand the content.
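To make the combinations concrete, here is a minimal Python sketch that enumerates the variants. The table and function names are our own illustration, not part of any SDK. It also assumes that plain PDF/A-4 (with no level letter) counts as a variant alongside PDF/A-4e and PDF/A-4f, which is how the count reaches eleven.

```python
# Conformance levels available for each PDF/A part, as summarized above.
# Assumption: the empty string stands for plain PDF/A-4 (no level letter).
CONFORMANCE_LEVELS = {
    1: ["a", "b"],
    2: ["a", "b", "u"],
    3: ["a", "b", "u"],
    4: ["", "e", "f"],
}


def pdfa_variants():
    """Return every part/level combination as a label such as 'PDF/A-2u'."""
    return [
        f"PDF/A-{part}{level}"
        for part, levels in CONFORMANCE_LEVELS.items()
        for level in levels
    ]


if __name__ == "__main__":
    variants = pdfa_variants()
    print(len(variants))  # 11
    print(", ".join(variants))
```

Note that a label such as PDF/A-1u never appears, because level u was only introduced with PDF/A-2.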

In Detail: The Different PDF/A Standards

PDF/A-1 (ISO 19005-1:2005)
PDF/A-1 is the original PDF/A standard, the most commonly used today, and the most restrictive. Because it is based on an older PDF standard, PDF 1.4—published by Adobe Systems in 2001—PDF/A-1 does not support JPEG 2000, layers or attachments. In addition, while supported in PDF 1.4, transparency was considered just “too new” at the time of PDF/A-1’s inception and therefore not included.

Missing features: JPEG2000, transparency, layers and attachments
Conformance levels: a & b
Based on PDF 1.4

PDF/A-2 (ISO 19005-2:2011)
Based on PDF 1.7 (ISO 32000-1:2008), PDF/A-2 introduces several features unavailable in PDF 1.4, as well as transparency. Additions include layers, improved image compression (JPEG 2000 and JBIG2), and attachments, provided that those attachments are in PDF/A format.

PDF/A-2 does not make PDF/A-1 files obsolete. Rather, the standard is intended to be forwards compatible: for example, a valid PDF/A-1b file should pass verification on software set to validate for PDF/A-2b or PDF/A-3b.

Lastly, conformance level u (as in Unicode) was also introduced with PDF/A-2. Level u allows organizations to guarantee that document text can be reliably searched and copied—without the file having to conform to other a-level requirements.

New & permitted features: JPEG 2000, transparency, layers and attachments (only other PDF/A files)
Conformance levels: a, b & u
Based on PDF 1.7 (ISO 32000-1:2008)

PDF/A-3 (ISO 19005-3:2012)
PDF/A-3 is virtually identical to PDF/A-2. (They even left the typos intact.) The one and only difference is that PDF/A-3 permits any file type as an attachment.

However, a PDF/A viewer is not required to do anything extra with these attached files beyond ensuring their proper extraction. Therefore, the standard cannot guarantee whether you will be able to read or otherwise use these files in the future, prompting archivists to voice concerns that PDF/A-3 might allow for circumvention of archival restrictions on permitted formats.

A response to the above concern has been to note that a carefully designed workflow, built with archival considerations in mind, could account for and leverage PDF/A-3’s capabilities. Indeed, PDF/A-3 was largely inspired by a desire to have a machine-readable component available, such as proprietary binary data or XML, used in situations where embedded formats could be carefully prescribed. An example of this is the ZUGFeRD hybrid e-invoicing standard, published two years after PDF/A-3’s introduction, endorsed by the German government, and favored by many European Union organizations & enterprises.

New & permitted features: Attachments (any filetype)
Conformance levels: a, b & u
Based on PDF 1.7 (ISO 32000-1:2008)
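When choosing between the first three standards, it can help to treat the feature summaries above as a lookup table. The following Python sketch is a simplification built only from those summaries; the feature keys and the function name are hypothetical labels of our own, and a real validator checks far more than this.

```python
# Features permitted by each PDF/A part, simplified from the summaries above.
# The feature keys are illustrative labels, not identifiers from the ISO text.
PERMITTED_FEATURES = {
    1: set(),
    2: {"jpeg2000", "transparency", "layers", "pdfa_attachments"},
    3: {"jpeg2000", "transparency", "layers", "pdfa_attachments",
        "any_attachments"},
}


def lowest_part_permitting(features):
    """Return the lowest PDF/A part (1-3) that permits every listed feature.

    Returns None when no part in the table permits all of them.
    """
    needed = set(features)
    for part in sorted(PERMITTED_FEATURES):
        if needed <= PERMITTED_FEATURES[part]:
            return part
    return None


if __name__ == "__main__":
    print(lowest_part_permitting(["transparency", "layers"]))
    print(lowest_part_permitting(["any_attachments"]))
```

A document that needs transparency and layers lands on PDF/A-2, while one that must carry an arbitrary attachment (for example, an XML invoice) needs PDF/A-3.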

PDF/A-4 (ISO 19005-4:2020)
Sometimes referred to as PDF/A-NEXT, PDF/A-4 is the next iteration of the PDF/A standard, published in November 2020 as ISO 19005-4:2020. PDF/A-4 updates PDF/A to align with PDF 2.0, the latest version of the PDF ISO standard.

Significantly, the separate conformance levels a, b, and u are not used in PDF/A-4. Instead, PDF/A-4 encourages but does not require addition of higher-level logical structures, and it requires Unicode mappings for all fonts.

Additionally, PDF/A-4 introduces two new conformance levels, e & f.

PDF/A-4f allows file types of any other format to be embedded, whereas PDF/A-4e introduces support for RichMedia and 3D type annotations as well as embedded files to create a PDF/A version compatible with modern geospatial, construction, and engineering workflows. (The 'e' stands for engineering, as it does in the previously created PDF/E standard.)

New features: PDF 2.0 compatibility
Conformance levels: e & f
Based on PDF 2.0 (ISO 32000-2:2017)

Different PDF/A Conformance Levels

Level b (Basic)
PDF/A-1b, PDF/A-2b, PDF/A-3b

B-level conformance requires only that documents conform with guidelines for reliable viewing and is therefore the easiest level to achieve.

From the ISO specification:

Level B conformance
Conformance level encompassing the requirements of this part of ISO 19005 regarding the visual appearance of electronic documents, but neither their structural or semantic properties nor the requirement that all text have Unicode equivalents.

Level a (Accessible)
PDF/A-1a, PDF/A-2a, PDF/A-3a

“Accessible” conformance is a superset of b-level conformance. It adds requirements for information intended to preserve a document’s logical structure, semantic content, and natural reading order.

In other words, a-level conformance not only ensures documents will look the same in the future; it also helps machines and people better understand and repurpose their content. A valid a-level PDF/A will have text that can be reliably searched and copied, and content that is more accessible to assistive technologies such as screen readers.

A list of a-level requirements is as follows:

  1. Content must be tagged with a hierarchical structure tree, meaning elements such as reading order, figures and tables are explicitly identified through metadata.
  2. The natural language of the document must be identified.
  3. Images and symbols must have alternative descriptive text.
  4. The file must include character mappings to Unicode for reliable search and copy.

Note: none of these requirements will change the visual appearance of a document.

Level u (Unicode)
PDF/A-2u, PDF/A-3u

Like ‘level a’, u-level conformance requires character mapping to Unicode. However, it drops a-level requirements including embedded logical structure (i.e., tags and a structure tree) as specified in section 6.7 of ISO 19005-2 (PDF 1.7). Therefore, a PDF/A meeting u-level conformance will have text that can be reliably searched and copied, but the reading order will not be guaranteed.
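The relationship between levels b, u, and a can be pictured as nested sets of requirements. The Python sketch below models only the requirements named in this article; the constant and function names are our own, and it is not a validator.

```python
# Requirement labels paraphrased from the level descriptions above.
VISUAL = "reliable visual reproduction"
UNICODE = "Unicode character mappings"
STRUCTURE = "tagged logical structure"
LANGUAGE = "declared natural language"
ALT_TEXT = "alternative text for images and symbols"

# Each level's requirements; u extends b, and a extends u.
LEVEL_REQUIREMENTS = {
    "b": {VISUAL},
    "u": {VISUAL, UNICODE},
    "a": {VISUAL, UNICODE, STRUCTURE, LANGUAGE, ALT_TEXT},
}


def satisfied_levels(properties):
    """List the levels whose requirements are all present in `properties`."""
    present = set(properties)
    return [
        level
        for level in ("b", "u", "a")
        if LEVEL_REQUIREMENTS[level] <= present
    ]


if __name__ == "__main__":
    # A file with reliable rendering and Unicode mappings, but no tags.
    print(satisfied_levels({VISUAL, UNICODE}))
```

A file with reliable rendering and Unicode mappings but no structure tree qualifies for levels b and u, which matches the description of level u above: searchable text without a guaranteed reading order.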

More About PDF/A & Apryse’s PDF/A Solutions

In summary, knowing your PDF/A options helps you improve the value of your documents for specific viewing, sharing, printing, or archiving purposes. If you would like more PDF/A information, check out our complete PDF/A guide.

If you’re interested in converting to a particular PDF/A variant, try Apryse’s free online PDF/A converter tool, which can convert 20+ file formats to any version of PDF/A. You can also read our article on how to convert to PDF/A with Apryse’s PDF SDK, or try PDF/A Manager, our command-line tool.

If you have any questions about Apryse’s PDF SDK, feel free to get in touch!

This article was originally published Jan 2019 and has been updated to include the latest information.
