Understanding Font Substitution in DOCX to PDF Conversion

By Matt Binsfeld, Roger Dunham | 2024 Apr 12

The Apryse SDK enables you to create a PDF from an Office document without the need to install Office. Not only does this remove the need for an Office software license, but it also means that conversion can occur on platforms where Office is not supported – for example, on Linux.

When converting a DOCX to PDF, you generally expect that the resulting file should look the same as the Office document, with line and page breaks all being in the correct location. However, one critical aspect that often gets overlooked is the handling of fonts.

Font handling can significantly impact how the document appears in its final form, since fonts affect not just the look of the words, but also their size and potentially the number of words that fit onto a line. Changing the font can cause paragraphs to reflow, and can even push text onto the following page.
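
To make the reflow effect concrete, here is a minimal sketch of greedy line breaking. It assumes every character has the same advance width, which is a simplification of real font metrics, and `wrapText` is an illustrative helper rather than part of the Apryse SDK.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap. Assumption: every character has the same advance
// width (charWidth), which real fonts generally do not.
std::vector<std::string> wrapText(const std::string& text,
                                  double charWidth,
                                  double lineWidth) {
    assert(charWidth > 0 && lineWidth > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, current;
    while (words >> word) {
        // Length of the line if this word were appended (plus one space).
        std::size_t candidateLen = current.empty()
            ? word.size()
            : current.size() + 1 + word.size();
        if (!current.empty() && candidateLen * charWidth > lineWidth) {
            lines.push_back(current);  // line is full, start a new one
            current = word;
        } else {
            if (!current.empty()) current += ' ';
            current += word;
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

With a line width of 100 units, the sentence "the quick brown fox jumps over the lazy dog" wraps onto three lines when each character is 5 units wide, and onto four lines when each character is 7 units wide. Nothing about the text changed, only the assumed width of the glyphs.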

In this article, we look at font substitution and font embedding, and at how they affect PDF output.

The Necessity of Font Presence

For a font to be accurately represented in a PDF, it must be available at the time of PDF generation. If the required font is missing, an available substitute font will be used.

As an example, let’s look at a Word document that was created on a machine that has a specific font called Jeepers installed (it also includes text in two other fonts).

Figure 1 – The sample file shown on a machine where the font Jeepers is installed

If we convert that DOCX to PDF on the machine where it was created (which has Jeepers installed), the resulting PDF looks just like the Word document.

Figure 2 – The PDF created from the example DOCX file on a machine where Jeepers is installed

So far so good!

However, if that exact same DOCX file is opened on a machine where Jeepers is not available, Word will substitute it with a font that is available.

Figure 3 – The same DOCX file, opened on a machine where the font Jeepers was not available. It has been replaced with a different font.

If you are curious about what the new font is, you can dig down into File > Options > Advanced > Font substitution in Word. In this example, Calibri has been substituted for Jeepers.

The substitution mechanism used in Word is complex, subject to change at any time, and can also be overridden by the user. It may even vary from one version of Office to another, or if you use an alternative word processor. As such, there is no single “truth” as far as font substitution is concerned.

When Apryse SDK converts an Office document to PDF it will also need to make a substitution. It might use a different font from the one that Word would, since there may be different sets of available fonts and substitution logic can differ.

This means that a PDF could look different from how Word would display that document on the same machine. This isn’t wrong, just different: both mechanisms are substituting an alternative font for the original. In fact, if we look at the PDF generated by Apryse, we find that Arial MT has been substituted for Jeepers, which is arguably a better match to the original than Word’s choice. (The lengths of the text lines in the PDF created by Apryse are very similar to those in the DOCX file on the original machine.)

Figure 4 – The PDF created from the DOCX file using Apryse SDK on a machine where the font Jeepers is not installed
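
The reason two tools can pick different substitutes on the same machine can be sketched with a simple fallback lookup. This is only an illustration: the real substitution logic in Word and in the Apryse SDK is more involved and is not described in this article, and `chooseFont` and the fallback lists used with it are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the requested font if it is available; otherwise the first
// available font from the fallback list; otherwise an empty string.
// Hypothetical helper: not how Word or the Apryse SDK actually decide.
std::string chooseFont(const std::string& requested,
                       const std::set<std::string>& available,
                       const std::vector<std::string>& fallbacks) {
    if (available.count(requested) > 0) return requested;
    for (const std::string& candidate : fallbacks) {
        if (available.count(candidate) > 0) return candidate;
    }
    return "";
}
```

Given the same set of available fonts, a fallback list that starts with Calibri and one that starts with Arial MT will return different substitutes for Jeepers, and neither answer is "wrong".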

Recommendations for Preventing Font Substitution

There are three options:

  1. Install the font on the machine that’s converting from Office to PDF.
  2. When creating the DOCX file, embed the fonts into the Word document itself. There is an article about how to set up font embedding in Word which is aimed at Fluent users, but the technique is applicable to anyone who uses Word. The downside is that not all fonts can be embedded due to licensing restrictions. Embedding fonts can also significantly increase file size – without fonts the DOCX is 5 KB, and with embedded fonts it is 4,500 KB.
  3. Use the Apryse substitute font package with either the default fonts, or, if served locally, with your custom fonts (like Jeepers). This system allows fonts to be accessed from a folder without the need to install them.

Typical usage is shown below.

// The fonts can be served locally, or fetched from the default location hosted by Apryse
// (replace the bracketed placeholder with your own path)
WebFontDownloader::SetCustomWebFontURL([Path to /SelfServeWebFontsV2/]);

// Start with a PDFDoc (the conversion destination)
PDFDoc pdfdoc;

// Perform the conversion with no optional parameters
Convert::OfficeToPDF(pdfdoc, input_path + input_filename, NULL);

// Save the result
pdfdoc.Save(output_path + output_filename, SDF::SDFDoc::e_linearized, NULL);

Figure 5 – An example of how the self-serve font package can produce a great result

Even after the PDF has been generated, font issues may not be over. When the PDF is viewed, further font substitutions may occur. This can be problematic on systems with few available fonts, like mobile devices.

Font Embedding and its Impact on PDF Appearance

We briefly mentioned font embedding above, and it is particularly relevant to how fonts are handled within PDFs.

Embedding fonts in a PDF ensures that the document looks the same regardless of the system or viewer on which it is opened. Having the PDF look the same for all users is particularly crucial for PDF/A and PDF/UA. As such, font embedding is a requirement for compliance with those standards.

PDFs with Embedded Fonts

Creating a PDF with embedded fonts is the most reliable way to ensure that the document looks the same on every system. This method includes the actual fonts used in the document within the PDF file itself. (Often, only the characters used within the PDF are embedded to minimize file size.) As a result, the shape and size of the text will be consistent, independent of the fonts available on the viewer’s system.
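
The idea behind embedding only the characters that are used can be sketched as follows. This is a simplification: it treats each code point as one character and ignores text shaping and ligatures, and `usedCharacters` is an illustrative helper, not an Apryse SDK function.

```cpp
#include <set>
#include <string>

// Collects the distinct code points used in a piece of text. A subsetting
// embedder would only need glyphs for these characters. Simplification:
// one code point is treated as one character (no shaping or ligatures).
std::set<char32_t> usedCharacters(const std::u32string& text) {
    return std::set<char32_t>(text.begin(), text.end());
}
```

For the text "hello world" this yields 8 distinct characters (including the space), which is far fewer than the number of glyphs in a typical font. That difference is why subsetting keeps file size down.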

The Apryse SDK automatically embeds fonts if they are available, but other software packages may not.

When fonts are not embedded, the PDF viewer must look up a suitable font itself. This can lead to subtle, or sometimes not-so-subtle, changes in appearance.

Using Base-14 Fonts

PDFs can be generated using Base-14 fonts. This is a standard set of fonts that PDF viewers are expected to provide by default (e.g., Times, Helvetica, Courier). While this approach aims for consistency across different viewers and systems, slight variations can still occur. This is because multiple versions of these fonts exist, sometimes with subtle differences, and what you see depends on the version that the viewer has access to.
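
As a small sketch, a check for whether a font name belongs to this set could look like the following. The names are the PostScript names of the standard 14 fonts as listed in the PDF specification; `isStandard14` is an illustrative helper, not an Apryse SDK function, and it assumes an exact, case-sensitive match.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Returns true if the name is one of the standard 14 PDF font names.
// Assumption: exact, case-sensitive match on the PostScript name.
bool isStandard14(const std::string& postScriptName) {
    static const std::array<const char*, 14> kNames = {{
        "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
        "Helvetica", "Helvetica-Bold", "Helvetica-Oblique",
        "Helvetica-BoldOblique",
        "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
        "Symbol", "ZapfDingbats"
    }};
    return std::any_of(kNames.begin(), kNames.end(),
                       [&](const char* name) { return postScriptName == name; });
}
```

A font such as Jeepers fails this check, so a PDF that references it without embedding it depends entirely on what the viewer can find.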

Using Non-Embedded, Non-Base-14 Fonts

If a PDF uses fonts other than Base-14 but does not embed them, then font substitution will occur.

Which font is chosen depends on various factors, including which fonts are available on the machine where the PDF is being viewed. This means the same PDF can look entirely different on different machines. Just as with viewing the DOCX file, font changes within the PDF can result in changes to character widths. This, in turn, can result in changes to line length, which can cause the entire page to render differently.

Apryse has a solution to this problem – the self-serve font package, which allows a consistent set of fonts to be used when viewing PDFs, and also lets you add additional fonts. (This is the same mechanism, used in a different way, that was described above in the section about creating PDFs.)

The Implications of Font Embedding

The choice between embedding fonts and relying on system-available or viewer-provided fonts affects the document’s portability, consistency, and standards compliance:

Portability

Embedded fonts increase the file size but make the document more portable as it doesn't rely on the fonts available on the viewer’s system.

Consistency

Non-embedded fonts might lead to a more lightweight file, but at the cost of consistency in appearance across different systems and viewers.

Standard Compliance

For certain standards and archiving purposes (like PDF/A), embedding fonts is not just a preference but a requirement.

Conclusion

Font handling in the conversion from DOCX to PDF is more than just a technicality; it’s about ensuring that the document communicates as intended, regardless of where and how it’s viewed. Understanding these nuances helps you make informed decisions about how to handle fonts during the conversion process, ensuring that the final document meets both aesthetic and practical requirements.

It is a complex subject, so check out the documentation for converting Office documents to PDF. If you need further assistance, feel free to reach out to us on Discord.

One last thing – Office to PDF conversion is just a small aspect of the Apryse SDK. It also offers the ability to edit and annotate PDFs, convert PDFs into Office documents, implement redaction, and much more.
