By Matt Binsfeld, Roger Dunham | 2024 Apr 12
Tags
font
font substitution
docx to pdf
office conversion
The Apryse SDK enables you to create a PDF from an Office document without the need to install Office. Not only does this remove the need for an Office software license, but it also means that conversion can occur on platforms where Office is not supported – for example, on Linux.
When converting a DOCX to PDF, you generally expect that the resulting file should look the same as the Office document, with line and page breaks all being in the correct location. However, one critical aspect that often gets overlooked is the handling of fonts.
Font handling can significantly impact how the document appears in its final form, since fonts affect not just the look of the words, but also their size and potentially the number of words that fit onto a line. Changing the font can cause paragraphs to reflow, and can even push text onto the following page.
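To make the reflow effect concrete, here is a minimal sketch of a greedy word wrap. It is not how Word or the Apryse SDK lays out text: the function name WrapText is hypothetical, and it assumes every character has the same width, whereas real fonts have per-glyph widths.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap. Returns the lines produced when `text` is laid out in a
// column `line_width` units wide, assuming every character (including the
// space between words) is `char_width` units wide. A single width for all
// characters is a simplifying assumption made for illustration only.
std::vector<std::string> WrapText(const std::string& text,
                                  double char_width,
                                  double line_width) {
    assert(char_width > 0 && line_width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word;
    std::string current;
    while (words >> word) {
        // Length of the current line if this word were appended to it.
        const std::size_t candidate_len =
            current.empty() ? word.size() : current.size() + 1 + word.size();
        if (!current.empty() && candidate_len * char_width > line_width) {
            // The word does not fit, so close this line and start a new one.
            lines.push_back(current);
            current = word;
        } else {
            if (!current.empty()) current += ' ';
            current += word;
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

With the sentence "The quick brown fox jumps over the lazy dog" and a line width of 120 units, a character width of 5 units gives two lines, while a slightly wider substitute font at 6 units per character gives three, with "dog" pushed onto a line of its own.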
In this article we will delve into the world of font substitution, embedding, and how they affect PDF output.
For a font to be accurately represented in a PDF, it must be available at the time of PDF generation. If the required font is missing, an available substitute font will be used.
As an example, let’s look at a Word document that was created on a machine that has a specific font called Jeepers installed (it also includes text in two other fonts).
Figure 1 – The sample file shown on a machine where the font Jeepers is installed
If we convert that DOCX to PDF on the machine where it was created (which has Jeepers installed), the resulting PDF looks just like the Word document.
Figure 2 – The PDF created from the example DOCX file on a machine where Jeepers is installed
So far so good!
However, if that exact same DOCX file is opened on a machine where Jeepers is not available, Word will substitute it with a font that is available.
Figure 3 – The same DOCX file, opened on a machine where the font Jeepers was not available. It has been replaced with a different font.
If you are curious about what the new font is, you can dig down into File > Options > Advanced > Font substitution in Word. In this example, Calibri has been substituted for Jeepers.
The substitution mechanism used in Word is complex, subject to change at any time, and can also be overridden by the user. It may even vary from one version of Office to another, or between Word and an alternative word processor. As such, there is no single “truth” as far as font substitution is concerned.
When Apryse SDK converts an Office document to PDF it will also need to make a substitution. It might use a different font from the one that Word would, since there may be different sets of available fonts and substitution logic can differ.
This means that a PDF could look different from how Word would display that document on the same machine. This isn’t wrong, just different: both mechanisms replace the original font with an alternative. In fact, if we look at the PDF generated by Apryse, we find that Arial MT has been substituted for Jeepers, which is arguably a better match to the original than Word’s choice. (The line lengths in the PDF created by Apryse are very similar to those in the DOCX file on the original machine.)
Figure 4 – The PDF created from the DOCX file using Apryse SDK on a machine where the font Jeepers is not installed
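One way to picture why two tools can pick different substitutes on the same machine is as a lookup over a preference list. The sketch below is purely illustrative: ResolveFont, FontSet, and FallbackTable are hypothetical names, and the preference lists in the usage note are invented for the example. They do not represent the actual substitution logic of Word or of the Apryse SDK.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using FontSet = std::set<std::string>;
using FallbackTable = std::map<std::string, std::vector<std::string>>;

// Returns `requested` if it is available. Otherwise returns the first
// available candidate listed for it in `fallbacks`, or `last_resort` if
// none of the candidates is available (hypothetical logic for illustration).
std::string ResolveFont(const std::string& requested,
                        const FontSet& available,
                        const FallbackTable& fallbacks,
                        const std::string& last_resort) {
    assert(!last_resort.empty());
    if (available.count(requested) > 0) return requested;
    const auto it = fallbacks.find(requested);
    if (it != fallbacks.end()) {
        for (const std::string& candidate : it->second) {
            if (available.count(candidate) > 0) return candidate;
        }
    }
    return last_resort;
}
```

Given a machine with Calibri and Arial MT but no Jeepers, a table that lists {"Calibri", "Arial MT"} for Jeepers resolves to Calibri, while a table that lists {"Arial MT", "Calibri"} resolves to Arial MT. Neither result is wrong; the two lookups simply rank the candidates differently.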
There are three options:
Typical usage is shown below.
// The fonts can be served locally, or via the default hosted by Apryse
WebFontDownloader::SetCustomWebFontURL([Path to /SelfServeWebFontsV2/]);
// Start with a PDFDoc (the conversion destination)
PDFDoc pdfdoc;
// perform the conversion with no optional parameters
Convert::OfficeToPDF(pdfdoc, input_path + input_filename, NULL);
// save the result
pdfdoc.Save(output_path + output_filename, SDF::SDFDoc::e_linearized, NULL);
Figure 5 – An example of how the self-serve font package can produce a great result
Having generated the PDF, the issues with fonts may still not be over. When the PDF is viewed, further font substitutions may occur. This can be problematic on systems with few available fonts, like mobile devices.
We briefly mentioned font embedding above, and it is particularly relevant to how fonts are handled within PDFs.
Embedding fonts in a PDF ensures that the document looks the same regardless of the system or viewer on which it is opened. Having the PDF look the same for all users is particularly crucial for PDF/A and PDF/UA. As such, font embedding is a requirement for compliance with those standards.
Creating a PDF with embedded fonts is the most reliable way to ensure that the document looks the same on every system. This method includes the actual fonts used in the document within the PDF file itself. (Often, only the characters used within the PDF are embedded to minimize file size.) As a result, the shape and size of the text will be consistent, independent of the fonts available on the viewer’s system.
The Apryse SDK automatically embeds fonts if they are available, but other software packages may not.
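The idea of embedding only the characters that are actually used (subsetting) can be sketched in a few lines. This is a rough illustration rather than how any PDF library implements it: UsedCharacters and SubsetFraction are hypothetical helpers, the code treats each byte as one character (an ASCII assumption), and real subsetting operates on font glyphs rather than on text characters.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Collects the distinct characters that appear in `text`.
// Assumes single-byte (ASCII) text for simplicity.
std::set<char> UsedCharacters(const std::string& text) {
    return std::set<char>(text.begin(), text.end());
}

// Fraction of a font's glyphs that a subset would need, under the rough
// assumption that each distinct character maps to exactly one glyph.
double SubsetFraction(const std::string& text, std::size_t glyphs_in_font) {
    assert(glyphs_in_font > 0);
    return static_cast<double>(UsedCharacters(text).size()) /
           static_cast<double>(glyphs_in_font);
}
```

For the text "hello world" there are only 8 distinct characters, so against a hypothetical font of 256 glyphs the subset would need about 3% of the font, which is why subsetting keeps embedded files small.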
When fonts are not embedded, a look-up within the PDF viewer must occur. This can lead to subtle, or sometimes not-so-subtle, changes in appearance.
PDFs can be generated using Base-14 fonts. This is a standard set of fonts that PDF viewers are expected to provide by default (e.g., Times, Helvetica, Courier). While this approach aims for consistency across different viewers and systems, slight variations can still occur. This is because multiple versions of these fonts exist, sometimes with subtle differences, and what you see depends on the version that the viewer has access to.
If a PDF uses fonts other than Base-14 but does not embed them, then font substitution will occur.
Which font is chosen depends on various factors, including which fonts are available on the machine where the PDF is being viewed. This means the same PDF can look entirely different on different machines. Just as with viewing the DOCX file, font changes within the PDF can result in changes to character widths. This, in turn, can result in changes to line length, which can cause the entire page to render differently.
Apryse has a solution to this problem – the self-serve font package, which allows a consistent set of fonts to be used when viewing PDFs, and even has the ability for additional fonts to be added. (This is the same mechanism, used in a different way, that was described above in the section about creating PDFs.)
The choice between embedding fonts or relying on system-available or viewer-provided fonts impacts the document’s portability and consistency:
Portability
Embedded fonts increase the file size but make the document more portable as it doesn't rely on the fonts available on the viewer’s system.
Consistency
Non-embedded fonts might lead to a more lightweight file, but at the cost of consistency in appearance across different systems and viewers.
Standard Compliance
For certain standards and archiving purposes (like PDF/A), embedding fonts is not just a preference but a requirement.
Font handling in the conversion from DOCX to PDF is more than just a technicality; it’s about ensuring that the document communicates as intended, regardless of where and how it’s viewed. Understanding these nuances helps you make informed decisions about how to handle fonts during the conversion process, ensuring that the final document meets both aesthetic and practical requirements.
It is a complex subject, so check out the documentation for converting Office documents to PDF. If you need further assistance, feel free to reach out to us on Discord.
One last thing – Office to PDF conversion is just a small aspect of the Apryse SDK. It also offers the ability to edit and annotate PDFs, convert PDFs into Office documents, implement redaction, and much more.