How to Convert Office Documents to PDF on a Server Without Installing Office using the Apryse SDK and Python

By Apryse | 2023 Sep 04

Introduction

In the modern business landscape, sharing and archiving documents in a reliable and universally accessible format is paramount. Portable Document Format (PDF) stands out as a versatile and widely supported format that ensures consistent presentation across different platforms. Converting Office documents such as Word, Excel, and PowerPoint files into PDFs is a common requirement.

The Apryse PDF SDK can be used from multiple languages, including Python 2 and Python 3, to convert Office documents to PDF. Conversions can run either client-side (within the browser) or server-side.

In this blog we will walk through getting started with sample code for converting a Word document to PDF, then extend it to see some of the other functionality that is available. The Python SDK and samples can all be downloaded from the Apryse website, and I will provide links later in this article.

I will be using Python 3 on Windows, but the library is also available for Linux and macOS.

Why Bother? Surely Creating a PDF is Easy, You Can Just Use 'Export from Word'

If you have Office installed on your own machine, then creating a PDF is simple.

But what do you do if you don’t have Office installed?

Theoretically a server could be set up that uses Office to perform the conversions within the Web server backend, but there are several issues with that:

  • Some versions of Office are not recommended to be used as a service component.
  • Office sometimes brings up modal user dialogs, particularly when an update is needed, and since services have no UI, the dialog can never be dismissed, and the application will hang.
  • Licensing of Office in this way is not the same as for a single user Desktop machine, so there is additional complexity in using this solution in a legal, licensed way.

The awesome thing about the Apryse solution is that there is no need for Office to be installed, either on the user’s machine, or on a server, for the conversion to occur.

That’s right. The conversion occurs entirely without the need for Office.

Sample Project for Converting a Document to PDF

The sample project is intended to show how specific documents, in a hard coded location, can be converted to PDF.

In reality you are likely to want to specify which document is to be converted, and what happens to the PDF once the conversion is complete. Treat the code as an example of how to convert a file and see the result, rather than as a template for an entire document processing solution.

Getting started guides are available for Python 2 and Python 3, and cover Windows, macOS and Linux. There is also a guide to using the Apryse SDK for Python 3 within AWS Lambda functions.

Prerequisites

The prerequisites depend on the version of Python and the platform.

For Python 3 on Windows you will need Python 3.5 or later, with pip installed. For other options please see the documentation at the links above.

You will also need a trial license key that can be downloaded from the Apryse website.

How to Get an Apryse SDK Trial Key

If you don't already have an Apryse account, go to https://dev.apryse.com and register a new account. This allows Apryse to grant you a demo license key which will be used with the Apryse SDK to enable demo functionality.

Figure 1 - The Developer Portal

Log into https://dev.apryse.com with your registered account and you will be able to get your license key.

Figure 2 - Where to find your license key. Note that the selected platform may affect the license key.

How to get the Apryse SDK for Python

  • Install the SDK using pip by entering
python -m pip install apryse-sdk --extra-index-url=https://pypi.apryse.com
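
Once pip has finished, a quick check from Python can confirm that the package is visible to the interpreter you intend to use. The sketch below assumes the pip package exposes a module named apryse_sdk; that module name is an assumption on my part, so adjust it if your installation differs.

```python
import importlib.util


def sdk_is_installed(module_name="apryse_sdk"):
    """Return True if the import system can locate the named module.

    "apryse_sdk" is an assumed module name for the pip package.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_is_installed():
        print("Apryse SDK found")
    else:
        print("Apryse SDK not found; re-run the pip command above")
```

Because find_spec only locates the module without importing it, the check is safe to run before a license key has been set up.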

Figure 3 - Output when installing the Python3 SDK on a Windows machine.

The samples are a separate download. In my case they were in a zip file called PDFNetPython3.zip, which I extracted into the folder where I will write my code.

Figure 4 - The sample projects folder, after extraction.

Opening the folder shows that there are many samples inside. We are primarily interested in the OfficeToPDFTest sample.

Figure 5 - Some of the contents of the samples projects folder.

Setting Up Your Project

Before you can proceed with the samples you will need an Apryse trial key. If you haven’t already downloaded one, then you will find it on the sample page.

  • Enter the Trial key into the file Samples/LicenseKey/PYTHON/LicenseKey.py.

I used VSCode, but you can use whatever editor you prefer.

Figure 6 - Adding the Trial key data to the LicenseKey.py file.
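
If you would rather not paste the key into a source file, one option is to read it from an environment variable. This is only a sketch: the variable name APRYSE_LICENSE_KEY and the helper load_license_key are hypothetical names chosen for illustration and are not part of the SDK or the samples.

```python
import os


def load_license_key(env=os.environ, var_name="APRYSE_LICENSE_KEY"):
    """Return the trial key stored in an environment variable.

    Both the variable name and this helper are hypothetical; they are
    not defined by the Apryse SDK or its samples.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your Apryse trial key")
    return key
```

The returned string could then be used wherever the sample expects the key, which keeps the key out of version control.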

The sample that we are going to look at is in a file called OfficeToPDFTest.py

Figure 7 - Location of the sample file that we are going to use.

The sample code that we are going to use specifies the files that should be converted from Office to PDF based on hardcoded locations within the archive that we downloaded. Before we look at the Python code or run it, let’s have a look at one of the Word document files in the Samples folder.

Figure 8 - The first page of one of the Word documents that will be converted by the sample code.

• Modify the code so that it uses this file. Open the file OfficeToPDFTest.py, scroll down to the bottom and update the name of the file to be converted.

Figure 9 - The code ready to be executed.
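
As an alternative to editing the hard-coded name each time, the file name could be taken from the command line. The sketch below is not part of the sample; output_name_for, parse_args and main are hypothetical helpers, and the final call to the sample's SimpleDocxConvert function (discussed later in this article) is left as a comment because it depends on the sample file.

```python
import argparse
from pathlib import Path


def output_name_for(input_filename):
    """Map 'report.docx' to 'report.pdf', keeping only the file name."""
    return Path(input_filename).with_suffix(".pdf").name


def parse_args(argv=None):
    """Parse a single positional argument: the file to convert."""
    parser = argparse.ArgumentParser(description="Convert an Office file to PDF")
    parser.add_argument("input_filename", help="name of the file to convert")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    output_filename = output_name_for(args.input_filename)
    print(args.input_filename, "->", output_filename)
    # Inside the sample file you would then call, for example:
    # SimpleDocxConvert(args.input_filename, output_filename)
    return output_filename
```

Calling main(["Cashflow.xlsx"]) derives the output name from the input name, so the two can never drift apart.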

We are now ready to run the conversion. The RunTest.bat file adds some useful logging to indicate if the SDK is not available, but it is not essential.

• Run the sample by using

python OfficeToPDFTest.py

After a few seconds, output will appear in the terminal.

Figure 10 - Typical output when running the OfficeToPDF sample.

  • Now navigate to the Output folder within the samples library

Figure 11 - The contents of the output folder after the sample has completed. Note the ‘empty’ file is a placeholder only.

Opening each new file in turn will show that we have created PDFs from the original documents.

Figure 12 – A simple Word document now a PDF. I’ve used an Apryse WebViewer based web page, but you can use any PDF viewer.

Figure 13 - Another PDF that was created from a 31 page Word document.
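
If you want a quick automated check rather than opening each file by hand, you can confirm that every file in the Output folder starts with the standard PDF header bytes. This is a minimal sketch using only the Python standard library; the function names are my own, and a header check does not prove that the whole file is valid.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file begins with the PDF header bytes '%PDF-'."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def check_output_folder(folder):
    """Map each .pdf file name in the folder to whether it has a PDF header."""
    return {
        pdf.name: looks_like_pdf(pdf)
        for pdf in sorted(Path(folder).glob("*.pdf"))
    }
```

Pass the path of the Output folder to check_output_folder to get a dictionary of file names and results.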

You could now use this code as the basis for the back-end to a website, perhaps one using PHP, with the Word document being uploaded, and the converted PDF being returned.

How Does the Sample Code Work?

There are two different functions in the sample:

SimpleDocxConvert and FlexibleDocxConvert

In this blog I will just look at the simpler method.

def SimpleDocxConvert(input_filename, output_filename):
    # input_path and output_path are path prefixes defined elsewhere in the sample

    # Start with a PDFDoc (the conversion destination)
    pdfdoc = PDFDoc()

    # Perform the conversion with no optional parameters
    Convert.OfficeToPDF(pdfdoc, input_path + input_filename, None)

    # Save the result
    pdfdoc.Save(output_path + output_filename, SDFDoc.e_linearized)

    # And we're done!
    print("Saved " + output_filename)

Only a few lines of this code do any real work; the rest is logging and comments.

In fact, you can see that the Word document is converted to a PDF in just three lines of code.

Talk about simply getting a great result!
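
The function above relies on names that the sample file sets up elsewhere, such as input_path and the SDK imports. For use outside the samples folder, a standalone version might look like the sketch below. It assumes that the pip package is importable as apryse_sdk and exports these class names directly, and that the SDK is initialized with PDFNet.Initialize before any conversion; the function name convert_office_to_pdf is my own.

```python
from pathlib import Path


def convert_office_to_pdf(input_file, output_file, license_key):
    """Convert one Office document to PDF and return the output path.

    Assumes the SDK is importable as apryse_sdk. The import is done
    lazily so this module still loads on machines without the SDK.
    """
    from apryse_sdk import PDFNet, PDFDoc, Convert, SDFDoc

    # The SDK must be initialized with a license key before use
    PDFNet.Initialize(license_key)

    # Same three steps as the sample: create, convert, save
    pdfdoc = PDFDoc()
    Convert.OfficeToPDF(pdfdoc, str(input_file), None)
    pdfdoc.Save(str(output_file), SDFDoc.e_linearized)
    return Path(output_file)
```

Taking full paths as arguments, rather than joining them to module-level prefixes, makes the function easier to reuse from a web back-end or a batch script.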

I will write more about the function FlexibleDocxConvert in a later blog, but, for now, it is enough to say that it illustrates how the code can be used with various options, or within a multithreaded environment to monitor and cancel conversions.

Can the SDK do More than Convert just Word Documents to PDF?

Absolutely.

In addition to converting from Word, the SDK can convert from Excel and PowerPoint to PDF, and even supports legacy Office formats: .doc, .xls and .ppt.

At its simplest, all that needs to be done to convert from other Office document types is to include the file extension when passing the file to the conversion method.

It’s worth noting that SimpleDocxConvert isn’t a great name for the function: it suggests that only .docx files are handled, which makes the Apryse SDK seem less powerful than it really is.

For example, an Excel Spreadsheet can be converted into a multiple page PDF with each page laid out in the same way as the original spreadsheet, by copying the file to the TestFiles folder then using:

SimpleDocxConvert("Cashflow.xlsx", "Cashflow.pdf")

The SDK is clever enough to know that .xlsx means that a conversion from Excel to PDF is required.

Figure 14: An example Excel spreadsheet now converted to a PDF, shown within the Apryse WebViewer
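
Since the file extension drives the conversion, it can be worth checking the extension before handing a file to the SDK, for example when files are uploaded by users. The sketch below only covers the extensions named in this article; the set and the helper is_office_file are my own illustration and are not an official list of supported formats.

```python
from pathlib import Path

# Extensions mentioned in this article; not an exhaustive or official list
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt"}


def is_office_file(filename):
    """Return True if the file name ends in one of the listed extensions."""
    return Path(filename).suffix.lower() in OFFICE_EXTENSIONS
```

Lower-casing the suffix means that names such as REPORT.DOCX are accepted as well.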

In the same way, PowerPoint presentations can be converted into multi-page PDFs using:

SimpleDocxConvert("WW1Cryptography.pptx", "WW1Cryptography.pdf")

Figure 15: A PowerPoint presentation now converted to a PDF, once again within the Apryse WebViewer.

Conclusion

Apryse offers a simple mechanism for converting Office documents, presentations and spreadsheets to PDF without the need for Office to be installed. This can be done with just a few lines of code that use default options. More complex options exist to allow the conversion mechanism to be tailored to your requirements.

These powerful conversion capabilities, coupled with the ease of integration provided by its Python library, make the Apryse SDK the best choice for developers aiming to enhance their document processing workflows. Whether you're building a document management system, an online collaboration platform, or any other application involving Office documents, Apryse can help you provide a seamless and efficient conversion process.

In addition to converting Office documents to PDF, Apryse offers many tools for editing and handling both Office Documents and PDFs, including converting PDFs into Office documents.

To see this code in action, visit https://xodo.com, which uses the SDK to create PDFs from Word documents, Excel spreadsheets and PowerPoint presentations. When you are ready to get started, see the documentation for the SDK. Don’t forget, you can also reach out to us on Discord if you have any issues.
