By Apryse | 2023 Sep 04
Tags
office conversion
python
docx to pdf
xlsx
pptx to pdf
In the modern business landscape, sharing and archiving documents in a reliable and universally accessible format is paramount. Portable Document Format (PDF) stands out as a versatile and widely supported format that ensures consistent presentation across different platforms. Converting Office documents such as Word, Excel, and PowerPoint files into PDFs is a common requirement.
The Apryse PDF SDK can be used from multiple languages, including Python 2 and Python 3, to convert Office documents to PDF. Conversion is supported both client-side (within the browser) and server-side.
In this blog we will walk through getting started with sample code for converting a Word document to PDF, then extend it to see some of the other functionality that is available. The Python SDK and samples can all be downloaded from the Apryse website, and I will provide links later in this article.
I will be using Python 3 on Windows, but the library is also available for Linux and macOS.
If you have Office installed on your own machine, then creating a PDF is simple.
But what do you do if you don’t have Office installed?
Theoretically, a server could be set up that uses Office to perform the conversions within the web server back-end, but there are several issues with that approach.
The awesome thing about the Apryse solution is that there is no need for Office to be installed, either on the user’s machine, or on a server, for the conversion to occur.
That’s right. The conversion occurs entirely without the need for Office.
The sample project is intended to show how specific documents, in a hard coded location, can be converted to PDF.
In reality you are likely to want to be able to specify which document is to be converted, and what you want to do with the PDF once the conversion is complete. As such the code should be considered to be an example of how to convert a file and see the result, rather than as a template of how to write an entire document processing solution.
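As a minimal sketch of that idea, the input and output locations could come from the command line instead of being hard coded. The helper names `default_output_path` and `parse_args` are hypothetical and are not part of the Apryse samples; only the Python standard library is used, and no conversion is performed here.

```python
import argparse
from pathlib import Path


def default_output_path(input_path):
    """Return the input path with its extension swapped for .pdf."""
    return Path(input_path).with_suffix(".pdf")


def parse_args(argv=None):
    """Parse an input document path and an optional output PDF path."""
    parser = argparse.ArgumentParser(description="Convert an Office document to PDF")
    parser.add_argument("input", type=Path, help="Office document to convert")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="where to write the PDF (defaults to the input name with .pdf)")
    args = parser.parse_args(argv)
    if args.output is None:
        args.output = default_output_path(args.input)
    return args


# Example: the list stands in for the arguments typed on the command line.
args = parse_args(["simple-word_2007.docx"])
print(args.input, "->", args.output)
```

The resulting `args.input` and `args.output` values could then be handed to whichever conversion function you use.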
Getting started guides can be found for Python 2 and Python 3. These pages include information for Windows, macOS and Linux. There is also information available about using the Apryse SDK for Python 3 within AWS Lambda functions.
The prerequisites depend on the version of Python and the platform.
For Python 3 on Windows you will need Python 3.5 or later, with pip installed. For other options please see the documentation at the links above.
You will also need a trial license key that can be downloaded from the Apryse website.
If you don't already have an Apryse account, go to https://dev.apryse.com and register a new account. This allows Apryse to grant you a demo license key which will be used with the Apryse SDK to enable demo functionality.
Figure 1 - The Developer Portal
Log into https://dev.apryse.com with your registered account and you will be able to get your license key.
Figure 2 - Where to find your license key. Note that the selected platform may affect the license key.
python -m pip install apryse-sdk --extra-index-url=https://pypi.apryse.com
Figure 3 - Output when installing the Python3 SDK on a Windows machine.
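A quick way to confirm that the install worked is to check that Python can locate the module. This sketch assumes that the pip package exposes a module named `apryse_sdk`; if your install uses a different module name, pass that name instead. The helper `sdk_is_installed` is hypothetical.

```python
import importlib.util


def sdk_is_installed(module_name="apryse_sdk"):
    """Return True if the named top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if sdk_is_installed():
    print("Apryse SDK found")
else:
    print("Apryse SDK not found; re-run the pip install command")
```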
In my case the samples were in a zip file called PDFNetPython3.zip, and I extracted them into the folder where I will write my code.
Figure 4 - The sample projects folder, after extraction.
Opening the folder shows that there are many samples inside. We are primarily interested in the OfficeToPDFTest sample.
Figure 5 - Some of the contents of the samples projects folder.
Before you can proceed with the samples you will need an Apryse trial key. If you haven’t already downloaded one, then you will find it on the sample page.
I used VSCode, but you can use whatever editor you prefer.
Figure 6 - Adding the Trial key data to the LicenseKey.py file.
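The sample keeps the key in LicenseKey.py. If you would rather not store it in a source file, one option is to read it from an environment variable. This is only a sketch: the variable name `APRYSE_LICENSE_KEY` and the helper `load_license_key` are hypothetical names chosen for illustration, not something the SDK or the samples define.

```python
import os


def load_license_key(env_var="APRYSE_LICENSE_KEY"):
    """Return the key from the environment, or raise a clear error if it is missing."""
    # env_var is a hypothetical variable name; use whatever suits your setup.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your trial key")
    return key
```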
The sample that we are going to look at is in a file called OfficeToPDFTest.py
Figure 7 - Location of the sample file that we are going to use.
The sample code that we are going to use specifies the files that should be converted from Office to PDF based on hardcoded locations within the archive that we downloaded. Before we look at the Python code or run it, let’s have a look at one of the Word document files in the Samples folder.
Figure 8 - The first page of one of the Word documents that will be converted by the sample code.
• Modify the code so that it uses this file. Open the file OfficeToPDFTest.py, scroll down to the bottom and update the name of the file to be converted.
Figure 9 - The code ready to be executed.
We are now ready to run the conversion. The RunTest.bat file adds some useful logging to indicate if the SDK is not available, but it is not essential.
• Run the sample by using
python OfficeToPDFTest.py
After a few seconds, output will appear in the terminal.
Figure 10 - Typical output when running the OfficeToPDF sample.
Figure 11 - The contents of the output folder after the sample has completed. Note the ‘empty’ file is a placeholder only.
Opening each new file in turn will show that we have created PDFs from the original documents.
Figure 12 – A simple Word document now a PDF. I’ve used an Apryse WebViewer based web page, but you can use any PDF viewer.
Figure 13 - Another PDF that was created from a 31 page Word document.
You could now use this code as the basis for the back-end to a website, perhaps one using PHP, with the Word document being uploaded and the converted PDF being returned.
There are two different functions in the sample:
SimpleDocxConvert and FlexibleDocxConvert
In this blog I will just look at the simpler method.
def SimpleDocxConvert(input_filename, output_filename):
    # Start with a PDFDoc (the conversion destination)
    pdfdoc = PDFDoc()
    # perform the conversion with no optional parameters
    Convert.OfficeToPDF(pdfdoc, input_path + input_filename, None)
    # save the result
    pdfdoc.Save(output_path + output_filename, SDFDoc.e_linearized)
    # And we're done!
    print("Saved " + output_filename)
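For context, a self-contained version of that function might look like the sketch below. It assumes that the pip-installed package is imported as `apryse_sdk`, and that `PDFNet.Initialize` is called with your key before any conversion and `PDFNet.Terminate` afterwards. The sample itself takes the key from LicenseKey.py and relies on module-level `input_path` and `output_path` variables. The function name `convert_office_to_pdf` is a placeholder, not part of the SDK.

```python
from pathlib import Path


def convert_office_to_pdf(license_key, input_file, output_file):
    """Convert one Office document to a linearized PDF and return the output path."""
    # Imported here so this module still loads when the SDK is not installed.
    # Assumption: the pip package installs a module named apryse_sdk.
    from apryse_sdk import PDFNet, PDFDoc, Convert, SDFDoc

    output_file = Path(output_file)
    # Make sure the destination folder exists before saving.
    output_file.parent.mkdir(parents=True, exist_ok=True)

    PDFNet.Initialize(license_key)
    try:
        pdfdoc = PDFDoc()  # the conversion destination
        Convert.OfficeToPDF(pdfdoc, str(input_file), None)  # no optional parameters
        pdfdoc.Save(str(output_file), SDFDoc.e_linearized)
    finally:
        # Release SDK resources even if the conversion raised an error.
        PDFNet.Terminate()
    return output_file
```

A call such as `convert_office_to_pdf(LICENSE_KEY, "simple-word_2007.docx", "Output/simple-word_2007.pdf")` would then convert a single file, provided the SDK and a valid key are available.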
Only a few lines of this code do any real work; the rest is logging and comments.
In fact, you can see that the Word document is converted to a PDF in just three lines of code.
Talk about simply getting a great result!
I will write more about the function FlexibleDocxConvert in a later blog, but, for now, it is enough to say that it illustrates how the code can be used with various options, or within a multithreaded environment to monitor and cancel conversions.
The SDK is not limited to Word documents. It can also convert Excel and PowerPoint files to PDF, and even supports the legacy Office formats .doc, .xls and .ppt.
At its simplest, all that needs to be done to convert from other Office document types is to include the file extension when passing the file to the conversion method.
It’s worth noting that SimpleDocxConvert isn’t a great name for the function, as it handles more than .docx files and so suggests that the Apryse SDK is less powerful than it really is.
For example, an Excel Spreadsheet can be converted into a multiple page PDF with each page laid out in the same way as the original spreadsheet, by copying the file to the TestFiles folder then using:
SimpleDocxConvert("Cashflow.xlsx", "Cashflow.pdf")
The SDK is clever enough to know that .xlsx means that a conversion from Excel to PDF is required.
Figure 14: An example Excel spreadsheet now converted to a PDF, shown within the Apryse WebViewer
In the same way, PowerPoint presentations can be converted into multi-page PDFs using:
SimpleDocxConvert("WW1Cryptography.pptx", "WW1Cryptography.pdf")
Figure 15: A PowerPoint presentation now converted to a PDF, once again within the Apryse WebViewer.
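Because the same call handles every supported type, a folder of mixed Office files can be processed in one loop. The sketch below only selects the files and works out the output names; it does not call the SDK. The extension list is taken from the formats named in this article, and `is_office_file` and `plan_conversions` are hypothetical helpers.

```python
from pathlib import Path

# Extensions mentioned in this article; extend the set if you need other formats.
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt"}


def is_office_file(path):
    """Return True if the file extension is one of the listed Office formats."""
    return Path(path).suffix.lower() in OFFICE_EXTENSIONS


def plan_conversions(folder, output_folder):
    """Return sorted (input, output) path pairs for the Office files in folder."""
    folder, output_folder = Path(folder), Path(output_folder)
    return [
        (src, output_folder / (src.stem + ".pdf"))
        for src in sorted(folder.iterdir())
        if src.is_file() and is_office_file(src)
    ]
```

Each pair could then be passed to SimpleDocxConvert or to your own wrapper. Note that two inputs sharing a base name, such as Report.docx and Report.xlsx, would map to the same output name, so a real tool would need to handle that case.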
Apryse offers a simple mechanism for converting Office documents, presentations and spreadsheets to PDF without the need for Office to be installed. This can be done with just a few lines of code that use default options. More complex options exist to allow the conversion mechanism to be tailored to your requirements.
These conversion capabilities, coupled with the ease of integration provided by its Python library, make the Apryse SDK a strong choice for developers aiming to enhance their document processing workflows. Whether you're building a document management system, an online collaboration platform, or any other application involving Office documents, Apryse can help you provide a seamless and efficient conversion process.
In addition to converting Office documents to PDF, Apryse offers many tools for editing and handling both Office Documents and PDFs, including converting PDFs into Office documents.
To see this code in action, visit https://xodo.com, which uses the SDK to create PDFs from Word documents, Excel spreadsheets and PowerPoint presentations. When you are ready to get started, see the documentation for the SDK. Don’t forget, you can also reach out to us on Discord if you have any issues.