How to Create an Office Conversion Service in Python With Apryse Conversion SDK

By Isaac Maw | 2025 Mar 11

Summary: PDF and Office both provide useful file formats for different situations, but converting files between them can be frustrating. This article provides sample code and examples of how to use the Apryse PDF conversion workflow to convert PDF to .docx, .xlsx and .pptx in Python applications.

Microsoft Office is the quintessential suite of apps for creating and editing spreadsheets, documents and slide decks. It remains a go-to set of tools for working with business information, and many users need to be able to work in these familiar formats.

On the other hand, PDF documents offer many benefits compared to the .docx, .pptx, and .xlsx files associated with Microsoft Office. PDF is designed to present documents consistently across operating systems and applications, with formatting preserved. In addition to this fixed presentation, PDFs are also compressible and can be equipped with security features such as encryption, redaction and digital signatures.

So, while converting from Office to PDF is often as easy as clicking the ‘save’ button in an Office app, PDF-to-Word conversion workflows can help bridge the gap and get PDF files back into a familiar app.

PDF describes how a page looks rather than how its content is structured, so it can be challenging to find a PDF-to-Office conversion tool that preserves formatting accurately. With our PDF Conversion SDK, you can create a PDF conversion workflow in your Python application.

PDF to Office Conversion

Here’s a step-by-step guide to using the PDF Conversion SDK to convert PDF to Microsoft Office formats (Word, Excel, or PowerPoint) on a server or desktop using Python.

This functionality is provided by an add-on to the Apryse Server SDK, called the Structured Output Module. 

Setup

  1. Download the Structured Output Module, which enables PDF to Office conversion.
  2. Place it in your project directory, in a folder called lib, and point the SDK at that folder (the full sample below does this with PDFNet.AddResourceSearchPath).
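
The setup steps above can be checked in code before any conversion runs. This is a minimal sketch, assuming the module was unpacked into a `lib` folder inside your project directory; `find_module_dir` is a hypothetical helper name and is not part of the Apryse SDK. It only inspects the file system and never calls the SDK itself.

```python
from pathlib import Path


def find_module_dir(project_dir, folder_name="lib"):
    """Return the absolute path of the module folder, or raise if it is missing.

    Hypothetical helper: it only checks the file system and never calls the SDK.
    """
    module_dir = Path(project_dir).resolve() / folder_name
    if not module_dir.is_dir():
        raise FileNotFoundError(
            f"Expected the Structured Output Module in {module_dir}"
        )
    return module_dir
```

After initializing the library, the returned path (converted to a string) is what you would hand to `PDFNet.AddResourceSearchPath()`, followed by a `StructuredOutputModule.IsModuleAvailable()` check. Failing early with a clear message is easier to debug than a conversion error later on.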

Python PDF to Word Conversion

This sample demonstrates how to convert a PDF to a DOCX file. Here, filename is the path to the input PDF and output_filename is the path of the file to create:

wordOutputOptions = WordOutputOptions()  
# Optionally convert only the first page  
wordOutputOptions.SetPages(1, 1)  
# Requires the Structured Output module  
Convert.ToWord(filename, output_filename, wordOutputOptions)

Python PDF to PowerPoint Conversion

powerPointOutputOptions = PowerPointOutputOptions()  
# Optionally convert only the first page  
powerPointOutputOptions.SetPages(1, 1)  
# Requires the Structured Output module  
Convert.ToPowerPoint(filename, output_filename, powerPointOutputOptions) 

Python PDF to Excel Conversion

excelOutputOptions = ExcelOutputOptions()  
# Optionally convert only the first page  
excelOutputOptions.SetPages(1, 1)  
# Requires the Structured Output module  
Convert.ToExcel(filename, output_filename, excelOutputOptions) 
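
Because the three calls above differ only in the method name and the options class, a conversion service can choose between them from the output file's extension. The sketch below is one way to do that and is not the SDK's own API: `CONVERTERS`, `pick_converter` and `convert_pdf` are hypothetical names, and the `Convert` class is passed in as an argument so the routing logic can be exercised without the SDK installed.

```python
from pathlib import Path

# Maps an output extension to the name of the Convert method shown above.
CONVERTERS = {
    ".docx": "ToWord",
    ".xlsx": "ToExcel",
    ".pptx": "ToPowerPoint",
}


def pick_converter(output_filename):
    """Return the name of the Convert method for this output file."""
    suffix = Path(output_filename).suffix.lower()
    try:
        return CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported output format: {suffix or '(none)'}") from None


def convert_pdf(convert_cls, filename, output_filename, options=None):
    """Call the matching method on convert_cls (for example the SDK's Convert)."""
    method = getattr(convert_cls, pick_converter(output_filename))
    if options is None:
        return method(filename, output_filename)
    return method(filename, output_filename, options)
```

With the SDK imported, `convert_pdf(Convert, "input.pdf", "output.docx")` would route to `Convert.ToWord`, and an unsupported extension raises a `ValueError` before the SDK is touched.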

Full Sample Code

This longer sample shows how to use the Apryse SDK in Python to programmatically convert PDF documents to Word, Excel, and PowerPoint.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

import platform

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *

#---------------------------------------------------------------------------------------
# The following sample illustrates how to use the PDF.Convert utility class to convert 
# documents and files to Word, Excel and PowerPoint.
#
# The Structured Output module is an optional PDFNet Add-on that can be used to convert PDF
# and other documents into Word, Excel, PowerPoint and HTML format.
#
# The PDFTron SDK Structured Output module can be downloaded from
# https://docs.apryse.com/core/info/modules/
#
# Please contact us if you have any questions.
#---------------------------------------------------------------------------------------

# Relative path to the folder containing the test files.
inputPath = "../../TestFiles/"
outputPath = "../../TestFiles/Output/"

def main():
    # The first step in every application using PDFNet is to initialize the 
    # library. The library is usually initialized only once, but calling 
    # Initialize() multiple times is also fine.
    PDFNet.Initialize(LicenseKey)
    
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

    if not StructuredOutputModule.IsModuleAvailable():
        print("")
        print("Unable to run the sample: PDFTron SDK Structured Output module not available.")
        print("-----------------------------------------------------------------------------")
        print("The Structured Output module is an optional add-on, available for download")
        print("at https://docs.apryse.com/core/info/modules/. If you have already")
        print("downloaded this module, ensure that the SDK is able to find the required files")
        print("using the PDFNet::AddResourceSearchPath() function.")
        print("")
        return

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word
        print("Converting PDF to Word")

        outputFile = outputPath + "paragraphs_and_tables.docx"

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word with options
        print("Converting PDF to Word with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.docx"

        wordOutputOptions = WordOutputOptions()

        # Convert only the first page
        wordOutputOptions.SetPages(1, 1)

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile, wordOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel
        print("Converting PDF to Excel")

        outputFile = outputPath + "paragraphs_and_tables.xlsx"

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel with options
        print("Converting PDF to Excel with options")

        outputFile = outputPath + "paragraphs_and_tables_second_page.xlsx"

        excelOutputOptions = ExcelOutputOptions()

        # Convert only the second page
        excelOutputOptions.SetPages(2, 2)

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile, excelOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint
        print("Converting PDF to PowerPoint")

        outputFile = outputPath + "paragraphs_and_tables.pptx"

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint with options
        print("Converting PDF to PowerPoint with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.pptx"

        powerPointOutputOptions = PowerPointOutputOptions()

        # Convert only the first page
        powerPointOutputOptions.SetPages(1, 1)

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile, powerPointOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    PDFNet.Terminate()
    print("Done.")
    
if __name__ == '__main__':
    main()
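
The six try/except blocks in the sample follow one pattern: build an output path, call a converter, then report success or failure. A service that handles many files may prefer to express that pattern as data. The following sketch assumes nothing about the SDK beyond the call shape used above; `run_jobs` and the job tuples are hypothetical, and any callable that takes `(input_path, output_path)` can stand in for `Convert.ToWord` and its siblings.

```python
def run_jobs(jobs):
    """Run each (label, converter, input_path, output_path) job in order.

    Returns a list of (label, output_path, error) tuples. error is None on
    success, otherwise the exception message. One failure does not stop the
    remaining jobs, mirroring the separate try/except blocks in the sample.
    """
    results = []
    for label, converter, input_path, output_path in jobs:
        try:
            converter(input_path, output_path)
            results.append((label, output_path, None))
        except Exception as e:
            results.append((label, output_path, str(e)))
    return results
```

A job list for the sample files might start with `("Word", Convert.ToWord, inputPath + "paragraphs_and_tables.pdf", outputPath + "paragraphs_and_tables.docx")`. Returning the results instead of printing them lets the caller decide how to log or retry failures.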

PDF to Office Conversion SDK Benefits

As discussed at the top of this article, not all conversion tools can accurately parse a PDF file and preserve formatting during the conversion process.

Our SDK provides better results with the following benefits:

Client-Side Processing

Scale easily without server-side dependencies such as Microsoft Office or LibreOffice when rendering, converting, or editing PDFs, Microsoft Office files, images, videos, and HTML.

Unparalleled Rendering Quality

Bring fast rendering and highly accurate conversion of Office documents to any web, mobile, or desktop application.

Secure By Design

With no outside dependencies, you can deploy on your own infrastructure, so data never has to leave your platform.

Expert and Reliable Support

Accelerate projects with a team of experienced SDK developers who support you from your unlimited trial to the finish line and beyond.

The Complete Office and Document SDK

If your users need a quick, reliable way to get PDFs into the familiar Microsoft Office formats they rely on to get things done, this is the solution.

In addition to conversion, our Server SDK is designed to grow with your needs. Easily add out-of-the-box components for client-side document viewing, annotating, and many other document capabilities, for 160+ file formats on any platform.

To find out more about SDK capabilities, connect with us. Or, check out our documentation to see for yourself.

 

Isaac Maw

Technical Content Creator
