How to Create an Office Conversion Service in Python With Apryse Conversion SDK

By Isaac Maw | 2025 Mar 11


PDF to Office Conversion


Here’s a step-by-step guide to using the Apryse Conversion SDK to convert PDF files to Microsoft Office formats (Word, Excel, or PowerPoint) on a server or desktop using Python.

This functionality is provided by an add-on to the Apryse Server SDK, called the Structured Output Module.

Setup

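Download the Structured Output Module, which enables PDF to Office conversion.
Place it in your project directory in a folder called lib, then make it discoverable by passing that folder's path to PDFNet.AddResourceSearchPath(), as the full sample below does with its own resource path.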
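If you want your script to fail fast when the module folder is missing, a small helper can resolve the lib folder relative to the script before you pass it to `PDFNet.AddResourceSearchPath()`. This is a minimal sketch, assuming the module files were extracted into a folder named `lib` inside your project; the helper name `find_resource_dir` is hypothetical and not part of the Apryse API.

```python
from pathlib import Path


def find_resource_dir(base_dir, folder_name="lib"):
    """Return the absolute path of the resource folder as a string.

    Hypothetical helper: assumes the Structured Output Module was extracted
    into `folder_name` inside `base_dir`. Raises FileNotFoundError otherwise,
    so a missing module is reported before any conversion is attempted.
    """
    resource_dir = Path(base_dir).resolve() / folder_name
    if not resource_dir.is_dir():
        raise FileNotFoundError(f"Resource folder not found: {resource_dir}")
    return str(resource_dir)


# Usage with the Apryse SDK, assuming the script sits in the project root:
# PDFNet.AddResourceSearchPath(find_resource_dir(Path(__file__).parent))
```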

Python PDF to Word Conversion

This sample demonstrates how to convert a PDF to a DOCX file:

# filename: path to the source PDF; output_filename: path for the new .docx file
wordOutputOptions = WordOutputOptions()
# Optionally convert only the first page
wordOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToWord(filename, output_filename, wordOutputOptions)
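`SetPages(1, 1)` restricts conversion to the first page, and the full sample below uses `SetPages(2, 2)` for the second page, which suggests the range is 1-based and inclusive. If page ranges come from user input, validating them before calling `SetPages()` gives a clearer error than a failed conversion. A minimal sketch, assuming that 1-based inclusive convention; `apply_page_range` is a hypothetical helper, and `options` can be any of the `WordOutputOptions`, `ExcelOutputOptions`, or `PowerPointOutputOptions` objects shown in this article.

```python
def apply_page_range(options, first, last=None):
    """Validate a 1-based, inclusive page range and apply it with SetPages().

    Hypothetical helper: `options` is assumed to expose SetPages(first, last),
    as WordOutputOptions does in the snippet above. Passing only `first`
    selects a single page.
    """
    if last is None:
        last = first
    if not (isinstance(first, int) and isinstance(last, int)):
        raise TypeError("Page numbers must be integers")
    if first < 1:
        raise ValueError(f"Page numbers start at 1, got {first}")
    if last < first:
        raise ValueError(f"Invalid range: last page {last} is before first page {first}")
    options.SetPages(first, last)
    return options


# Usage with the Apryse SDK (requires the Structured Output module):
# options = apply_page_range(WordOutputOptions(), 1)
# Convert.ToWord(filename, output_filename, options)
```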

Python PDF to PowerPoint Conversion

This sample converts a PDF to a PPTX file:

# filename: path to the source PDF; output_filename: path for the new .pptx file
powerPointOutputOptions = PowerPointOutputOptions()
# Optionally convert only the first page
powerPointOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToPowerPoint(filename, output_filename, powerPointOutputOptions)
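Each snippet takes an `output_filename`. When converting many files, deriving that name from the input keeps outputs predictable, in the spirit of the full sample, where `paragraphs_and_tables.pdf` becomes `paragraphs_and_tables.pptx` or `paragraphs_and_tables_first_page.pptx`. A minimal sketch using only the standard library; `office_output_path` is a hypothetical helper, not an Apryse function.

```python
from pathlib import Path


def office_output_path(input_pdf, output_dir, extension, suffix=""):
    """Build an output path inside `output_dir` from the input PDF's name.

    Hypothetical helper, e.g. ("in/report.pdf", "out", ".pptx", "_first_page")
    gives out/report_first_page.pptx. Creates `output_dir` if it is missing.
    """
    if not extension.startswith("."):
        extension = "." + extension
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / (Path(input_pdf).stem + suffix + extension))


# Usage with the Apryse SDK:
# output_filename = office_output_path(filename, "Output", ".pptx", "_first_page")
# Convert.ToPowerPoint(filename, output_filename, powerPointOutputOptions)
```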

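Python PDF to Excel Conversion

This sample converts a PDF to an XLSX file:

# filename: path to the source PDF; output_filename: path for the new .xlsx file
excelOutputOptions = ExcelOutputOptions()
# Optionally convert only the first page
excelOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToExcel(filename, output_filename, excelOutputOptions)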
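The three snippets differ only in the `Convert` method and the options class. If a conversion service picks the target format from the requested file extension, a lookup table keeps that choice in one place. A minimal sketch, assuming the SDK names are reached through a passed-in `sdk` object (for example the imported `PDFNetPython` module) so the function can be exercised without the SDK; `convert_by_extension` and `TARGETS` are hypothetical, while the method and class names come from the snippets above.

```python
from pathlib import Path

# Maps an output extension to the Convert method and options class used above.
TARGETS = {
    ".docx": ("ToWord", "WordOutputOptions"),
    ".xlsx": ("ToExcel", "ExcelOutputOptions"),
    ".pptx": ("ToPowerPoint", "PowerPointOutputOptions"),
}


def convert_by_extension(sdk, input_pdf, output_file, first_page=None, last_page=None):
    """Call ToWord, ToExcel, or ToPowerPoint based on the output extension.

    Hypothetical helper: `sdk` is any object exposing `Convert` and the option
    classes as attributes. Returns the name of the Convert method it called.
    """
    ext = Path(output_file).suffix.lower()
    if ext not in TARGETS:
        raise ValueError(f"Unsupported output format: {ext or '(none)'}")
    method_name, options_name = TARGETS[ext]
    options = getattr(sdk, options_name)()
    if first_page is not None:
        # Same inclusive page range convention as SetPages(1, 1) above.
        options.SetPages(first_page, last_page if last_page is not None else first_page)
    getattr(sdk.Convert, method_name)(input_pdf, output_file, options)
    return method_name


# Usage with the Apryse SDK (assumes the module can be imported this way):
# import PDFNetPython as sdk
# convert_by_extension(sdk, "input.pdf", "output.xlsx", first_page=1)
```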

Full Sample Code

This longer Python sample shows how to use the Apryse SDK to programmatically convert generic PDF documents to Word, Excel, and PowerPoint.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *

#---------------------------------------------------------------------------------------
# The following sample illustrates how to use the PDF.Convert utility class to convert 
# documents and files to Word, Excel and PowerPoint.
#
# The Structured Output module is an optional PDFNet Add-on that can be used to convert PDF
# and other documents into Word, Excel, PowerPoint and HTML format.
#
# The PDFTron SDK Structured Output module can be downloaded from
# https://docs.apryse.com/core/info/modules/
#
# Please contact us if you have any questions.
#---------------------------------------------------------------------------------------

# Relative path to the folder containing the test files.
inputPath = "../../TestFiles/"
outputPath = "../../TestFiles/Output/"

def main():
    # The first step in every application using PDFNet is to initialize the 
    # library. The library is usually initialized only once, but calling 
    # Initialize() multiple times is also fine.
    PDFNet.Initialize(LicenseKey)
    
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

    if not StructuredOutputModule.IsModuleAvailable():
        print("")
        print("Unable to run the sample: PDFTron SDK Structured Output module not available.")
        print("-----------------------------------------------------------------------------")
        print("The Structured Output module is an optional add-on, available for download")
        print("at https://docs.apryse.com/core/info/modules/. If you have already")
        print("downloaded this module, ensure that the SDK is able to find the required files")
        print("using the PDFNet.AddResourceSearchPath() function.")
        print("")
        # Release the library before exiting early
        PDFNet.Terminate()
        return

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word
        print("Converting PDF to Word")

        outputFile = outputPath + "paragraphs_and_tables.docx"

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Word with options
        print("Converting PDF to Word with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.docx"

        wordOutputOptions = WordOutputOptions()

        # Convert only the first page
        wordOutputOptions.SetPages(1, 1)

        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile, wordOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel
        print("Converting PDF to Excel")

        outputFile = outputPath + "paragraphs_and_tables.xlsx"

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to Excel with options
        print("Converting PDF to Excel with options")

        outputFile = outputPath + "paragraphs_and_tables_second_page.xlsx"

        excelOutputOptions = ExcelOutputOptions()

        # Convert only the second page
        excelOutputOptions.SetPages(2, 2)

        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile, excelOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint
        print("Converting PDF to PowerPoint")

        outputFile = outputPath + "paragraphs_and_tables.pptx"

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    try:
        # Convert PDF document to PowerPoint with options
        print("Converting PDF to PowerPoint with options")

        outputFile = outputPath + "paragraphs_and_tables_first_page.pptx"

        powerPointOutputOptions = PowerPointOutputOptions()

        # Convert only the first page
        powerPointOutputOptions.SetPages(1, 1)

        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile, powerPointOutputOptions)

        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))

    #-----------------------------------------------------------------------------------

    PDFNet.Terminate()
    print("Done.")
    
if __name__ == '__main__':
    main()
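The sample repeats the same try/except block for each conversion. For a longer-running conversion service, a table-driven loop keeps the per-file error handling in one place, and `try`/`finally` guarantees the library is released. A minimal sketch, assuming the SDK calls are passed in as callables so the loop can be tested without the SDK; `run_conversions` and the job tuple layout are hypothetical.

```python
def run_conversions(initialize, terminate, jobs):
    """Run (label, convert_fn, input_path, output_path) jobs and report results.

    Hypothetical helper: `initialize` and `terminate` stand in for
    PDFNet.Initialize(LicenseKey) and PDFNet.Terminate(). Each convert_fn is
    called as convert_fn(input_path, output_path), like Convert.ToWord in the
    sample. Returns a list of (label, error_message_or_None).
    """
    results = []
    initialize()
    try:
        for label, convert_fn, input_path, output_path in jobs:
            try:
                convert_fn(input_path, output_path)
                print(f"{label}: result saved in {output_path}")
                results.append((label, None))
            except Exception as e:  # one failed file does not stop the batch
                print(f"{label}: unable to convert, error: {e}")
                results.append((label, str(e)))
    finally:
        terminate()  # always runs, even if the loop is interrupted
    return results


# Usage with the Apryse SDK (paths as in the sample):
# jobs = [
#     ("Word", Convert.ToWord, inputPath + "paragraphs_and_tables.pdf",
#      outputPath + "paragraphs_and_tables.docx"),
#     ("Excel", Convert.ToExcel, inputPath + "paragraphs_and_tables.pdf",
#      outputPath + "paragraphs_and_tables.xlsx"),
# ]
# run_conversions(lambda: PDFNet.Initialize(LicenseKey), PDFNet.Terminate, jobs)
```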

PDF to Office Conversion SDK Benefits


Not all conversion tools can accurately parse a PDF file and preserve its formatting during conversion.

Our SDK addresses this with the following benefits:

Client-side processing

Scale easily without server-side dependencies such as Microsoft Office or LibreOffice for rendering, converting, or editing PDFs, Office documents, images, videos, and HTML.

Unparalleled Rendering Quality

Bring fast rendering and high-accuracy conversion of Office documents to any web, mobile, or desktop application.

Secure By Design

With no outside dependencies, you can deploy on your own infrastructure so your data never leaves your platform, which reduces exposure to external vulnerabilities.

Expert and Reliable Support

Accelerate projects with support from our team of experienced SDK developers, from your unlimited trial to the finish line and beyond.

The Complete Office and Document SDK


If your users need a quick, reliable way to get PDFs into the familiar Microsoft Office formats they use to get things done, this is the solution.

In addition to conversion, our Server SDK is designed to grow with your needs. Easily add out-of-the-box components for client-side document viewing, annotating, and many other document capabilities, for 160+ file formats on any platform.

To find out more about SDK capabilities, connect with us, or check out our documentation to see for yourself.