By Isaac Maw | 2025 Mar 11
7 min
Tags
office conversion
pdf conversion
python
PDF SDK
Summary: PDF and Office both provide useful file formats for different situations, but converting files between them can be frustrating. This article provides sample code and examples of how to use the Apryse PDF conversion workflow to convert PDF to .docx, .xlsx and .pptx in Python applications.
Microsoft Office is a quintessential and useful suite of apps for creating and editing spreadsheets, documents and slide decks. Office remains a go-to suite of tools for working with business information, and it’s important for many users to be able to work in these familiar formats.
On the other hand, PDF documents offer many benefits compared to the .docx, .pptx, and .xlsx files associated with Microsoft Office. PDF is designed to present documents consistently across operating systems and applications, with formatting preserved. In addition to this fixed presentation, PDFs are also compressible and can be equipped with security features such as encryption, redaction and digital signatures.
So, while converting from Office to PDF is often as easy as a click of the ‘save’ button in an Office app, PDF-to-Word conversion workflows can help bridge the gap to get PDF files back into a familiar app.
PDF documents are designed for fixed presentation rather than for machine-readable structure, so it can be challenging to find a PDF-to-Office conversion tool that preserves formatting accurately. With our PDF Conversion SDK, you can create a PDF conversion workflow in your Python application.
Here’s a step-by-step guide to using the PDF Conversion SDK to convert PDF to Microsoft Office, including Word, Excel or PowerPoint on Server or Desktop using Python.
This functionality is provided by an add-on to the Apryse Server SDK, called the Structured Output Module.
This sample demonstrates how to convert a PDF to a DOCX file:
wordOutputOptions = WordOutputOptions()
# Optionally convert only the first page
wordOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToWord(filename, output_filename, wordOutputOptions)
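The SetPages(1, 1) call above limits the conversion to a page range. Judging from the longer sample later in this article, where SetPages(2, 2) converts only the second page, the two arguments appear to be a 1-based, inclusive first and last page. The following is a minimal sketch of a guard around that call under that assumption. The name apply_page_range is hypothetical and not part of the SDK; it works with any options object that exposes SetPages.

```python
def apply_page_range(options, first_page, last_page):
    """Validate a page range, then pass it to options.SetPages.

    Assumption: pages are 1-based and the range is inclusive, as the
    article's samples suggest. `options` is expected to be an SDK output
    options object such as WordOutputOptions; this helper is hypothetical.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    options.SetPages(first_page, last_page)
    return options


# Usage with the SDK (not run here):
# wordOutputOptions = apply_page_range(WordOutputOptions(), 1, 3)
# Convert.ToWord(filename, output_filename, wordOutputOptions)
```

Rejecting an invalid range before the conversion starts keeps the error message close to the caller's mistake instead of surfacing it from inside the SDK.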
Python PDF to PowerPoint Conversion
powerPointOutputOptions = PowerPointOutputOptions()
# Optionally convert only the first page
powerPointOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToPowerPoint(filename, output_filename, powerPointOutputOptions)
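The snippets in this article assume that filename and output_filename are already defined. One possible way to derive the output name from the input PDF is sketched below using only the Python standard library. The helper office_output_path and its parameters are hypothetical and chosen for illustration; they are not part of the SDK.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = (".docx", ".xlsx", ".pptx")


def office_output_path(pdf_path, extension, output_dir=None):
    """Return an output path that keeps the PDF's base name and swaps the suffix.

    Hypothetical helper: if output_dir is given, the file is placed there,
    otherwise next to the source PDF.
    """
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError("Unsupported extension: " + extension)
    source = Path(pdf_path)
    target_dir = Path(output_dir) if output_dir is not None else source.parent
    return str(target_dir / (source.stem + extension))


# Usage with the SDK (not run here):
# filename = "reports/q1.pdf"
# output_filename = office_output_path(filename, ".pptx")
# Convert.ToPowerPoint(filename, output_filename)
```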
Python PDF to Excel Conversion
excelOutputOptions = ExcelOutputOptions()
# Optionally convert only the first page
excelOutputOptions.SetPages(1, 1)
# Requires the Structured Output module
Convert.ToExcel(filename, output_filename, excelOutputOptions)
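The three snippets above differ only in the Convert method and options class they use. If your application decides the target format at run time, one option is to pick the method from the output file's extension. The sketch below assumes the caller passes in the SDK's Convert class; the names CONVERTER_BY_EXTENSION, converter_name_for and convert_pdf are hypothetical, and only ToWord, ToExcel and ToPowerPoint come from this article.

```python
from pathlib import Path

# Maps an output extension to the name of the Convert method shown above.
CONVERTER_BY_EXTENSION = {
    ".docx": "ToWord",
    ".xlsx": "ToExcel",
    ".pptx": "ToPowerPoint",
}


def converter_name_for(output_filename):
    """Return the Convert method name matching the output file's extension."""
    extension = Path(output_filename).suffix.lower()
    name = CONVERTER_BY_EXTENSION.get(extension)
    if name is None:
        raise ValueError("Unsupported output extension: " + repr(extension))
    return name


def convert_pdf(convert_cls, input_filename, output_filename):
    """Call the matching method on convert_cls (expected: the SDK's Convert)."""
    method = getattr(convert_cls, converter_name_for(output_filename))
    method(input_filename, output_filename)
    return output_filename


# Usage with the SDK (not run here):
# convert_pdf(Convert, "report.pdf", "report.xlsx")
```

Passing the Convert class as an argument, rather than importing it inside the helper, keeps the dispatch logic testable without the SDK installed.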
This longer Python sample shows how to use the Apryse SDK to programmatically convert generic PDF documents to Word, Excel, and PowerPoint.
#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------
import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *
import platform
sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *
#---------------------------------------------------------------------------------------
# The following sample illustrates how to use the PDF.Convert utility class to convert
# documents and files to Word, Excel and PowerPoint.
#
# The Structured Output module is an optional PDFNet Add-on that can be used to convert PDF
# and other documents into Word, Excel, PowerPoint and HTML format.
#
# The PDFTron SDK Structured Output module can be downloaded from
# https://docs.apryse.com/core/info/modules/
#
# Please contact us if you have any questions.
#---------------------------------------------------------------------------------------
# Relative path to the folder containing the test files.
inputPath = "../../TestFiles/"
outputPath = "../../TestFiles/Output/"
def main():
    # The first step in every application using PDFNet is to initialize the
    # library. The library is usually initialized only once, but calling
    # Initialize() multiple times is also fine.
    PDFNet.Initialize(LicenseKey)
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")
    if not StructuredOutputModule.IsModuleAvailable():
        print("")
        print("Unable to run the sample: PDFTron SDK Structured Output module not available.")
        print("-----------------------------------------------------------------------------")
        print("The Structured Output module is an optional add-on, available for download")
        print("at https://docs.apryse.com/core/info/modules/. If you have already")
        print("downloaded this module, ensure that the SDK is able to find the required files")
        print("using the PDFNet::AddResourceSearchPath() function.")
        print("")
        return
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to Word
        print("Converting PDF to Word")
        outputFile = outputPath + "paragraphs_and_tables.docx"
        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to Word with options
        print("Converting PDF to Word with options")
        outputFile = outputPath + "paragraphs_and_tables_first_page.docx"
        wordOutputOptions = WordOutputOptions()
        # Convert only the first page
        wordOutputOptions.SetPages(1, 1)
        Convert.ToWord(inputPath + "paragraphs_and_tables.pdf", outputFile, wordOutputOptions)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Word, error: " + str(e))
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to Excel
        print("Converting PDF to Excel")
        outputFile = outputPath + "paragraphs_and_tables.xlsx"
        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to Excel with options
        print("Converting PDF to Excel with options")
        outputFile = outputPath + "paragraphs_and_tables_second_page.xlsx"
        excelOutputOptions = ExcelOutputOptions()
        # Convert only the second page
        excelOutputOptions.SetPages(2, 2)
        Convert.ToExcel(inputPath + "paragraphs_and_tables.pdf", outputFile, excelOutputOptions)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to Excel, error: " + str(e))
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to PowerPoint
        print("Converting PDF to PowerPoint")
        outputFile = outputPath + "paragraphs_and_tables.pptx"
        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))
    #-----------------------------------------------------------------------------------
    try:
        # Convert PDF document to PowerPoint with options
        print("Converting PDF to PowerPoint with options")
        outputFile = outputPath + "paragraphs_and_tables_first_page.pptx"
        powerPointOutputOptions = PowerPointOutputOptions()
        # Convert only the first page
        powerPointOutputOptions.SetPages(1, 1)
        Convert.ToPowerPoint(inputPath + "paragraphs_and_tables.pdf", outputFile, powerPointOutputOptions)
        print("Result saved in " + outputFile)
    except Exception as e:
        print("Unable to convert PDF document to PowerPoint, error: " + str(e))
    #-----------------------------------------------------------------------------------
    PDFNet.Terminate()
    print("Done.")

if __name__ == '__main__':
    main()
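The sample above repeats the same try/except pattern for each conversion so that one failure does not stop the others. In your own application, the same idea could be expressed as a loop over a list of jobs. The sketch below is one possible shape, not the SDK's recommended structure; run_conversions and the job tuple layout are hypothetical, and the conversion callables are passed in by the caller.

```python
def run_conversions(jobs):
    """Run each (label, convert, input_path, output_path) job in order.

    `convert` is any callable taking (input_path, output_path), for example
    the SDK's Convert.ToWord. Returns a list of (label, output_path, error)
    tuples, where error is None on success. A failing job is recorded and
    the remaining jobs still run.
    """
    results = []
    for label, convert, input_path, output_path in jobs:
        try:
            convert(input_path, output_path)
            results.append((label, output_path, None))
        except Exception as e:
            results.append((label, output_path, str(e)))
    return results


# Usage with the SDK (not run here), between PDFNet.Initialize() and
# PDFNet.Terminate():
# source = inputPath + "paragraphs_and_tables.pdf"
# jobs = [
#     ("Word", Convert.ToWord, source, outputPath + "paragraphs_and_tables.docx"),
#     ("Excel", Convert.ToExcel, source, outputPath + "paragraphs_and_tables.xlsx"),
#     ("PowerPoint", Convert.ToPowerPoint, source, outputPath + "paragraphs_and_tables.pptx"),
# ]
# for label, output_file, error in run_conversions(jobs):
#     if error is None:
#         print("Result saved in " + output_file)
#     else:
#         print("Unable to convert PDF document to " + label + ", error: " + error)
```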
As discussed at the top of this article, not all conversion tools can accurately parse a PDF file and preserve formatting during the conversion process.
Our SDK provides better results with the following benefits:
Client-side processing
Scale easily without any server-side dependencies like Microsoft Office or LibreOffice for rendering, conversion, or editing PDFs, Microsoft Office, images, videos, and HTML.
Unparalleled Rendering Quality
Bring fast rendering and highly accurate conversion of Office documents to any web, mobile, or desktop application.
Secure By Design
No outside dependencies means you can deploy on your own infrastructure without data ever leaving your platform to eliminate vulnerabilities.
Expert and Reliable Support
Accelerate projects with our team of experienced SDK developers there to support you through your unlimited trial to the finish line and beyond.
If your users need a quick, reliable way to get PDFs into a familiar Microsoft Office format that they need to get things done, this is the solution.
In addition to conversion, our Server SDK is designed to grow with your needs. Easily add out-of-the-box components for client-side document viewing, annotating, and many other document capabilities, for 160+ file formats on any platform.
To find out more about SDK capabilities, connect with us. Or, check out our documentation to see for yourself.
Isaac Maw
Technical Content Creator