By Garry Klooesterman | 2025 Mar 27
4 min
Tags
pdf to office
Summary: PDFs are commonly used worldwide as they are convenient and reliable. However, editing PDFs beyond the basics is a challenge many businesses face. This blog discusses using conversion SDKs in Java to automatically convert PDFs into formats such as DOCX, XLSX, or PPTX to allow for more editing and processing options.
PDFs are everywhere! With trillions of them out in the world, it would be hard to not come across one from time to time. They are an excellent format for sharing information as they display the same regardless of the hardware and software used to view them.
If you need to edit a PDF, you can use an editing tool such as Apryse WebViewer to change many of the elements. However, for more substantial changes involving formatting, table structure, or images, you’ll need a more robust document editor like MS Office, which means first converting the PDF to another file format such as DOCX, PPTX, or XLSX.
This is where a PDF conversion SDK, such as the PDF to Office Conversion SDK from Apryse, saves the day. This blog looks at converting PDFs to Office formats using Java, but the code is also available for other programming languages such as C#, Python, and more.
Using an office conversion SDK provides many benefits such as:
Accuracy: Automating the conversion process using a conversion SDK preserves the format and layout of the original PDF including elements such as fonts, columns, tables, headers/footers, and more.
Efficiency and Scalability: Using a conversion SDK also allows for faster and more efficient conversion of PDFs with the ability to scale easily to match the needs of your business.
Security: Integrating a conversion SDK into your existing system removes the need to send documents to external third-party services, so your data stays secure because it never leaves your platform.
We’ll use the Structured Output Module (available for Windows, Linux, and Mac) to automatically convert a PDF to an MS Office format. In this example, we’ll convert to DOCX.
For more details on converting PDFs to other formats, see our office conversion documentation. You can also check out the full code sample with examples for converting PDFs to PPTX and XLSX.
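If you plan to target more than one Office format, it can help to derive the output file name from the input PDF and the format you want. The sketch below is a minimal illustration using only the Java standard library. `OfficeOutput`, `Format`, and `outputFor` are hypothetical names made up for this post and are not part of the Apryse SDK.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the Apryse SDK): builds the output path
// for a converted file from the input PDF and the target Office format.
class OfficeOutput {
    enum Format {
        WORD(".docx"), EXCEL(".xlsx"), POWERPOINT(".pptx");

        final String extension;

        Format(String extension) {
            this.extension = extension;
        }
    }

    static Path outputFor(Path pdf, Path outputDir, Format format) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Drop the extension if there is one; otherwise keep the whole name.
        String stem = (dot > 0) ? name.substring(0, dot) : name;
        return outputDir.resolve(stem + format.extension);
    }
}
```

For example, `OfficeOutput.outputFor(Paths.get("report.pdf"), Paths.get("Output"), OfficeOutput.Format.EXCEL)` yields a path ending in `report.xlsx`, which can then be used as the output argument of the conversion call for that format. The full code sample mentioned above covers the PPTX and XLSX conversions themselves.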
Note: You will also need a license key. A free trial key is available.
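One common way to keep the key out of your source code is to read it from an environment variable at startup. The following is a sketch under that assumption: the variable name `APRYSE_LICENSE_KEY` and the `LicenseKeys` class are made up for this example, so use whatever fits your setup.

```java
import java.util.Map;

// Hypothetical helper: looks up the license key in a map of environment
// variables (for example, the one returned by System.getenv()).
class LicenseKeys {
    // Assumed variable name; it is not defined by the SDK.
    static final String ENV_NAME = "APRYSE_LICENSE_KEY";

    static String fromEnvironment(Map<String, String> env) {
        String key = env.get(ENV_NAME);
        if (key == null || key.trim().isEmpty()) {
            // Fail early with a clear message instead of passing an empty key on.
            throw new IllegalStateException(
                "Set " + ENV_NAME + " to your license key before running.");
        }
        return key.trim();
    }
}
```

The returned string could then be passed to the library at startup, for example `PDFNet.initialize(LicenseKeys.fromEnvironment(System.getenv()));`.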
1. Download the Structured Output Module.
2. Extract the module to a folder called lib in the same folder as your project.
Figure 1: Extract the module to the lib folder.
3. Use the following code to convert a PDF to DOCX.
//
// Copyright (c) 2001-2024 by Apryse Software Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------
import com.pdftron.common.PDFNetException;
import com.pdftron.pdf.*;
//---------------------------------------------------------------------------------------
// The following sample illustrates how to use the PDF::Convert utility class to convert
// documents and files to Office.
//
// The Structured Output module is an optional PDFNet Add-on that can be used to convert PDF
// and other documents into Word, Excel, PowerPoint and HTML format.
//
// The Apryse SDK Structured Output module can be downloaded from
// https://docs.apryse.com/core/info/modules/
//
// Please contact us if you have any questions.
//---------------------------------------------------------------------------------------
public class PDF2OfficeTest
{
    // Relative path to the folder containing test files.
    static String inputPath = "../../TestFiles/";
    static String outputPath = "../../TestFiles/Output/";

    /**
     * The main entry point for the application.
     */
    public static void main(String[] args)
    {
        // The first step in every application using PDFNet is to initialize the
        // library. The library is usually initialized only once, but calling
        // initialize() multiple times is also fine.
        // PDFTronLicense.Key() is expected to return your license key.
        PDFNet.initialize(PDFTronLicense.Key());
        // Adjust this path so it points at the folder where you extracted the module.
        PDFNet.addResourceSearchPath("../../../Lib/");

        // Stop early if the Structured Output module cannot be found.
        try {
            if (!StructuredOutputModule.isModuleAvailable()) {
                System.out.println();
                System.out.println("Unable to run the sample: Apryse SDK Structured Output module not available.");
                System.out.println("-----------------------------------------------------------------------------");
                System.out.println("The Structured Output module is an optional add-on, available for download");
                System.out.println("at https://docs.apryse.com/core/info/modules/. If you have already");
                System.out.println("downloaded this module, ensure that the SDK is able to find the required files");
                System.out.println("using the PDFNet::AddResourceSearchPath() function.");
                System.out.println();
                return;
            }
        } catch (PDFNetException e) {
            System.out.println(e);
            return;
        } catch (Exception e) {
            System.out.println(e);
            return;
        }

        boolean err = false;

        //////////////////////////////////////////////////////////////////////////
        // Word
        //////////////////////////////////////////////////////////////////////////
        try {
            // Convert PDF document to Word
            System.out.println("Converting PDF to Word");
            String outputFile = outputPath + "paragraphs_and_tables.docx";
            Convert.toWord(inputPath + "paragraphs_and_tables.pdf", outputFile);
            System.out.println("Result saved in " + outputFile);
        } catch (PDFNetException e) {
            System.out.println("Unable to convert PDF document to Word, error: ");
            System.out.println(e);
            err = true;
        } catch (Exception e) {
            System.out.println("Unknown Exception, error: ");
            System.out.println(e);
            err = true;
        }

        //////////////////////////////////////////////////////////////////////////
        try {
            // Convert PDF document to Word with options
            System.out.println("Converting PDF to Word with options");
            String outputFile = outputPath + "paragraphs_and_tables_first_page.docx";
            Convert.WordOutputOptions wordOutputOptions = new Convert.WordOutputOptions();
            // Convert only the first page
            wordOutputOptions.setPages(1, 1);
            Convert.toWord(inputPath + "paragraphs_and_tables.pdf", outputFile, wordOutputOptions);
            System.out.println("Result saved in " + outputFile);
        } catch (PDFNetException e) {
            System.out.println("Unable to convert PDF document to Word, error: ");
            System.out.println(e);
            err = true;
        } catch (Exception e) {
            System.out.println("Unknown Exception, error: ");
            System.out.println(e);
            err = true;
        }

        // Release the library's resources before exiting.
        PDFNet.terminate();
        System.out.println("Done.");
    }
}
There you have it. You’ve just converted a PDF to DOCX.
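To convert a whole folder rather than a single file, the same call can be wrapped in a loop. Here is a minimal sketch using only the Java standard library. `BatchConvert` and the `PdfConverter` callback are hypothetical names introduced for this example; the callback stands in for the SDK call so the loop can be tried without the module installed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch helper (not part of the Apryse SDK).
class BatchConvert {
    // Stand-in for the real conversion call, e.g. a wrapper around Convert.toWord.
    interface PdfConverter {
        void convert(String inputFile, String outputFile) throws Exception;
    }

    // Converts every .pdf directly inside inputDir and returns the inputs that failed.
    static List<Path> convertAll(Path inputDir, Path outputDir, PdfConverter converter)
            throws IOException {
        Files.createDirectories(outputDir);

        List<Path> pdfs;
        try (Stream<Path> files = Files.list(inputDir)) {
            pdfs = files
                .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                .sorted()
                .collect(Collectors.toList());
        }

        List<Path> failed = new ArrayList<>();
        for (Path pdf : pdfs) {
            String name = pdf.getFileName().toString();
            // Swap the 4-character ".pdf" suffix for ".docx".
            Path out = outputDir.resolve(name.substring(0, name.length() - 4) + ".docx");
            try {
                converter.convert(pdf.toString(), out.toString());
            } catch (Exception e) {
                // Record the failure and keep going with the remaining files.
                System.out.println("Failed to convert " + pdf + ": " + e);
                failed.add(pdf);
            }
        }
        return failed;
    }
}
```

In a project with the SDK set up as above, the callback could be `(in, out) -> Convert.toWord(in, out)`. Files that fail are collected instead of stopping the run, which mirrors the per-conversion try/catch in the sample.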
When faced with the challenge of editing complex elements in PDFs, businesses need to convert them to other formats for more robust editing options. As we’ve seen, using a document conversion SDK such as the Apryse PDF to Office Conversion SDK to automate the process is easy and efficient.
Get started now or contact our sales team with any questions. You can also check out our Discord community for support and discussions.
Garry Klooesterman
Senior Technical Content Creator