VIDEO: Headers and Footers When Converting to DOCX

By Roger Dunham | 2024 Sep 12

Summary: The Apryse SDK now lets you control how headers and footers are handled in PDF to Word conversions. You can extract them as headers and footers, remove them, or include them as regular text. A video and detailed article with code samples are available for guidance.

Introduction

The Apryse SDK is available for Windows, macOS, and Linux in a wide range of programming languages. It is a superb library for viewing, annotating, creating, and editing PDFs and other document types.

Much of the functionality is built directly into the SDK, but specialized document processing requires add-on modules. Examples include working with CAD files, OCR, Data Extraction, and converting PDF into DOCX, PPTX, XLSX, and HTML.

The video tutorial above explains in detail how to specify the way headers and footers are treated when converting from PDF to Office. If you prefer learning by reading, there is a brief introduction to the topic below. For a more comprehensive guide, see our detailed Header and Footer Extraction on the Server Side blog.

Converting from PDF to Office

One of the easiest ways to get started with converting from PDF to Office is via Xodo.com or Xodo PDF Studio. Both use the Structured Output module behind the scenes to deliver world-class document conversion, and I strongly recommend that you try them out.

One great aspect of DOCX files is their support for headers and footers. At its simplest, a header (or footer) is text that is repeated at the top (or bottom) of each page. In practice it can be much more complex, for example with different text on odd and even pages.

Figure 1: Typical headers and footers in a Word document. Note how they differ on odd and even pages.

PDFs can also contain headers and footers, which is hardly surprising since many PDFs were created from DOCX files. While a human can often tell at a glance what is a header or footer, detecting them automatically is challenging.
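To get a feel for why automatic detection is hard, here is a deliberately naive heuristic. It is an illustration only, not how Structured Output works. It assumes each page is already available as a list of text lines in reading order, and treats a line as a header (or footer) candidate if the same text is the first (or last) line on most pages. Odd and even pages are compared separately, and digits are masked so that page numbers still match. The names `normalize` and `detect_repeated` are hypothetical.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Mask digit runs so that "Page 3" and "Page 4" compare as equal."""
    return re.sub(r"\d+", "#", line.strip())


def detect_repeated(pages, which="first", min_fraction=0.6):
    """Guess header (which="first") or footer (which="last") text.

    pages: list of pages, each a list of text lines in reading order.
    Odd and even pages are checked separately, because documents often
    use different headers and footers on each.
    Returns {"odd": text or None, "even": text or None}.
    """
    result = {}
    for parity, label in ((1, "odd"), (0, "even")):
        # Page numbers are 1-based, so pages[0] is page 1 (odd).
        group = [p for n, p in enumerate(pages, start=1) if n % 2 == parity and p]
        if not group:
            result[label] = None
            continue
        lines = [normalize(p[0] if which == "first" else p[-1]) for p in group]
        text, count = Counter(lines).most_common(1)[0]
        # Accept the most common line only if it appears on enough pages.
        result[label] = text if count / len(lines) >= min_fraction else None
    return result


pages = [
    ["Annual Report", "Introduction", "Page 1"],
    ["Acme Corp", "Results", "Page 2"],
    ["Annual Report", "Outlook", "Page 3"],
    ["Acme Corp", "Appendix", "Page 4"],
]
print(detect_repeated(pages, "first"))  # {'odd': 'Annual Report', 'even': 'Acme Corp'}
print(detect_repeated(pages, "last"))   # {'odd': 'Page #', 'even': 'Page #'}
```

Real documents defeat this quickly: a running chapter title that changes every few pages, a header that is an image rather than text, or a body line that happens to repeat would all fool it, which is why robust detection needs much more than string matching.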

Until now, Structured Output gave you no control over how headers and footers were handled: if they were detected in the PDF, they were always included as headers and footers in the Word document. Very often that was exactly what was wanted (which is great for single-click “it just works” applications), but occasionally you might want to do things differently.

The Apryse SDK now allows you to specify how headers and footers should be handled, whether that is:

  • Extracted as headers and footers (the default behavior)
  • Entirely removed
  • Just included as regular text within the page
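Conceptually, the three choices route each detected header or footer to one of three destinations. The sketch below models that decision in plain Python for a single page. `HeaderFooterMode` and `route_page` are hypothetical names invented for illustration; they are not Apryse SDK identifiers, so check the Apryse documentation for the actual option names.

```python
from enum import Enum


class HeaderFooterMode(Enum):
    # Hypothetical names mirroring the three options; NOT Apryse SDK identifiers.
    EXTRACT = "extract"  # keep as Word headers/footers (the default)
    REMOVE = "remove"    # drop them entirely
    INLINE = "inline"    # keep them as ordinary body text


def route_page(header, body, footer, mode=HeaderFooterMode.EXTRACT):
    """Return (word_header, word_body, word_footer) for one page.

    header, footer: detected text, or None if nothing was detected.
    body: list of body text lines.
    """
    if mode is HeaderFooterMode.EXTRACT:
        return header, list(body), footer
    if mode is HeaderFooterMode.REMOVE:
        return None, list(body), None
    # INLINE: place header text before the body and footer text after it.
    inline = ([header] if header else []) + list(body) + ([footer] if footer else [])
    return None, inline, None
```

Keeping EXTRACT as the default reflects the behavior described above: conversions that do not ask for anything different continue to produce real Word headers and footers.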

Conclusion

There is extensive documentation, not just for converting from PDF to Office, but for all the other functionality the SDK supports.

Try it out! See how much time it could save your company when you need to convert a PDF into DOCX format. It’s easy to get started with the Apryse SDK, and if you run into any issues, reach out to us on Discord.
