RELEASE: What's New in Summer 2024
By Roger Dunham | 2024 Sep 12
Tags
pdf conversion
docx
Summary: The Apryse SDK now lets you control how headers and footers are handled in PDF to Word conversions. You can extract them as headers and footers, remove them, or include them as regular text. A video and detailed article with code samples are available for guidance.
The Apryse SDK is available for Windows, macOS, and Linux in a wide range of programming languages. It is a superb library for viewing, annotating, creating, and editing PDFs and other document types.
While much of the functionality is included directly within the SDK, specialized document processing, such as working with CAD, OCR, data extraction, and conversion of PDF into DOCX, PPTX, XLSX, and HTML, requires add-on modules.
The video tutorial above explains in detail how to specify the way that headers and footers should be treated when converting from PDF to Office. If you prefer learning by reading, we provide a brief introduction to the topic below. If you're looking for a more comprehensive guide, check out our detailed Header and Footer Extraction on the Server Side blog.
One of the easiest ways to get started with converting from PDF to Office is via either Xodo.com or Xodo PDF Studio. Both use the Structured Output module behind the scenes to deliver world-class document conversion. I strongly recommend that you try them out.
One great aspect of DOCX files is the support for headers and footers. At its simplest, a header (or footer) is text that is repeated at the top (or bottom) of each page. In reality, it can be much more complex, for example, with different text on odd and even pages.
Figure 1 - typical headers and footers in a Word document - note how they are different on odd and even pages.
PDFs can also contain headers and footers, which is hardly surprising since many PDFs were created from DOCX files. While it is often easy for a human to spot a header or footer, doing so automatically is challenging.
Until now, Structured Output gave you no control over the way that headers and footers were handled: if they were detected in the PDF, they were always included as headers and footers in the Word document. Very often that was exactly what was wanted (which is great for single-click “it just works” applications), but occasionally you might want to do things differently.
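To give a feel for why automatic detection is hard, here is a deliberately naive Python sketch. It is not how Structured Output works. It assumes each page has already been reduced to a list of text lines, and the function name detect_repeated_lines and its min_fraction parameter are made up for this illustration. The idea is simply to treat a line as a header or footer if the same text appears in the same position on most pages.

```python
from collections import Counter

def detect_repeated_lines(pages, position=0, min_fraction=0.6):
    """Return the lines found at `position` (0 = first line, -1 = last line)
    that repeat on at least `min_fraction` of the pages.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    # Count how often each candidate line occurs at the chosen position.
    candidates = Counter(page[position] for page in pages if page)
    threshold = min_fraction * len(pages)
    return {text for text, count in candidates.items() if count >= threshold}

# Toy document: three pages, each a list of text lines from top to bottom.
pages = [
    ["Annual Report", "Intro text", "Page 1"],
    ["Annual Report", "More text", "Page 2"],
    ["Annual Report", "Closing text", "Page 3"],
]

headers = detect_repeated_lines(pages, position=0)   # finds the repeated title
footers = detect_repeated_lines(pages, position=-1)  # finds nothing
```

The repeated title is picked up as a header, but the footers are missed because the page number changes on every page. Headers that alternate between odd and even pages would also fall below the 60% threshold. A human sees these patterns at a glance, whereas a simple rule does not.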
The Apryse SDK now allows you to specify how headers and footers should be handled: extracted as headers and footers in the Word document, removed entirely, or included as regular text.
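To make the three choices from the summary concrete (keep as headers and footers, remove, or include as regular text), here is a toy Python model of their effect on a single page. It does not use the Apryse API: HeaderFooterMode, Page, and apply_mode are hypothetical names invented for this sketch, and the real option names are in the SDK documentation and the detailed blog mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class HeaderFooterMode(Enum):
    """Hypothetical names for the three behaviors; not Apryse identifiers."""
    KEEP = "keep"        # emit as real Word headers and footers
    REMOVE = "remove"    # drop them from the output
    AS_TEXT = "as_text"  # fold them into the regular page text

@dataclass
class Page:
    header: str
    body: List[str]
    footer: str

def apply_mode(page: Page, mode: HeaderFooterMode) -> Page:
    """Return a new Page showing what each mode would leave in the output."""
    if mode is HeaderFooterMode.KEEP:
        return Page(page.header, list(page.body), page.footer)
    if mode is HeaderFooterMode.REMOVE:
        return Page("", list(page.body), "")
    if mode is HeaderFooterMode.AS_TEXT:
        # Header and footer become ordinary lines before and after the body.
        return Page("", [page.header, *page.body, page.footer], "")
    raise ValueError(f"Unknown mode: {mode!r}")

page = Page(header="Annual Report", body=["Intro text"], footer="Page 1")
for mode in HeaderFooterMode:
    print(mode.name, apply_mode(page, mode))
```

Removing is handy when the text is going to be searched or re-flowed and the repeated lines would only add noise. Including them as regular text keeps every word from the PDF, at the cost of the header text appearing inline on each page.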
There is a ton of documentation, not just for converting from PDF to Office, but for all the other functionality that is supported.
Try it out! See how much time it could save your company when you need to convert a PDF into DOCX format. It’s easy to get started with the Apryse SDK, and if you run into any issues, reach out to us on Discord.