Apryse Adds a PDF Sanitization Capability to the Server SDK

Published April 15, 2026

Updated April 15, 2026

Read time: 5 min

Laura Massingham

Director of Product Marketing

If you work in government, legal, healthcare, or any enterprise environment where documents move across organizational boundaries, you already know the problem. PDFs hold more than the content you see on the page. They include metadata, embedded scripts, hidden layers, form data, and markup annotations, and every one of those elements is a potential compliance risk, especially for highly regulated industries.

At the request of our customers, we are launching a new feature that makes PDF sanitization a repeatable, auditable part of any document workflow. Sanitization requires a deep knowledge of the PDF spec, careful handling of edge cases, and a verifiable record of what was changed. The Sanitization feature handles all of that natively, so your team can focus on the workflow, not the implementation.

How Does It Work?

The Apryse PDF Sanitization feature gives you a clean, purpose-built interface to detect and remove sensitive or malicious content from any PDF with two simple calls.

  • GetSanitizableContent() scans a document and returns a structured report of everything it finds: metadata, embedded JavaScript, hidden text, markup annotations, overlapping content, and more. You decide what gets flagged.
  • SanitizeDocument() takes that report and executes the removal. The output is a clean PDF, stripped of whatever you specified and ready for external sharing or archival.

The workflow is intentionally split into two stages: inspect first, then sanitize. You can skip the inspection step and call SanitizeDocument() directly if you choose. Keeping the stages separate leaves room for a review step before any content is removed, which helps security teams align with compliance frameworks.
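
As a rough illustration of the inspect-then-sanitize flow, the Python sketch below models the two stages with stand-in functions and a review gate between them. It does not call the Apryse SDK: FakeDocument, Finding, Category, review, run_workflow, and the snake_case function names are all hypothetical, and the real signatures of GetSanitizableContent() and SanitizeDocument() should be taken from the Apryse documentation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Category(Enum):
    """Content categories named in this post (the enum itself is hypothetical)."""
    METADATA = "metadata"
    JAVASCRIPT = "embedded JavaScript"
    HIDDEN_TEXT = "hidden text and layers"
    MARKUP = "markup annotations"
    ATTACHMENTS = "embedded files"
    FORM_DATA = "form data"


@dataclass
class Finding:
    category: Category
    detail: str


@dataclass
class FakeDocument:
    """Stand-in for a PDF; a real document would be loaded through the SDK."""
    name: str
    findings: List[Finding] = field(default_factory=list)


def get_sanitizable_content(doc: FakeDocument) -> List[Finding]:
    """Stage 1 stand-in: report what is present without changing the document."""
    return list(doc.findings)


def sanitize_document(doc: FakeDocument, categories: Set[Category]) -> FakeDocument:
    """Stage 2 stand-in: return a copy with the selected categories removed."""
    kept = [f for f in doc.findings if f.category not in categories]
    return FakeDocument(name=doc.name, findings=kept)


def review(report: List[Finding], policy: Set[Category]) -> Set[Category]:
    """Review gate: approve only reported categories that the policy covers."""
    return {f.category for f in report if f.category in policy}


def run_workflow(doc: FakeDocument, policy: Set[Category]):
    report = get_sanitizable_content(doc)      # inspect first
    approved = review(report, policy)          # decide what to remove
    clean = sanitize_document(doc, approved)   # then sanitize
    # Re-scan the output so the audit record shows what was removed and what is left.
    remaining = {f.category.value for f in get_sanitizable_content(clean)}
    audit = {
        "document": doc.name,
        "removed": sorted(c.value for c in approved),
        "remaining": sorted(remaining),
    }
    return clean, audit


doc = FakeDocument("contract.pdf", [
    Finding(Category.METADATA, "Author field"),
    Finding(Category.JAVASCRIPT, "script that runs on open"),
    Finding(Category.MARKUP, "sticky note on page 2"),
])
policy = {Category.METADATA, Category.JAVASCRIPT, Category.FORM_DATA}
clean, audit = run_workflow(doc, policy)
print(audit)
```

In this sketch the original document is left untouched and the audit record comes from re-scanning the sanitized copy, which is one way to keep a verifiable record of what was changed.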

What Gets Removed?

The Sanitization tool helps de-risk PDF documents by removing the following hidden elements:

  • Metadata and document properties: author names, revision history, creation timestamps, software fingerprints
  • Embedded JavaScript: the primary vector for PDF-based malware and script injection attacks
  • Hidden text and layers: content invisible to the reader but fully accessible to downstream processing
  • Markup annotations and comments: redline notes, sticky comments, and review markup that often carry internal context
  • Embedded files and attachments: secondary documents nested inside the PDF container
  • Form data: field values and submission data left behind after form processing

Who Is This Tool For?

Any organization benefits from vetting the documents it sends out for these kinds of hidden data. Likewise, security teams that ingest PDFs from external sources need a reliable way to neutralize embedded scripts before those files touch internal systems.

More is at stake in regulated industries, for example:

  • Government agencies need to strip metadata and scripts before submitting documents through official channels.
  • Legal and healthcare teams handle documents that pass through many hands, including opposing counsel, insurers, and regulators. Hidden annotations or leftover form data can put protected health information (PHI) in the wrong hands.

Get Started

The PDF Sanitization feature is available now with an Apryse Server SDK license. Give it a try with our documentation and samples or reach out to sales to learn more.

Read the Full Release Notes

Find the Server SDK release notes in our documentation for details on everything new in server-side document processing.