How to Redact a PDF in .NET

By Garry Klooesterman | 2025 May 08

Summary: Redaction is a key part of protecting sensitive information. Done incorrectly or with inadequate methods, it can lead to serious repercussions such as reputational damage, financial loss, and even risks to people's safety. This blog looks at what redaction is, how to automate it in .NET with an SDK like the Apryse SDK, and the benefits of doing so.

Introduction

We probably all know this movie scene… It’s a spy movie; the main character has finally reached the enemy’s headquarters and found the villain’s office. They’re rummaging through a drawer of top-secret files and just when they finally find the page they’re looking for, they see that all the important parts are covered up by black marker. Welcome to redaction.

In this blog, we’ll look at what redaction is, why it’s important, and how to redact information from a PDF using an SDK. In the example, we’ll use .NET and C#, but the Apryse SDK is also available for other languages and frameworks such as Python, Java, and more.

For a browser-side solution, see our Ensuring Irreversible Redaction: Avoiding Hidden Metadata Risks blog.

PDF Redaction and its Use Cases

Redaction is the process of removing sensitive information from a document so that the rest can be selectively disclosed. For example, a law firm could redact a witness’s personal information to protect their identity while keeping their statement accessible. In healthcare, a hospital might share patient medical records with insurance companies for billing. Sensitive information in these records, such as the patient’s name, address, and medical history, could be redacted before the records are shared.

These are just two of the many examples of how redaction can be used. Redaction is about protecting sensitive information and is often driven by regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, or the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.

Redaction Methods

Paper Documents

On paper, redaction takes various forms, such as obscuring the text with black marker or covering sections with opaque tape before the document is photocopied.

Figure 1 – An extreme example of a redacted document.

These methods work pretty well for paper documents, but what about digital documents?

Digital Documents

With digital documents, it’s a bit more difficult to ensure that sensitive information is redacted properly and permanently. It really comes down to how it’s done, and unfortunately, past mistakes have been serious enough to make the news. Simple methods, like changing the color of the text or background or covering the text with opaque blocks in an image layer, were used and thought to be sufficient. However, the text underneath was still in the file and could be retrieved, showing that these methods are not suitable.

Other factors must also be considered, such as the complexity of the document and places where information may be “hiding,” like the metadata.

The Redaction Process

The redaction process is straightforward, consisting of two main parts:

  • Identify the content to be redacted.
  • Remove the content.

This seems simple, but as we’ve seen, using the wrong tools or methods can cause significant issues, such as the unintentional release of sensitive information like patient data, financial information, and company trade secrets.
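
To see why covering content is not the same as removing it, here is a minimal sketch in plain C#. It does not use the Apryse SDK or a real PDF. ToyPage, TextRun, and Box are hypothetical types invented for this illustration, and the page is modeled as nothing more than a list of text runs with boxes drawn on top.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only (hypothetical types): a page is a list of positioned text
// runs plus opaque boxes drawn over them. Real PDFs are far more complex.
public record TextRun(double X, double Y, string Text);

public record Box(double X1, double Y1, double X2, double Y2)
{
    // True when the run's anchor point lies inside the box.
    public bool Contains(TextRun run) =>
        run.X >= X1 && run.X <= X2 && run.Y >= Y1 && run.Y <= Y2;
}

public class ToyPage
{
    public List<TextRun> Runs { get; } = new();
    public List<Box> Overlays { get; } = new();

    // What text extraction or copy and paste would see: overlays are ignored.
    public string ExtractText() => string.Join(" ", Runs.Select(r => r.Text));

    // "Cover up" approach: draws a box but leaves the text in the page.
    public void Cover(Box box) => Overlays.Add(box);

    // Redaction: delete the runs inside the box, then draw the box.
    public void Redact(Box box)
    {
        Runs.RemoveAll(box.Contains);
        Overlays.Add(box);
    }
}

public static class CoverVersusRedactDemo
{
    public static ToyPage BuildPage()
    {
        var page = new ToyPage();
        page.Runs.Add(new TextRun(50, 700, "Patient:"));
        page.Runs.Add(new TextRun(120, 700, "Jane Doe"));
        return page;
    }

    public static void Main()
    {
        var box = new Box(100, 690, 300, 710);

        var covered = BuildPage();
        covered.Cover(box);
        Console.WriteLine(covered.ExtractText());   // prints "Patient: Jane Doe"

        var redacted = BuildPage();
        redacted.Redact(box);
        Console.WriteLine(redacted.ExtractText());  // prints "Patient:"
    }
}
```

Both pages look the same on screen, since each has a box over the name. Only the second one no longer contains the name.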

Apryse SDK for PDF Redaction

Using an SDK like the Apryse SDK to automate the process provides several benefits such as:

  • Reducing human error by ensuring all elements of the PDF are reviewed for redaction including metadata, XML Forms Architecture (XFA) content, and Extensible Metadata Platform (XMP) content.
  • Deleting text, images, or entire pages, ensuring that the content is unrecoverable.
  • Handling large volumes of documents quickly and consistently with bulk processing.
  • Automatically identifying and redacting specific types of information based on pattern and keyword detection.
  • Customizing the redaction techniques to meet your specific needs.
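
To give a feel for what pattern-based detection involves, here is a rough sketch using only .NET’s built-in regular expressions on a plain string. It is not how the Apryse SDK implements detection. The SensitivePatternFinder class and its two patterns are assumptions made up for this example, and real rules would need far more tuning and review.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any SDK: scans plain text for a few
// illustrative patterns and reports where each match starts.
public static class SensitivePatternFinder
{
    // Illustrative patterns only; they will miss cases and over-match others.
    private static readonly Dictionary<string, Regex> Patterns = new()
    {
        ["US SSN"] = new Regex(@"\b\d{3}-\d{2}-\d{4}\b"),
        ["Email"] = new Regex(@"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b"),
    };

    public static List<(string Kind, int Index, string Value)> Find(string text)
    {
        var hits = new List<(string Kind, int Index, string Value)>();
        foreach (var (kind, regex) in Patterns)
        {
            foreach (Match m in regex.Matches(text))
            {
                hits.Add((kind, m.Index, m.Value));
            }
        }
        // Report matches in the order they appear in the text.
        hits.Sort((a, b) => a.Index.CompareTo(b.Index));
        return hits;
    }

    public static void Main()
    {
        string text = "Contact jane.doe@example.com, SSN 123-45-6789.";
        foreach (var hit in Find(text))
        {
            Console.WriteLine($"{hit.Kind} at {hit.Index}: {hit.Value}");
        }
    }
}
```

In a real workflow, each match would still have to be mapped to a page number and a region on that page before it could be redacted.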

The redaction process using the Apryse SDK consists of two steps:

  • Content Identification: A user marks the content to be redacted using redact annotations.
  • Content Removal: The SDK applies the redactions and the information is removed from the identified regions.

Note: You can customize the style elements of the redaction overlay including color, text, font, border, transparency, and more.

How to Get Started

Now that we’ve looked at redaction and how it works, let’s get started with setting up a project by following these instructions.

  1. Download the Apryse Server SDK. In this case, we’ll want to choose the C# .NET PDF library.
  2. Follow these steps to extract the folder from the .zip file and set up .NET.
  3. Get a free trial key.

Now that we’ve got everything set up, let’s look at how to redact parts of a PDF. We're going to use the code from our PDFRedactTest sample, one of our many available code samples.

I've provided the code here as well for easy reference.

//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2024 by Apryse Software Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------

using System;
using System.IO;
using System.Collections;

using pdftron;
using pdftron.Common;
using pdftron.Filters;
using pdftron.SDF;
using pdftron.PDF;

namespace PDFNetSamples
{
	// PDF Redactor is a separately licensable Add-on that offers options to remove 
	// (not just covering or obscuring) content within a region of PDF. 
	// With printed pages, redaction involves blacking-out or cutting-out areas of 
	// the printed page. With electronic documents that use formats such as PDF, 
	// redaction typically involves removing sensitive content within documents for 
	// safe distribution to courts, patent and government institutions, the media, 
	// customers, vendors or any other audience with restricted access to the content. 
	//
	// The redaction process in PDFNet consists of two steps:
	// 
	//  a) Content identification: A user applies redact annotations that specify the 
	// pieces or regions of content that should be removed. The content for redaction 
	// can be identified either interactively (e.g. using 'pdftron.PDF.PDFViewCtrl' 
	// as shown in PDFView sample) or programmatically (e.g. using 'pdftron.PDF.TextSearch'
	// or 'pdftron.PDF.TextExtractor'). Up until the next step is performed, the user 
	// can see, move and redefine these annotations.
	//  b) Content removal: Using 'pdftron.PDF.Redactor.Redact()' the user instructs 
	// PDFNet to apply the redact regions, after which the content in the area specified 
	// by the redact annotations is removed. The redaction function includes number of 
	// options to control the style of the redaction overlay (including color, text, 
	// font, border, transparency, etc.).
	// 
	// PDFTron Redactor makes sure that if a portion of an image, text, or vector graphics 
	// is contained in a redaction region, that portion of the image or path data is 
	// destroyed and is not simply hidden with clipping or image masks. PDFNet API can also 
	// be used to review and remove metadata and other content that can exist in a PDF 
	// document, including XML Forms Architecture (XFA) content and Extensible Metadata 
	// Platform (XMP) content.
	class Class1
	{
		private static pdftron.PDFNetLoader pdfNetLoader = pdftron.PDFNetLoader.Instance();
		static Class1() {}
		
		static void Redact(string input, string output, ArrayList rarr, Redactor.Appearance app)
		{
			using (PDFDoc doc = new PDFDoc(input))
			{
				doc.InitSecurityHandler();
				Redactor.Redact(doc, rarr, app, false, true);
				doc.Save(output, SDFDoc.SaveOptions.e_linearized);
			}
		}

		/// <summary>
		/// The following sample illustrates how to redact a PDF document using 'pdftron.PDF.Redactor'.
		/// </summary>
		static void Main(string[] args)
		{
			PDFNet.Initialize(PDFTronLicense.Key);

			string input_path = "../../../../TestFiles/";
			string output_path = "../../../../TestFiles/Output/";
			try
			{
				ArrayList rarr = new ArrayList();
				rarr.Add(new Redactor.Redaction(1, new Rect(100, 100, 550, 600), false, "Top Secret"));
				rarr.Add(new Redactor.Redaction(2, new Rect(30, 30, 450, 450), true, "Negative Redaction"));
				rarr.Add(new Redactor.Redaction(2, new Rect(0, 0, 100, 100), false, "Positive"));
				rarr.Add(new Redactor.Redaction(2, new Rect(100, 100, 200, 200), false, "Positive"));
				rarr.Add(new Redactor.Redaction(2, new Rect(300, 300, 400, 400), false, ""));
				rarr.Add(new Redactor.Redaction(2, new Rect(500, 500, 600, 600), false, ""));
				rarr.Add(new Redactor.Redaction(3, new Rect(0, 0, 700, 20), false, ""));

				Redactor.Appearance app = new Redactor.Appearance();
				app.RedactionOverlay = true;
				app.Border = false;
				app.ShowRedactedContentRegions = true;

				Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", rarr, app);

				Console.WriteLine("Done...");
			}
			catch (PDFNetException e)
			{
				Console.WriteLine(e.Message);
			}
			PDFNet.Terminate();
		}
	}
}

Let's take a closer look at the parts of the code above.

The areas to redact are collected in the ArrayList rarr, created using the code:

ArrayList rarr = new ArrayList();

Each region of a page to be redacted is added as a separate element, as in the following code:

rarr.Add(new Redactor.Redaction(1, new Rect(100, 100, 550, 600), false, "Top Secret"));

This line tells the SDK to redact a rectangular region on page 1 with the coordinates “100, 100, 550, 600”. The third argument, false, makes this a regular (positive) redaction rather than a negative one like the second entry in the sample. The content in that area will be permanently deleted and replaced with the words “Top Secret”.
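
If you’re wondering where numbers like these come from, the sketch below shows one way to work them out. It assumes the usual PDF convention of points (1/72 of an inch) measured from the lower-left corner of an unrotated page, which is worth confirming against the SDK documentation for your documents. RedactionCoordinates and FromTopLeftInches are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical helper, not part of the SDK: converts a box measured in inches
// from the top-left corner of the page into (x1, y1, x2, y2) in points with a
// bottom-left origin. Assumes an unrotated page and default PDF units.
public static class RedactionCoordinates
{
    private const double PointsPerInch = 72.0;

    public static (double X1, double Y1, double X2, double Y2) FromTopLeftInches(
        double pageHeightPoints, double leftIn, double topIn, double widthIn, double heightIn)
    {
        double x1 = leftIn * PointsPerInch;
        double x2 = (leftIn + widthIn) * PointsPerInch;
        // Flip the vertical axis: distance from the top becomes height above the bottom.
        double y2 = pageHeightPoints - topIn * PointsPerInch;
        double y1 = y2 - heightIn * PointsPerInch;
        return (x1, y1, x2, y2);
    }

    public static void Main()
    {
        // A US Letter page is 612 x 792 points. The box starts 1 inch from the
        // left and 1 inch from the top, and is 2 inches wide and 0.5 inches tall.
        var r = FromTopLeftInches(792, 1, 1, 2, 0.5);
        Console.WriteLine($"{r.X1}, {r.Y1}, {r.X2}, {r.Y2}");  // prints "72, 684, 216, 720"
    }
}
```

The four values could then be passed to new Rect(...) in the same order as in the sample.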

You can also see the elements to customize the appearance of the redaction overlay here:

Redactor.Appearance app = new Redactor.Appearance();
app.RedactionOverlay = true;
app.Border = false;
app.ShowRedactedContentRegions = true;

And last but certainly not least, this call to the sample’s Redact helper opens the input file, applies the redactions with Redactor.Redact using the parameters set above, and saves the result:

Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", rarr, app);

And there you have it. You’ve redacted your first document!
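
The sample handles a single file. For the bulk processing mentioned earlier, a driver loop along these lines is one option. This is a sketch under assumptions: BatchRedaction and RedactFolder are hypothetical names, and the redactOne delegate stands in for whatever per-file method you use, such as the sample’s Redact.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch driver, not part of the SDK. It only walks a folder and
// builds output paths; the actual redaction is delegated to redactOne.
public static class BatchRedaction
{
    // Returns one message per file that failed, so a single bad PDF does not
    // stop the rest of the batch.
    public static List<string> RedactFolder(
        string inputDir, string outputDir, Action<string, string> redactOne)
    {
        Directory.CreateDirectory(outputDir);
        var failures = new List<string>();

        foreach (string input in Directory.EnumerateFiles(inputDir, "*.pdf"))
        {
            string name = Path.GetFileNameWithoutExtension(input) + "_redacted.pdf";
            string output = Path.Combine(outputDir, name);
            try
            {
                redactOne(input, output);
            }
            catch (Exception e)
            {
                failures.Add($"{input}: {e.Message}");
            }
        }
        return failures;
    }
}
```

With the sample above, the call might look like BatchRedaction.RedactFolder(input_path, output_path, (i, o) => Redact(i, o, rarr, app)). Keep in mind that the same fixed regions are rarely right for every document, so in practice each file would need its own list of redactions.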

Conclusion

We’ve just looked at redaction and using the Apryse SDK to automate the process. Redaction plays a key part in protecting sensitive information and is important for many reasons from minimizing reputation risk to ensuring personal safety. The Apryse SDK enables you to securely and permanently redact information from PDFs, helping you protect sensitive information and adhere to international privacy laws.

Try it out for yourself with our free trial.

Get started now or contact our sales team for any questions. You can also check out our Discord community for support and discussions.

Garry Klooesterman

Senior Technical Content Creator
