By Garry Klooesterman | 2025 May 08
Tags
pdf redaction
Summary: Redaction is a key part of protecting sensitive information. When it is done incorrectly or with inadequate methods, the repercussions can be serious: reputational damage, financial loss, and even risks to people's safety. This blog looks at redaction, how to automate it using an SDK like the Apryse SDK with .NET, and the benefits of doing so.
We probably all know this movie scene… It’s a spy movie; the main character has finally reached the enemy’s headquarters and found the villain’s office. They’re rummaging through a drawer of top-secret files and just when they finally find the page they’re looking for, they see that all the important parts are covered up by black marker. Welcome to redaction.
In this blog, we’ll look at what redaction is, why it’s important, and how to redact information from a PDF using an SDK. In the example, we’ll use .NET and C#, but the Apryse SDK is also available for other languages and frameworks such as Python, Java, and more.
For a browser-side solution, see our Ensuring Irreversible Redaction: Avoiding Hidden Metadata Risks blog.
Redaction is the process of removing sensitive information from a document to allow the selective disclosure of information. For example, a law firm could redact a witness’s personal information to protect their identity while keeping their statement accessible. In healthcare, a hospital might share patient medical records with insurance companies for billing. The sensitive information in these records, such as the patient’s name, address, and medical history, could be redacted before the records are shared.
These are just two of the many examples of how redaction can be used. Redaction is about protecting sensitive information and is often driven by regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, or the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.
Paper Documents
On paper, redaction comes in various forms, such as obscuring the text with black marker or covering sections with opaque tape before the document is photocopied.
Figure 1 – An extreme example of a redacted document.
These methods work pretty well for paper documents, but what about digital documents?
Digital Documents
With digital documents, it’s a bit more difficult to ensure that sensitive information is redacted properly and permanently. It really comes down to how it’s done, and unfortunately, past mistakes have led to some cases hitting the media. Simple methods like changing the color of the text or background, or covering the text with opaque blocks in an image layer, were thought to be sufficient. However, the text underneath was still in the file and could be retrieved, showing that these methods are not suitable.
Other elements must also be considered, such as the complexity of the document and places where information may be “hiding”, such as the metadata.
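To make the difference between covering and removing concrete, here is a minimal sketch in C#. It is a toy model and not a PDF parser: the ToyPage class and its members are hypothetical names invented for this illustration, and a "page" is reduced to a list of text runs along one axis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a page (hypothetical, for illustration only; not a PDF parser).
public sealed class ToyPage
{
    // Text runs stored in the file, each with the horizontal range it occupies.
    public List<(double X1, double X2, string Text)> TextRuns { get; } =
        new List<(double X1, double X2, string Text)>();

    // Opaque boxes drawn on top of the page (horizontal ranges only, for brevity).
    public List<(double X1, double X2)> Overlays { get; } =
        new List<(double X1, double X2)>();

    // What a text extractor sees: every stored run, regardless of overlays.
    public string ExtractText() => string.Join(" ", TextRuns.Select(r => r.Text));

    // "Cover-up" redaction: adds a box but leaves the text data in place.
    public void CoverUp(double x1, double x2) => Overlays.Add((x1, x2));

    // Removal: deletes every run that intersects the region.
    public void Remove(double x1, double x2) =>
        TextRuns.RemoveAll(r => r.X1 < x2 && r.X2 > x1);
}
```

After CoverUp, ExtractText still returns the covered text because nothing was deleted. Only Remove changes what can be recovered from the page.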
The redaction process is straightforward, consisting of two main parts:
This seems simple but as we’ve seen, not using the right tools or methods can cause significant issues such as allowing the unintentional release of sensitive information like patient data, financial information, company trade secrets, and more.
Using an SDK like the Apryse SDK to automate the process provides several benefits such as:
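The redaction process using the Apryse SDK consists of two steps: content identification, where redaction annotations mark the pieces or regions of content to remove (either interactively or programmatically), and content removal, where the SDK applies those regions and removes the content inside them.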
Note: You can customize the style elements of the redaction overlay including color, text, font, border, transparency, and more.
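The identification step can be done programmatically. As a rough sketch of the idea, the following C# snippet scans text that has already been extracted from a page and reports where sensitive-looking strings occur. It uses only the .NET standard library and does not call the Apryse SDK. The SensitiveTextFinder class and both patterns are assumptions made for this example, and real projects would need patterns tuned to their own data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds sensitive-looking strings in already-extracted page text.
public static class SensitiveTextFinder
{
    // Illustrative patterns only (US SSN-like numbers and simple email addresses).
    private static readonly Regex SsnLike = new Regex(@"\b\d{3}-\d{2}-\d{4}\b");
    private static readonly Regex EmailLike = new Regex(@"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b");

    // Returns the character offset, length, and kind of every match, in reading order.
    public static List<(int Index, int Length, string Kind)> Find(string pageText)
    {
        var hits = new List<(int Index, int Length, string Kind)>();
        foreach (Match m in SsnLike.Matches(pageText))
            hits.Add((m.Index, m.Length, "ssn"));
        foreach (Match m in EmailLike.Matches(pageText))
            hits.Add((m.Index, m.Length, "email"));
        hits.Sort((a, b) => a.Index.CompareTo(b.Index));
        return hits;
    }
}
```

Each hit would still need to be mapped to a region on the page before it can be redacted. That mapping depends on the text extraction tool in use and is left out of this sketch.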
Now that we’ve looked at redaction and how it works, let’s get started with setting up a project by following these instructions.
Now that we’ve got everything set up, let’s look at how to redact parts of a PDF. We're going to use the code from our PDFRedactTest sample, just one of our many code samples available.
I've provided the code here as well for easy reference.
//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2024 by Apryse Software Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------
using System;
using System.IO;
using System.Collections;
using pdftron;
using pdftron.Common;
using pdftron.Filters;
using pdftron.SDF;
using pdftron.PDF;

namespace PDFNetSamples
{
    // PDF Redactor is a separately licensable Add-on that offers options to remove
    // (not just covering or obscuring) content within a region of PDF.
    // With printed pages, redaction involves blacking-out or cutting-out areas of
    // the printed page. With electronic documents that use formats such as PDF,
    // redaction typically involves removing sensitive content within documents for
    // safe distribution to courts, patent and government institutions, the media,
    // customers, vendors or any other audience with restricted access to the content.
    //
    // The redaction process in PDFNet consists of two steps:
    //
    // a) Content identification: A user applies redact annotations that specify the
    // pieces or regions of content that should be removed. The content for redaction
    // can be identified either interactively (e.g. using 'pdftron.PDF.PDFViewCtrl'
    // as shown in PDFView sample) or programmatically (e.g. using 'pdftron.PDF.TextSearch'
    // or 'pdftron.PDF.TextExtractor'). Up until the next step is performed, the user
    // can see, move and redefine these annotations.
    // b) Content removal: Using 'pdftron.PDF.Redactor.Redact()' the user instructs
    // PDFNet to apply the redact regions, after which the content in the area specified
    // by the redact annotations is removed. The redaction function includes number of
    // options to control the style of the redaction overlay (including color, text,
    // font, border, transparency, etc.).
    //
    // PDFTron Redactor makes sure that if a portion of an image, text, or vector graphics
    // is contained in a redaction region, that portion of the image or path data is
    // destroyed and is not simply hidden with clipping or image masks. PDFNet API can also
    // be used to review and remove metadata and other content that can exist in a PDF
    // document, including XML Forms Architecture (XFA) content and Extensible Metadata
    // Platform (XMP) content.
    class Class1
    {
        private static pdftron.PDFNetLoader pdfNetLoader = pdftron.PDFNetLoader.Instance();
        static Class1() {}

        static void Redact(string input, string output, ArrayList rarr, Redactor.Appearance app)
        {
            using (PDFDoc doc = new PDFDoc(input))
            {
                doc.InitSecurityHandler();
                Redactor.Redact(doc, rarr, app, false, true);
                doc.Save(output, SDFDoc.SaveOptions.e_linearized);
            }
        }

        /// <summary>
        /// The following sample illustrates how to redact a PDF document using 'pdftron.PDF.Redactor'.
        /// </summary>
        static void Main(string[] args)
        {
            PDFNet.Initialize(PDFTronLicense.Key);
            string input_path = "../../../../TestFiles/";
            string output_path = "../../../../TestFiles/Output/";
            try
            {
                ArrayList rarr = new ArrayList();
                rarr.Add(new Redactor.Redaction(1, new Rect(100, 100, 550, 600), false, "Top Secret"));
                rarr.Add(new Redactor.Redaction(2, new Rect(30, 30, 450, 450), true, "Negative Redaction"));
                rarr.Add(new Redactor.Redaction(2, new Rect(0, 0, 100, 100), false, "Positive"));
                rarr.Add(new Redactor.Redaction(2, new Rect(100, 100, 200, 200), false, "Positive"));
                rarr.Add(new Redactor.Redaction(2, new Rect(300, 300, 400, 400), false, ""));
                rarr.Add(new Redactor.Redaction(2, new Rect(500, 500, 600, 600), false, ""));
                rarr.Add(new Redactor.Redaction(3, new Rect(0, 0, 700, 20), false, ""));

                Redactor.Appearance app = new Redactor.Appearance();
                app.RedactionOverlay = true;
                app.Border = false;
                app.ShowRedactedContentRegions = true;

                Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", rarr, app);
                Console.WriteLine("Done...");
            }
            catch (PDFNetException e)
            {
                Console.WriteLine(e.Message);
            }
            PDFNet.Terminate();
        }
    }
}
Let's take a closer look at the parts of the code above.
The areas to redact are collected in the ArrayList rarr, created using the code:
ArrayList rarr = new ArrayList();
Each section or region of the page to be redacted is specified by a separate array element like in the following code:
rarr.Add(new Redactor.Redaction(1, new Rect(100, 100, 550, 600), false, "Top Secret"));
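This line tells the SDK that we’re going to redact a rectangular region on page 1 (the first argument) whose opposite corners are at (100, 100) and (550, 600). The third argument, false, marks this as a regular (positive) redaction rather than a negative one. The content in that area will be permanently deleted and replaced with the words “Top Secret”.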
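If your region was measured from the top-left corner of the page, as most image tools do, it needs converting first, because PDF coordinates by default start at the bottom-left corner and are measured in points (1/72 inch). Here is a small sketch of that conversion. The PdfCoords helper is a hypothetical name and not part of the Apryse SDK, and it assumes an unrotated page whose origin is at (0, 0).

```csharp
using System;

// Hypothetical helper (not part of the Apryse SDK).
public static class PdfCoords
{
    // Converts a box measured from the TOP-LEFT corner of the page into
    // (x1, y1, x2, y2) corners measured from the BOTTOM-LEFT corner.
    // All values are in points; assumes an unrotated page with origin (0, 0).
    public static (double X1, double Y1, double X2, double Y2) FromTopLeft(
        double left, double top, double width, double height, double pageHeight)
    {
        double x1 = left;
        double x2 = left + width;
        double y2 = pageHeight - top; // upper edge, measured from the bottom
        double y1 = y2 - height;      // lower edge, measured from the bottom
        return (x1, y1, x2, y2);
    }
}
```

For instance, on a 792-point-tall page (US Letter), a box 100 points from the left and 192 points from the top that is 450 wide and 500 tall maps to (100, 100, 550, 600), the same rectangle used in the sample line.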
You can also see the elements to customize the appearance of the redaction overlay here:
Redactor.Appearance app = new Redactor.Appearance();
app.RedactionOverlay = true;
app.Border = false;
app.ShowRedactedContentRegions = true;
And last but certainly not least, this is the code that performs the redaction based on the parameters that have been previously set:
Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", rarr, app);
And there you have it. You’ve redacted your first document!
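It’s worth double-checking the output before sharing it. One simple check is to extract the text from the redacted file again (for example with pdftron.PDF.TextExtractor, which the sample comments mention) and confirm that the sensitive terms are gone. The sketch below covers only the comparison step and assumes the page text has already been extracted into strings. The RedactionCheck class is a hypothetical name made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: checks already-extracted page text for terms that
// should no longer be present after redaction.
public static class RedactionCheck
{
    // Returns a (1-based page number, term) pair for every term still found.
    public static List<(int Page, string Term)> FindLeaks(
        IReadOnlyList<string> pageTexts, IEnumerable<string> terms)
    {
        var leaks = new List<(int Page, string Term)>();
        var termList = terms.ToList();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            foreach (var term in termList)
            {
                // Case-insensitive search so "jane doe" also catches "Jane Doe".
                if (pageTexts[i].IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                    leaks.Add((i + 1, term));
            }
        }
        return leaks;
    }
}
```

An empty result only shows that the terms are absent from the extracted text. It doesn't cover images or metadata, which need their own review.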
We’ve just looked at redaction and using the Apryse SDK to automate the process. Redaction plays a key part in protecting sensitive information and is important for many reasons from minimizing reputation risk to ensuring personal safety. The Apryse SDK enables you to securely and permanently redact information from PDFs, helping you protect sensitive information and adhere to international privacy laws.
Try it out for yourself with our free trial.
Get started now or contact our sales team for any questions. You can also check out our Discord community for support and discussions.
Garry Klooesterman
Senior Technical Content Creator