By Garry Klooesterman | 2025 May 14
Tags
pdf redaction
Summary: Protecting sensitive information is critical in any business. Redaction is an effective way to protect this information, but if it’s not done right, the consequences can be serious. This blog looks at redaction and how to automate the process using an SDK like the Apryse SDK and Python.
Protecting sensitive information is critical in any business, and when it is not done right, the consequences can be serious, including fraud, reputational damage, and even personal harm.
Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada define how personal information must be handled, secured, and disclosed.
One way to protect sensitive information is redaction, which has been used for centuries to remove information that should not be shared. However, if inadequate tools or methods are used, the information may still be recoverable. Using an SDK to automate the process helps ensure redaction is done right.
In this blog, we’ll look at what redaction is, why it’s important, and how to redact information from a PDF using an SDK. In the example, we’ll use Python, but the Apryse SDK is also available for other languages and frameworks such as C#, Java, and more.
For a browser-side solution, see our Ensuring Irreversible Redaction: Avoiding Hidden Metadata Risks blog.
Removing sensitive information from a document is known as redaction. It allows selected information to be shared while the rest stays protected. This could be as simple as a medical student in clinical practice removing any patient identification information from the x-rays in their presentation. Another example is a business that logs all of its sales receipts but removes its customers’ credit card information. In both scenarios, important information can still be tracked and shared while sensitive details that could be exploited are kept private.
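To make the receipt example concrete, here is a minimal sketch in plain Python (standard library only, not the Apryse SDK) that strips card numbers from text log lines before they are stored. The `redact_receipt` helper and its pattern are illustrative assumptions: the regular expression matches 13 to 16 digits separated by optional single spaces or hyphens, which is a simplification that would need tuning (for example, adding a Luhn check) for real data.

```python
import re

# Matches 13 to 16 digits, optionally separated by single spaces or hyphens,
# which covers common card-number layouts such as "4111 1111 1111 1111".
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact_receipt(line: str) -> str:
    """Return a copy of a receipt log line with every card number replaced."""
    return CARD_PATTERN.sub("[REDACTED]", line)


if __name__ == "__main__":
    # Made-up receipt lines using well-known test card numbers.
    receipts = [
        "2025-05-01 order=1001 total=42.50 card=4111 1111 1111 1111",
        "2025-05-02 order=1002 total=13.99 card=5500-0000-0000-0004",
    ]
    for line in receipts:
        print(redact_receipt(line))
```

The important design choice is that the number is replaced rather than hidden: only the redacted string should ever be written to the log.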
These are just two of the many examples of how redaction can be used.
Paper Documents
Common methods of redacting paper documents include using a black marker or opaque tape to obscure the text before the page is photocopied. The copy then, in a sense, permanently removes the information that was meant to be protected.
Figure 1 – An example of a redacted document.
For paper documents, these methods work well, but what about digital documents?
Digital Documents
Ensuring sensitive information is redacted properly and permanently is more difficult with digital documents. As digital documents became more common, the need for reliable redaction became more apparent. Various methods have been tried, and some are far less effective than others. For example, changing the color of the text or background, or covering the text with opaque blocks in an image layer, does not remove the underlying text. There have been various cases reported in the media where these methods were used and the supposedly redacted text could still be retrieved.
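The difference between covering text and removing it can be shown with a toy page model. In this sketch, which is a simplified assumption rather than how PDF files are actually structured, a page holds text runs with bounding boxes plus a list of opaque overlay boxes. `cover` only adds a box on top, so text extraction still returns the hidden words, while `remove` deletes every run that touches the region before drawing the box.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x1, y1) is lower-left, (x2, y2) is upper-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: Box) -> bool:
        # Strict comparisons: boxes that only share an edge do not intersect.
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


@dataclass(frozen=True)
class TextRun:
    text: str
    box: Box


@dataclass
class Page:
    runs: list[TextRun]
    # Opaque shapes drawn on top of the text: they change what is seen, not what is stored.
    overlays: list[Box] = field(default_factory=list)

    def extract_text(self) -> str:
        """What copy/paste or a text extractor sees: every stored run, overlays ignored."""
        return " ".join(run.text for run in self.runs)


def cover(page: Page, region: Box) -> Page:
    """Ineffective 'redaction': draw a box over the region and keep all text."""
    return Page(list(page.runs), page.overlays + [region])


def remove(page: Page, region: Box) -> Page:
    """Real redaction: delete every run touching the region, then draw the box."""
    kept = [run for run in page.runs if not run.box.intersects(region)]
    return Page(kept, page.overlays + [region])


page = Page([TextRun("Patient:", Box(10, 700, 55, 712)),
             TextRun("Jane Doe", Box(65, 700, 120, 712))])
secret = Box(60, 695, 130, 715)
print(cover(page, secret).extract_text())   # Patient: Jane Doe
print(remove(page, secret).extract_text())  # Patient:
```

Both results look identical on screen, since each has the same black box, but only the second one no longer contains the name.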
Information that needs to be redacted can also exist in less visible parts of a document, such as comments, version history, or metadata.
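Hidden parts can be checked in a similar way. The following sketch uses a hypothetical nested-dictionary document (the `comments`, `versions`, and `metadata` keys are assumptions for illustration) to search every stored string for a sensitive term and then clear those hidden parts. In a real PDF, the rough equivalents are annotations, earlier revisions retained by incremental saves, and the document information dictionary or XMP metadata.

```python
import copy

# Hypothetical keys for the "hidden" parts of a toy document.
HIDDEN_PARTS = ("comments", "versions", "metadata")


def iter_strings(node):
    """Yield every string stored anywhere in nested dicts, lists, or tuples."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_leaks(document, term):
    """Return every stored string that still contains term (case-insensitive)."""
    needle = term.casefold()
    return [s for s in iter_strings(document) if needle in s.casefold()]


def strip_hidden_parts(document):
    """Return a deep copy with comments, old versions, and metadata emptied."""
    clean = copy.deepcopy(document)
    for key in HIDDEN_PARTS:
        if key in clean:
            clean[key] = type(clean[key])()  # same container type, no contents
    return clean


doc = {
    "pages": ["Patient: [REDACTED]"],  # the visible text is already redacted
    "comments": ["Check Jane Doe's chart"],
    "versions": [{"pages": ["Patient: Jane Doe"]}],
    "metadata": {"Title": "Jane Doe intake form"},
}
print(len(find_leaks(doc, "Jane Doe")))                 # 3
print(find_leaks(strip_hidden_parts(doc), "Jane Doe"))  # []
```

Even though the visible page is clean, the name survives in three other places until the hidden parts are cleared.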
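The redaction process at a basic level consists of two main parts:
Identifying the information to be redacted.
Permanently removing that information from the document.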
It’s simple enough, but it’s also important to use the right tools and methods. Otherwise, sensitive information could be unintentionally released, with potentially severe consequences.
There are several benefits to automating the redaction process. For example, using an SDK like the Apryse SDK to automate redaction helps:
Note: You can even customize the redaction techniques to meet your specific needs.
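Using the Apryse SDK, the redaction process consists of two steps:
Content identification: Redact annotations are applied to mark the pieces or regions of content to be removed. This can be done interactively or programmatically, for example by searching for or extracting text. Until the next step is performed, these annotations can still be viewed, moved, and redefined.
Content removal: Calling Redactor.Redact() applies the redaction regions, permanently removing the content within the areas defined by the redact annotations.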
Note: You can customize the style elements of the redaction overlay including color, text, font, border, transparency, and more.
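The two-step workflow can be modeled in plain Python to show why the steps are kept separate. This is a toy sketch, not the Apryse API: `Word`, `Region`, `identify`, and `apply_redactions` are hypothetical names, `Region` stands in for a redact annotation, and words are assumed to come with page numbers and bounding boxes. Identification only records regions, so they can be reviewed before anything is deleted; removal then drops every word that overlaps a region on the same page.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (x1, y1, x2, y2) on the page


@dataclass(frozen=True)
class Word:
    page: int
    text: str
    box: BBox


@dataclass(frozen=True)
class Region:
    """Stand-in for a redact annotation: a page number plus a rectangle."""
    page: int
    box: BBox


def overlaps(a: BBox, b: BBox) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def identify(words: list[Word], pattern: str) -> list[Region]:
    """Step 1: mark regions for words matching pattern. Nothing is deleted yet."""
    rx = re.compile(pattern)
    return [Region(w.page, w.box) for w in words if rx.fullmatch(w.text)]


def apply_redactions(words: list[Word], regions: list[Region]) -> list[Word]:
    """Step 2: drop every word that overlaps a region on the same page."""
    return [w for w in words
            if not any(r.page == w.page and overlaps(w.box, r.box) for r in regions)]


words = [
    Word(1, "SSN:", (72, 700, 100, 712)),
    Word(1, "123-45-6789", (105, 700, 170, 712)),
    Word(2, "Total:", (105, 700, 150, 712)),  # same spot, but on page 2
]
regions = identify(words, r"\d{3}-\d{2}-\d{4}")
print(regions)  # reviewable list of regions before anything is removed
print([w.text for w in apply_redactions(words, regions)])  # ['SSN:', 'Total:']
```

The page check matters: the word on page 2 sits in the same rectangle as the match on page 1, but it is kept because the region belongs to page 1.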
Now that we’ve looked at redaction and how it works, let’s get started with setting up a project by following these instructions.
We'll now look at how to redact parts of a PDF using the code from our PDFRedactTest sample, one of our many available code samples.
I've provided the code here as well for easy reference.
#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------
import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *
sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *
# PDF Redactor is a separately licensable Add-on that offers options to remove
# (not just covering or obscuring) content within a region of PDF.
# With printed pages, redaction involves blacking-out or cutting-out areas of
# the printed page. With electronic documents that use formats such as PDF,
# redaction typically involves removing sensitive content within documents for
# safe distribution to courts, patent and government institutions, the media,
# customers, vendors or any other audience with restricted access to the content.
#
# The redaction process in PDFNet consists of two steps:
#
# a) Content identification: A user applies redact annotations that specify the
# pieces or regions of content that should be removed. The content for redaction
# can be identified either interactively (e.g. using 'pdftron.PDF.PDFViewCtrl'
# as shown in PDFView sample) or programmatically (e.g. using 'pdftron.PDF.TextSearch'
# or 'pdftron.PDF.TextExtractor'). Up until the next step is performed, the user
# can see, move and redefine these annotations.
# b) Content removal: Using 'pdftron.PDF.Redactor.Redact()' the user instructs
# PDFNet to apply the redact regions, after which the content in the area specified
# by the redact annotations is removed. The redaction function includes a number of
# options to control the style of the redaction overlay (including color, text,
# font, border, transparency, etc.).
#
# PDFTron Redactor makes sure that if a portion of an image, text, or vector graphics
# is contained in a redaction region, that portion of the image or path data is
# destroyed and is not simply hidden with clipping or image masks. PDFNet API can also
# be used to review and remove metadata and other content that can exist in a PDF
# document, including XML Forms Architecture (XFA) content and Extensible Metadata
# Platform (XMP) content.

def Redact(input, output, vec, app):
    # Open the input PDF and initialize its security handler (required for encrypted files).
    doc = PDFDoc(input)
    if doc.InitSecurityHandler():
        # Apply every region in vec, using the overlay style defined in app.
        Redactor.Redact(doc, vec, app, False, True)
        doc.Save(output, SDFDoc.e_linearized)

def main():
    # Relative path to the folder containing the test files.
    input_path = "../../TestFiles/"
    output_path = "../../TestFiles/Output/"

    PDFNet.Initialize(LicenseKey)

    # Each Redaction takes a page number, a rectangle, a negative flag, and overlay text.
    vec = VectorRedaction()
    vec.append(Redaction(1, Rect(100, 100, 550, 600), False, "Top Secret"))
    vec.append(Redaction(2, Rect(30, 30, 450, 450), True, "Negative Redaction"))
    vec.append(Redaction(2, Rect(0, 0, 100, 100), False, "Positive"))
    vec.append(Redaction(2, Rect(100, 100, 200, 200), False, "Positive"))
    vec.append(Redaction(2, Rect(300, 300, 400, 400), False, ""))
    vec.append(Redaction(2, Rect(500, 500, 600, 600), False, ""))
    vec.append(Redaction(3, Rect(0, 0, 700, 20), False, ""))

    # Appearance of the redaction overlay.
    app = Appearance()
    app.RedactionOverlay = True
    app.Border = False
    app.ShowRedactedContentRegions = True

    Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", vec, app)

    PDFNet.Terminate()
    print("Done...")

if __name__ == '__main__':
    main()
Here we'll take a closer look at the code and break it down.
The areas to redact are defined in the array vec, created using the code:
vec = VectorRedaction()
Each section or region of the page to be redacted is defined by its own array element, for example:
vec.append(Redaction(2, Rect(30, 30, 450, 450), True, "Negative Redaction"))
This line tells the SDK to redact page 2 of the document using a rectangle with corners at (30, 30) and (450, 450). Because the third argument is True, this is a negative redaction: the content inside the rectangle is left untouched, and everything outside it on that page is permanently removed. The words “Negative Redaction” are shown in the redaction overlay.
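To see how the True/False flag changes the result, here is a small, self-contained sketch of positive versus negative redaction on a toy page. It is an approximation under stated assumptions: content items are treated as whole boxes measured in the default PDF unit of 1/72 inch, so the region (30, 30, 450, 450) is 420 points, or about 5.8 inches, on each side. A positive redaction drops any item touching the region, while a negative one keeps only items entirely inside it. The SDK itself works at a finer level, removing only the affected portions of text, images, and paths, so treat this as a model of the behavior rather than its implementation. The item names and boxes are made up.

```python
from __future__ import annotations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in points


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def inside(a: Box, b: Box) -> bool:
    """True if box a lies entirely within box b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def redact(items: dict[str, Box], region: Box, negative: bool) -> list[str]:
    """Return the names of items that survive a whole-item redaction."""
    if negative:
        # Negative: everything outside the region is removed.
        return [name for name, box in items.items() if inside(box, region)]
    # Positive: everything touching the region is removed.
    return [name for name, box in items.items() if not overlaps(box, region)]


items = {
    "headline": (40, 400, 300, 420),  # entirely inside the region
    "masthead": (40, 700, 500, 740),  # above the region
    "footer": (0, 0, 600, 20),        # below the region
}
region = (30, 30, 450, 450)
print(redact(items, region, negative=True))   # ['headline']
print(redact(items, region, negative=False))  # ['masthead', 'footer']
```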
The following code customizes the appearance of the redaction overlay:
app = Appearance()
app.RedactionOverlay = True
app.Border = False
app.ShowRedactedContentRegions = True
And here’s the line of code that runs the redaction based on the parameters that have been previously set:
Redact(input_path + "newsletter.pdf", output_path + "redacted.pdf", vec, app)
Congrats! You’ve redacted your first document!
As we’ve just seen, redaction is straightforward when using an SDK like the Apryse SDK. Automating the process helps ensure sensitive information is properly protected and helps businesses adhere to international privacy laws. Using the wrong or inadequate methods to redact information can have serious consequences, including reputational damage, financial loss, and even personal harm.
Try it out for yourself with our free trial.
Get started now or contact our sales team for any questions. You can also check out our Discord community for support and discussions.
Garry Klooesterman
Senior Technical Content Creator