By Garry Klooesterman | 2025 May 29
Tags
pdf redaction
Summary: Handling and protecting sensitive information must be done properly, or businesses could face serious consequences. Redaction is an effective method of protecting this information, and automating the process with an SDK helps ensure that it is done correctly. This blog looks at redaction and automating the process using the Apryse SDK and JavaScript.
Handling sensitive information, whether it’s acquiring, storing, sharing, or deleting, is not to be taken lightly. More than just good practice, it’s also about regulatory compliance. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, or the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada define how personal and sensitive information is handled, secured, and disclosed.
If done right, redaction is an effective method of protecting sensitive information that should not be shared. Using inadequate tools or methods could result in the information still being accessible and could lead to serious consequences, including fraud, reputational risk, and even personal harm. To help ensure redaction is done right, an SDK can be used to automate the process.
In this blog, we’ll look at what redaction is, why it’s important, and how to redact information from a PDF using an SDK. In the example, we’ll use JavaScript, but the Apryse SDK is also available for other languages and frameworks such as C#, Java, and more.
For a browser-side solution, see our Ensuring Irreversible Redaction: Avoiding Hidden Metadata Risks blog.
Redaction allows for the sharing of select information while protecting other sensitive information. One example is removing client information from screenshots of a customer relationship management (CRM) system used for training. Another example is a business removing client information such as name, age, address, and contact information when sharing client feedback internally. Both examples illustrate how redaction protects sensitive information that could be exploited while allowing other important information to be shared.
Paper Documents
For paper documents, common methods of redaction include using a black marker or opaque tape to obscure the text, then photocopying the page so that the information is permanently removed from the copy.
Figure 1 – An example of a redacted document that raises a few questions.
These are effective methods for paper documents, but what about digital documents?
Digital Documents
Ensuring sensitive information is redacted properly and permanently in digital documents can be more difficult. For example, changing the color of the text or background, or covering the text with opaque blocks in an image layer, is not an effective way to redact information, because the underlying text is still stored in the file. In documented cases, text that was supposedly redacted this way was later recovered.
It’s important to look for information to be redacted in areas such as comments, version history, and metadata as well.
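To make the difference between covering and removing concrete, here is a minimal sketch that models a page as a plain list of drawing operations. This is a toy model for illustration only: it is not how a PDF is actually parsed, the field names are invented, and a real redaction tool also has to handle content that only partly overlaps the region.

```javascript
// Toy model of a page: an ordered list of drawing operations.
// This is NOT a real PDF structure; the field names are made up.
const page = [
  { type: 'text', x: 100, y: 700, width: 120, height: 12, value: 'SSN: 123-45-6789' },
  { type: 'text', x: 100, y: 680, width: 120, height: 12, value: 'Quarterly report' },
];

// True when the object lies entirely inside the box.
const isInside = (obj, box) =>
  obj.x >= box.x &&
  obj.y >= box.y &&
  obj.x + obj.width <= box.x + box.width &&
  obj.y + obj.height <= box.y + box.height;

// "Redaction" by covering: draws a black box on top but keeps the text object.
const coverWithBox = (ops, box) => [...ops, { type: 'rect', fill: 'black', ...box }];

// Redaction by removal: drops text inside the box, then draws the box.
const removeAndCover = (ops, box) => [
  ...ops.filter((op) => !(op.type === 'text' && isInside(op, box))),
  { type: 'rect', fill: 'black', ...box },
];

// What a copy/paste or text-extraction tool would see.
const extractText = (ops) =>
  ops.filter((op) => op.type === 'text').map((op) => op.value).join('\n');

const box = { x: 90, y: 695, width: 200, height: 25 };
const covered = extractText(coverWithBox(page, box));
const removed = extractText(removeAndCover(page, box));

console.log(covered.includes('123-45-6789')); // true: the text is still in the file
console.log(removed.includes('123-45-6789')); // false: the text is gone
```

Both versions look the same on screen, but only the second one survives a text-extraction check.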
In its basic form, the redaction process is made up of two main parts:
Using an SDK like the Apryse SDK to automate redaction provides many benefits such as:
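Using the Apryse SDK, the redaction process consists of two steps:
Content identification: Redact annotations are applied to mark the pieces or regions of content that should be removed, either interactively or programmatically. Until the next step is performed, these annotations can still be viewed, moved, and redefined.
Content removal: The redact regions are applied, and the content in the areas specified by the redact annotations is removed.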
Note: Style elements of the redaction overlay including color, text, font, border, transparency, and more can also be customized.
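Identifying content programmatically often starts with pattern matching over text that has already been extracted. The sketch below shows the idea in plain JavaScript. The patterns and the findSensitive helper are hypothetical and deliberately simple, and mapping the matches back to page regions would still require the SDK's own text search or extraction facilities, which are not shown here.

```javascript
// Hypothetical patterns for illustration; real projects need patterns
// that fit their own data and locale.
const PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  usSsn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Returns every match with its kind and character offsets in the text,
// ordered by position.
const findSensitive = (text) => {
  const hits = [];
  for (const [kind, pattern] of Object.entries(PATTERNS)) {
    for (const match of text.matchAll(pattern)) {
      hits.push({
        kind,
        value: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  return hits.sort((a, b) => a.start - b.start);
};

const sample = 'Contact jane.doe@example.com, SSN 123-45-6789.';
console.log(findSensitive(sample));
```

Keeping the patterns in one table makes it easy to review exactly what the automated pass looks for.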
Let’s get started with setting up a project using JavaScript.
Why JavaScript?
As one of the most used programming languages in the world, JavaScript can be found in 99% of websites. It can be used for a wide variety of client-side applications, with security as one of the many advantages, but it can also be used server-side for tasks such as queries and authentication.
JavaScript is versatile, easy to learn, and easy to implement, making it a perfect choice for our project.
Process
We'll now look at how to redact parts of a PDF using the code from one of our many available code samples, PDFRedactTest.
I've provided the code here as well for easy reference.
//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2024 by Apryse Software Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------
// PDF Redactor is a separately licensable Add-on that offers options to remove
// (not just covering or obscuring) content within a region of PDF.
// With printed pages, redaction involves blacking-out or cutting-out areas of
// the printed page. With electronic documents that use formats such as PDF,
// redaction typically involves removing sensitive content within documents for
// safe distribution to courts, patent and government institutions, the media,
// customers, vendors or any other audience with restricted access to the content.
//
// The redaction process in PDFNet consists of two steps:
//
// a) Content identification: A user applies redact annotations that specify the
// pieces or regions of content that should be removed. The content for redaction
// can be identified either interactively (e.g. using 'pdftron.PDF.PDFViewCtrl'
// as shown in PDFView sample) or programmatically (e.g. using 'pdftron.PDF.TextSearch'
// or 'pdftron.PDF.TextExtractor'). Up until the next step is performed, the user
// can see, move and redefine these annotations.
// b) Content removal: Using 'pdftron.PDF.Redactor.Redact()' the user instructs
// PDFNet to apply the redact regions, after which the content in the area specified
// by the redact annotations is removed. The redaction function includes a number of
// options to control the style of the redaction overlay (including color, text,
// font, border, transparency, etc.).
//
// PDFTron Redactor makes sure that if a portion of an image, text, or vector graphics
// is contained in a redaction region, that portion of the image or path data is
// destroyed and is not simply hidden with clipping or image masks. PDFNet API can also
// be used to review and remove metadata and other content that can exist in a PDF
// document, including XML Forms Architecture (XFA) content and Extensible Metadata
// Platform (XMP) content.
const { PDFNet } = require('@pdftron/pdfnet-node');
const PDFTronLicense = require('../LicenseKey/LicenseKey');

((exports) => {
  exports.runPDFRedactTest = () => {
    // Opens the input PDF, applies the redactions, and saves the result.
    const redact = async (input, output, vec, app) => {
      const doc = await PDFNet.PDFDoc.createFromFilePath(input);
      if (await doc.initSecurityHandler()) {
        // Await the redaction so that it completes before the document is saved.
        await PDFNet.Redactor.redact(doc, vec, app, false, true);
        await doc.save(output, PDFNet.SDFDoc.SaveOptions.e_linearized);
      }
    };

    const main = async () => {
      // Relative path to the folder containing test files.
      const inputPath = '../TestFiles/';
      try {
        const redactionArray = []; // we will contain a list of redaction objects in this array
        redactionArray.push(await PDFNet.Redactor.redactionCreate(1, (await PDFNet.Rect.init(100, 100, 550, 600)), false, 'Top Secret'));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(2, (await PDFNet.Rect.init(30, 30, 450, 450)), true, 'Negative Redaction'));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(2, (await PDFNet.Rect.init(0, 0, 100, 100)), false, 'Positive'));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(2, (await PDFNet.Rect.init(100, 100, 200, 200)), false, 'Positive'));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(2, (await PDFNet.Rect.init(300, 300, 400, 400)), false, ''));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(2, (await PDFNet.Rect.init(500, 500, 600, 600)), false, ''));
        redactionArray.push(await PDFNet.Redactor.redactionCreate(3, (await PDFNet.Rect.init(0, 0, 700, 20)), false, ''));

        // Appearance options for the redaction overlay.
        const appear = { redaction_overlay: true, border: false, show_redacted_content_regions: true };
        await redact(inputPath + 'newsletter.pdf', inputPath + 'Output/redacted.pdf', redactionArray, appear);
        console.log('Done...');
      } catch (err) {
        console.log(err.stack);
      }
    };

    // Run main with the license key, report any error, then shut the SDK down.
    PDFNet.runWithCleanup(main, PDFTronLicense.Key)
      .catch(function (error) {
        console.log('Error: ' + JSON.stringify(error));
      })
      .then(function () {
        return PDFNet.shutdown();
      });
  };
  exports.runPDFRedactTest();
})(exports);
// eslint-disable-next-line spaced-comment
//# sourceURL=PDFRedactTest.js
Let’s break down the code for a closer look.
The array redactionArray is used to define areas to redact and is created using the code:
const redactionArray = []; // we will contain a list of redaction objects in this array
Each section or region to be redacted is defined by an array element like this:
redactionArray.push(await PDFNet.Redactor.redactionCreate(1, (await PDFNet.Rect.init(100, 100, 550, 600)), false, 'Top Secret'));
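This line of code tells the SDK that we’re going to redact a rectangular section of page 1. The four numbers passed to PDFNet.Rect.init, “100, 100, 550, 600”, are the coordinates of two opposite corners of the rectangle. The third argument is false here; the sample sets it to true only for the region labeled “Negative Redaction”. The content in this area will be permanently deleted, and the words “Top Secret” will appear in its place as overlay text.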
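When there are many regions, the repeated push calls can be driven from plain data instead. The following is a minimal sketch, not part of the Apryse sample: buildRedactions and the spec field names (page, rect, negative, overlayText) are invented for illustration, and the function relies only on the two SDK calls already used above, PDFNet.Rect.init and PDFNet.Redactor.redactionCreate. PDFNet is passed in as a parameter so the logic can be exercised with a stub.

```javascript
// Plain-data description of the regions to redact. The field names
// (page, rect, negative, overlayText) are made up for this sketch.
const specs = [
  { page: 1, rect: [100, 100, 550, 600], negative: false, overlayText: 'Top Secret' },
  { page: 3, rect: [0, 0, 700, 20] },
];

// Builds redaction objects using the same two calls as the sample:
// PDFNet.Rect.init and PDFNet.Redactor.redactionCreate.
const buildRedactions = async (PDFNet, regionSpecs) => {
  const redactions = [];
  for (const { page, rect, negative = false, overlayText = '' } of regionSpecs) {
    // Basic input checks; these are a choice of this sketch, not SDK rules.
    if (!Number.isInteger(page) || page < 1) {
      throw new RangeError(`Invalid page number: ${page}`);
    }
    if (!Array.isArray(rect) || rect.length !== 4 || !rect.every(Number.isFinite)) {
      throw new TypeError('rect must be an array of four finite numbers');
    }
    const [x1, y1, x2, y2] = rect;
    const box = await PDFNet.Rect.init(x1, y1, x2, y2);
    redactions.push(await PDFNet.Redactor.redactionCreate(page, box, negative, overlayText));
  }
  return redactions;
};
```

Inside the sample's main function, the array could then be built with `const redactionArray = await buildRedactions(PDFNet, specs);`. Keeping the regions as data also makes them easier to review or load from a configuration file.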
The following code is used to customize the appearance of the redaction overlay:
const appear = { redaction_overlay: true, border: false, show_redacted_content_regions: true };
The following line runs the redaction using the parameters that were set earlier:
await redact(inputPath + 'newsletter.pdf', inputPath + 'Output/redacted.pdf', redactionArray, appear);
And as easily as that, you’ve redacted your first document!
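Because the whole point is that the removed content must not be recoverable, it can be worth checking the output before sharing it. Here is a minimal sketch of such a check. It assumes you have already extracted the text of the redacted file by some means (the extraction step is not shown), and findLeaks is a hypothetical helper rather than an SDK function.

```javascript
// Lowercases and collapses whitespace so that line breaks or extra
// spaces in extracted text do not hide a match.
const normalize = (s) => s.toLowerCase().replace(/\s+/g, ' ').trim();

// Returns the sensitive terms that still appear in the extracted text.
const findLeaks = (extractedText, sensitiveTerms) => {
  const haystack = normalize(extractedText);
  return sensitiveTerms.filter((term) => {
    const needle = normalize(term);
    return needle !== '' && haystack.includes(needle);
  });
};

// Example input standing in for text extracted from a redacted file.
const textAfterRedaction = 'Feedback from JANE   DOE: great service. Phone: ';
console.log(findLeaks(textAfterRedaction, ['Jane Doe', '555-0100'])); // [ 'Jane Doe' ]
```

An empty result does not prove the file is clean, since comments, metadata, and images are not covered by a text check, but a non-empty result is a clear signal that something was missed.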
Properly redacting sensitive information is crucial, and using an SDK like the Apryse SDK to automate the process not only helps ensure it's done properly but also helps businesses adhere to international privacy laws. Using incorrect or inadequate redaction methods can have serious consequences including reputational damage, financial loss, and even personal harm.
Try it out for yourself with our free trial.
Get started now or contact our sales team for any questions. You can also check out our Discord community for support and discussions.
Garry Klooesterman
Senior Technical Content Creator