Creating a Document Processing App using Node.js and Express: Part 2

By Roger Dunham | 2024 Jul 18

Summary: Are you familiar with using Node.js and Express but want to add the ability to work with PDFs to your app? Follow this step-by-step guide and learn how to use the Apryse SDK to view and edit PDFs and convert several different file types into PDFs.

Introduction

This is the second part of a short series that will lead you through the steps needed to set up a simple server that will convert documents from one type to another using the Apryse SDK.

The Apryse SDK is a powerful library that supports adding annotations to documents, redaction and many other functions. Using it allows you to own the full document and data lifecycle without worrying about third-party server dependencies.

As we progress through this series, we will just get a taste of its functionality as we learn about:

  1. Creating a simple Node.js server using Express
  2. Converting documents into PDF using Apryse SDK
  3. Adding further functionality to the server

In part one of the series, we created a simple Node.js/Express server that listed the files in a specific folder and allowed us to download those files by specifying their name in the URL.

In this article, we will look at how we can use Apryse SDK to convert various types of files into PDFs and then return the generated file to the browser.

How to get started with the Apryse SDK in a Node.js and Express Server

Adding the Apryse SDK to the project is easy, but you will need to get a trial license in order to use it, so head over to Apryse and get one now if you haven’t already done so.

Step 1: Open the existing project in VS Code

If you have just finished the previous article, then you will have the project ready for the start of this one. If not, you can download the project from GitHub and then open it in the IDE of your choice.

Figure 1 - The project as it was at the end of part one of this series.

Step 2: Use npm to install the Apryse SDK

Next, we need to bring the Apryse SDK into our project. The easiest way to do this is via npm.

Within a terminal window, type:

npm install @pdftron/pdfnet-node --save  

Step 3: Add a new endpoint to convert a document to PDF

We will create a new endpoint for converting files that allows us to specify the name of the file to be converted. To do this, add the following code to the project.

const { PDFNet } = require('@pdftron/pdfnet-node');

// Convert the named file in the files folder to PDF and return the result.
// app, path, fs and filesPath were set up in part one of this series.
app.get('/convert/:filename', (req, res) => {
  const filename = req.params.filename;
  const ext = path.parse(filename).ext;

  const inputPath = path.resolve(__dirname, filesPath, filename);
  const outputPath = path.resolve(__dirname, filesPath, `${filename}.pdf`);

  // Nothing to convert if the file is already a PDF (a client error, so 400)
  if (ext === '.pdf') {
    res.statusCode = 400;
    res.end('File is already PDF.');
    return;
  }

  // Convert the input file and save the result as a linearized PDF
  const main = async () => {
    const pdfdoc = await PDFNet.PDFDoc.create();
    await pdfdoc.initSecurityHandler();
    await PDFNet.Convert.toPdf(pdfdoc, inputPath);
    await pdfdoc.save(
      outputPath,
      PDFNet.SDFDoc.SaveOptions.e_linearized,
    );
  };

  PDFNet.runWithCleanup(main, "[Your license key]").then(() => {
    PDFNet.shutdown();
    // Read the generated PDF back from disk and send it to the browser
    fs.readFile(outputPath, (err, data) => {
      if (err) {
        res.statusCode = 500;
        res.end(err.message);
      } else {
        res.setHeader('Content-Type', 'application/pdf');
        res.end(data);
      }
    });
  }).catch(err => {
    res.statusCode = 500;
    console.log(err);
    res.send({ error: String(err) });
  });
});

In this example, we are explicitly initializing and shutting down PDFNet in the function. That is not an efficient mechanism, and we will look at how to refactor that code later in this series, but for now, this is a pragmatic solution. 

This will get the name of the file as a parameter, in just the same way as was done when specifying a file to download in part one of this series.
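The path handling at the top of the handler can also be written as a small function with no side effects, which makes it easy to try out on its own. The sketch below is only an illustration: planConversion is a hypothetical helper that is not part of the original project, the files folder name stands in for filesPath from part one, and the lower-casing of the extension is an addition that the handler above does not make.

```javascript
const path = require('path');

// Hypothetical helper (not part of the original project): works out the
// input and output paths for a conversion request, mirroring the handler.
function planConversion(filesDir, filename) {
  // Lower-casing is an addition here so that ".PDF" is also recognised
  const ext = path.parse(filename).ext.toLowerCase();
  return {
    inputPath: path.resolve(filesDir, filename),
    // Same naming scheme as the handler: "report.docx" -> "report.docx.pdf"
    outputPath: path.resolve(filesDir, `${filename}.pdf`),
    // True when the requested file is already a PDF and needs no conversion
    alreadyPdf: ext === '.pdf',
  };
}

const plan = planConversion(path.resolve('files'), 'report.docx');
console.log(plan.alreadyPdf, path.basename(plan.outputPath));
```

Keeping this logic separate from the Express handler means it can be checked without starting the server or loading the SDK.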

When using the Apryse SDK, you can specify the license key in various places, but one of the easiest is to pass it to PDFNet.runWithCleanup, a function that also simplifies memory management.

Note that if you forget to specify a license key, an error will occur informing you that “License key is required for function 'PDFNet.runWithCleanup'”. Just update your code, and you should be good to carry on.
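Rather than pasting the key into the source file, one option is to read it from an environment variable and fail early with a clear message if it is missing. This is a minimal sketch under stated assumptions: getLicenseKey is a hypothetical helper, and the variable name APRYSE_LICENSE_KEY is our own choice, not something the SDK looks for.

```javascript
// Hypothetical helper: reads the license key from an environment variable.
// The variable name APRYSE_LICENSE_KEY is an assumption, not an SDK convention.
function getLicenseKey(env = process.env) {
  const key = (env.APRYSE_LICENSE_KEY || '').trim();
  if (!key) {
    throw new Error('APRYSE_LICENSE_KEY is not set; get a trial key from Apryse first.');
  }
  return key;
}

// Example using an explicit object instead of the real environment
console.log(getLicenseKey({ APRYSE_LICENSE_KEY: 'demo-key' }));
```

The returned value could then be passed as the second argument to PDFNet.runWithCleanup in place of the "[Your license key]" placeholder.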

As a first example, we will use Apryse to convert a Word document into a PDF, so copy a suitable document into the files folder of the project (which we set up in the previous article). I used “the_rime_of_the_ancient_mariner.docx” which is included with the Apryse SDK for Python samples, but you can use any file that you want.

Head back to the browser and enter:

http://localhost:4000/convert/the_rime_of_the_ancient_mariner.docx (or whatever file you have copied into the files folder).

The server will get the filename from the URL, just as we saw in part one of this series, look up the file, convert it into a PDF, and send that back to the browser.

Figure 2 - The result of on-the-fly conversion of a DOCX file into a PDF. Note that although the URL still shows a .docx file extension, the file that is being displayed, and could be downloaded, is actually a PDF. In a real app this would be a little confusing, but for now we will leave it since it shows what is being sent to the server.

That was easy.

Let’s look at what happened. The actual conversion of DOCX into PDF was performed by a single call: await PDFNet.Convert.toPdf(pdfdoc, inputPath);

We didn’t need to specify that the file to be converted was a Word document - it just worked. Even better though, that exact same function works for other file types too.

For example, I can convert an image file to a PDF (assuming that it is in the files folder) by requesting it through the same /convert endpoint, as shown in Figure 3.

Figure 3 - The Convert.toPdf function works with many types of files.

Convert.toPdf is immensely powerful: for a great many file types it will just work as it is (and on Windows, if it can’t convert a file using the Apryse SDK, it will even try to use either MS Office interop or a virtual printer as a fallback mechanism).

Before we move on though, let’s look at what happens if we try to convert a .dwg file into PDF. DWG files are a type of CAD document; converting them is complex, and tools for doing so are not readily available.

In order to keep the size of the Apryse SDK small for the majority of use-cases, some less commonly used functionality is available in external modules that can be called from the SDK.

One of these modules (available for Windows and Linux) is the CAD Module that will convert from CAD to PDF, so we will install that and use it to convert the file Stadsslot Berlijn 2014.dwg which is available within the Python samples that we used earlier.

Installing an External Module

Download the CAD module file and extract the .zip (or .gz) file.

In addition to Samples and other files, there is a folder called lib.

Figure 4 - The contents of the CAD module archive.

Copy the lib folder as it is into the Node.js project folder and tell the Apryse SDK where the module is located by passing the path to PDFNet.addResourceSearchPath. That function needs to be called before the module is used.

Figure 5 - Typical location for the lib folder that contains external modules. In this case the CAD module for Windows is being used.

await PDFNet.addResourceSearchPath('./lib/'); 
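Note that './lib/' is a relative path. Assuming it is resolved against the directory the server was started from (an assumption on our part, not something stated by the SDK), starting the server from a different folder could stop the module from being found. A hedged sketch of one way to guard against that is to build an absolute path and check that the folder exists before handing it to the SDK. The helper name resolveModuleDir is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: turns a folder name into an absolute path and checks
// that it exists, so the result does not depend on the working directory.
function resolveModuleDir(baseDir, folder = 'lib') {
  const dir = path.resolve(baseDir, folder);
  if (!fs.existsSync(dir) || !fs.statSync(dir).isDirectory()) {
    throw new Error(`External module folder not found: ${dir}`);
  }
  return dir;
}

// Prints the absolute path if ./lib exists, otherwise the error message
try {
  console.log(resolveModuleDir(process.cwd()));
} catch (err) {
  console.log(err.message);
}
```

In the server, the result of resolveModuleDir(__dirname) could be passed to PDFNet.addResourceSearchPath instead of the literal './lib/'.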

While the code could be refactored, for now we won't do that, so the main method that we created in step 3 should now be modified to be:

// Other code in the '/convert/:filename' handler ....
const main = async () => {
    // Tell the SDK where to find external modules such as the CAD module
    await PDFNet.addResourceSearchPath('./lib/');
    const pdfdoc = await PDFNet.PDFDoc.create();
    await pdfdoc.initSecurityHandler();
    await PDFNet.Convert.toPdf(pdfdoc, inputPath);
    // Wait for the save to finish before the file is read back
    await pdfdoc.save(
      outputPath,
      PDFNet.SDFDoc.SaveOptions.e_linearized,
    );
  };
// Other code .....

Now, enter the name of the file to be converted in the browser. The Convert.toPdf function will identify the file as a CAD drawing and call the module that we have just installed to automatically convert the file into a PDF format and return that to the browser.
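If the module is not where the SDK expects it, the conversion will fail, so it can help to check for the module up front and report a clearer error. The following is a sketch rather than a definitive implementation: ensureCadModule is a hypothetical helper, PDFNet is passed in as a parameter so that the function can be tried with a stand-in object, and the CADModule.isModuleAvailable() call is our understanding of the SDK's CAD sample and should be checked against the SDK documentation.

```javascript
// Sketch only: PDFNet is passed in so the function can be exercised with a stub.
// CADModule.isModuleAvailable() is assumed from the SDK's CAD sample; check the
// SDK documentation before relying on it.
async function ensureCadModule(PDFNet, libDir) {
  // Register the folder that holds the external module
  await PDFNet.addResourceSearchPath(libDir);
  // Ask the SDK whether it can now find the CAD module
  const available = await PDFNet.CADModule.isModuleAvailable();
  if (!available) {
    throw new Error(`CAD module not found under ${libDir}`);
  }
}
```

Inside main, a call such as await ensureCadModule(PDFNet, './lib/') could replace the bare addResourceSearchPath line.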

Figure 6 - If the CAD Module is available, then converting from a DWG format to PDF is a cinch! Don’t worry about spaces in the name; they will be handled automatically.
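The reason spaces are not a problem is that the browser percent-encodes them in the URL, and Express decodes route parameters before the handler reads req.params.filename. The short sketch below illustrates that round trip with the standard encodeURIComponent and decodeURIComponent functions; the port number is the one used earlier in this article.

```javascript
const filename = 'Stadsslot Berlijn 2014.dwg';

// What effectively travels in the URL: spaces become %20
const encoded = encodeURIComponent(filename);
const url = `http://localhost:4000/convert/${encoded}`;

// Express decodes route parameters before the handler sees them;
// decodeURIComponent shows the equivalent round trip.
const decoded = decodeURIComponent(encoded);

console.log(url);
console.log(decoded === filename);
```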

And that’s a great example of how, with well-written code and the Apryse SDK, you can quickly leverage functionality. The same code can convert Word, Excel and PowerPoint documents, images, HTML, CAD drawings and many other formats into PDFs.

The Convert.toPdf method will do the heavy lifting of file conversion for you, allowing you to spend your time on the parts of your application that are specific to your business requirements.

Wrap Up

That’s it. We’ve added the Apryse SDK to our project and, using just one function, we can now perform on-the-fly conversion of dozens of document types into PDF.

In the next article in this series, we will look at a selection of the other functionality that is available within the SDK. We will see how we can convert from PDF into Office format, how we can populate templates with data to generate new documents, and how we can create brand new empty PDFs as a starting point for adding all kinds of content to them.

If you don't want to wait until we release the last part of this series, check out the Apryse SDK documentation and start using it today.
