Generating Documents and Reports from DOCX Templates and JSON using Apryse and Angular

By Roger Dunham | 2023 Nov 16

Read time: 8 min

Summary

Learn how to use Angular to develop an application for automatically creating documents and reports from DOCX templates and JSON.

Introduction

Apryse has two distinct systems for creating documents and reports from templates. In both cases the templates are Office documents, but neither method needs Office to be installed to generate the final document from the template.

In this article we will look at the high-code SDK DocGen system which is based directly on the Apryse SDK. The alternative, low-code, system is Fluent, which can eliminate lengthy development work for your templates.

See how Fluent can be useful in healthcare document generation.

SDK DocGen creates PDFs directly and requires data in JSON format. It can be used entirely within the browser, as a process running on the server side, or as an entirely standalone app running on a local machine.

In this article, we will look at the Apryse document generation system.

Why Text Substitution in a PDF Sometimes Gives Poor Results

A quick search of the internet reveals some document generation systems that substitute text directly into PDF templates. While there is nothing wrong with that approach if it does everything that you need, it is likely that, when used with real data, you will soon find limitations.

Figure 1 - A PDF created by substitution of text from a PDF template. There was not enough space, so some text is missing.

You can read more about why using PDFs as a template is problematic here.

The Apryse showcase illustrates how the SDK can be used as an Office editor, reflowing text in just the same way that Word does. SDK DocGen uses this functionality as the basis of a tool for substituting text markers (known as tags) in a template with real data, while simultaneously supporting complex text and paragraph formatting. Even better, it does all of this without requiring Office or any other word processor to be installed.

Seeing Document Generation in Action

Before we go any further, let’s look at a live example of document generation from a DOCX template. This JavaScript sample allows you to select any template, which will then be queried by the program for tags. The program then auto-generates a data entry form based on the tags, which in turn is then used to populate the template.

Figure 2 - The online sample showing a dynamically generated data entry area.

Seeing a prebuilt example is one thing, but how would we do this in practice?

Overview of the Document Generation Process – From DOCX to PDF

  1. Create a template in DOCX format. This can be done using markers (the ‘tags’ that were mentioned earlier) to indicate where text should be located. In addition to single word replacement, table creation is supported, as is the ability to have text shown, or not shown, depending on the actual data. The method for creating templates is well documented, and can be performed by anyone that is moderately skilled at Office. The work can be done within Office, or any other editor that supports the DOCX format.
  2. Generate a JSON object that contains the data that should be copied into the final document. The origin of that data is up to you - whether it is collected from a file, user input (which is the case in the example just mentioned), a database or even a RESTful API.
  3. Use the Apryse SDK to replace the tags in the template with the data in order to create the final document.

In this article, the code is based on the Angular sample. A few changes have been made; they can be found in the branch ‘template-fill-blog’, which you can get to here.

However, before we get to the code, we need to generate a template (although you can use the one in the sample code).

Template Creation

You can find a detailed description of how to set up a template in the article about creating documents using React, since the process is generic irrespective of language. Rather than repeat everything, we will just look at the simpler parts of the process.

The template is just a Word DOCX file, although other Office formats are also supported – it can contain any formatting, and any amount of text that you want. Furthermore it can contain any number of tags that will be replaced with real data when the PDF is created. 

Figure 3 - Part of a typical report template DOCX file.

The tags typically start and end with two curly braces (sometimes known as “mustache” braces), but alternative delimiters can be specified if necessary. You can find out more by asking on Discord.

In the image above there are a number of tags marked, for example {{COMPANYNAME}}, {{CUSTOMERNAME}} and {{total}}.

When we run the template these will simply be substituted with values from the JSON data source that match these names.
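
To make the idea of name matching concrete, here is a minimal conceptual sketch in TypeScript. It works on a plain string, whereas the SDK works on the structure of the Office document, so it is a model of the behavior described above and not how Apryse implements it. The names findTags and fillTags are hypothetical, and loop and conditional tags are deliberately left untouched.

```typescript
// Conceptual model only: real templates are Office documents, not plain strings.
// Matches a simple value tag such as {{COMPANYNAME}} and captures its name.
const TAG_SOURCE = "\\{\\{\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\}\\}";

// Single-word control tags that are not value tags (assumed from the syntax in this article).
const CONTROL_WORDS = ["else", "endif", "endloop"];

// Lists the distinct value-tag names in the text, in order of first appearance.
function findTags(text: string): string[] {
  const pattern = new RegExp(TAG_SOURCE, "g");
  const names: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(text);
  while (match !== null) {
    const name = match[1];
    if (CONTROL_WORDS.indexOf(name) === -1 && names.indexOf(name) === -1) {
      names.push(name);
    }
    match = pattern.exec(text);
  }
  return names;
}

// Replaces each value tag that has a matching key; tags without data are left as-is.
function fillTags(text: string, values: Record<string, string>): string {
  return text.replace(new RegExp(TAG_SOURCE, "g"), (whole: string, name: string) => {
    const isValueTag = CONTROL_WORDS.indexOf(name) === -1;
    const hasData = Object.prototype.hasOwnProperty.call(values, name);
    return isValueTag && hasData ? values[name] : whole;
  });
}
```

For example, fillTags("Quote for {{CUSTOMERNAME}}", { CUSTOMERNAME: "Huw Dickens" }) returns "Quote for Huw Dickens", while a tag with no matching key stays visible in the output, which makes missing data easy to spot.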

Setting Up a Multi-row Table

Tables can be specified in two ways, either using the Loops syntax, or within the data.

Table specification within the data is mentioned only in passing since it is not recommended; it requires a more complicated JSON data structure and increases the risk of generating a messed-up document. If you really want to know more then you can do so here.

As an example of the recommended loop syntax, the sample template contains a table that describes the sale of fruit. Within the template there is a normal Word table, but in one of the rows the following tags are present:

  • {{loop rows}}{{item}} 
  • {{item_qty}} 
  • {{item_price}}
  • {{item_total}}{{endloop}},

and in the bottom right cell is the tag {{total}}

The {{loop}}{{endloop}} syntax indicates that there may be multiple rows of data. The name ‘rows’ in this case is used to map where the data should come from in the JSON file, and the other tags indicate what should go into each cell within the row.

Figure 4 - A table specified with the loop syntax.

A benefit of this syntax is that the table can have cells that are not populated from the same JSON data item – perhaps from a different JSON object, or with static text. It is also easy to see in the template which tags are in which column, which is essential if you decide to change the layout of the table.

Figure 5 - The modified table, an extra column has been added with static text, and the order of “QTY” and “PRICE” have been switched, but it is easy to see that the tags are still correct.

Using Conditional Data

You can find information here about how to specify conditional tags. These allow some parts of the text to be deliberately hidden, or shown, depending on what is present in the data. Conditionals, in this case for the key cond,

  • start with {{if cond}},
  • end with {{endif}} 
  • and support {{else}}

The JSON value corresponding to the condition key (cond in this example) is converted to a Boolean. It is evaluated as false when either:

  • The key is not present in the JSON
  • The value is false, "", 0, or null
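
The rule above can be written down as a small TypeScript predicate, which is handy for checking your data before it reaches the template. This is a sketch of the rule as stated in this article, not the SDK's own code. The name conditionIsTrue is hypothetical, and treating every value not listed above (for example an empty array, or the string "false") as true is an assumption.

```typescript
// A JSON value as it might appear in the data passed to the template.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Mirrors the rule described above: a condition is false when the key is
// missing, or when its value is false, "", 0, or null.
// Assumption: anything else counts as true.
function conditionIsTrue(data: { [key: string]: JsonValue }, key: string): boolean {
  if (!Object.prototype.hasOwnProperty.call(data, key)) {
    return false; // key is not present in the JSON
  }
  const value = data[key];
  return !(value === false || value === "" || value === 0 || value === null);
}
```

With the data { cond: 0 } the block between {{if cond}} and {{else}} would be hidden, while { cond: "yes" } would show it.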

The conditional key also supports operators.

New in SDK Version 10.4 is the equal operator, and you can read more about this in the blog for Document Generation Using React which has the same general principles as Angular.

OK, we will leave the template there, and next look at the data.

The JSON Data Structure

The text substitution API should be supplied with a JSON dictionary, where each template tag within the template matches a key within the dictionary. The content of the JSON values can be text, images, structured input (html and markdown) or objects.

Please see here for a detailed description of the JSON file requirements.
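
Because each tag needs a matching key, it can be worth comparing the two before generating a document. The TypeScript sketch below assumes you already have the list of tag names from the template, however you obtained it; the helper name findMissingKeys is hypothetical and is not part of the SDK.

```typescript
// Returns the tag names that have no matching key in the data, so that gaps
// can be reported before the template is filled.
function findMissingKeys(tagNames: string[], data: Record<string, unknown>): string[] {
  return tagNames.filter(
    (name) => !Object.prototype.hasOwnProperty.call(data, name)
  );
}

// Example using tag names from the template shown earlier.
const missing = findMissingKeys(
  ["COMPANYNAME", "CUSTOMERNAME", "total"],
  { COMPANYNAME: "Apryse", total: "$25.00" }
);
// missing is ["CUSTOMERNAME"]
```

A check like this is cheap to run in a unit test, which helps when the template and the data are maintained by different people.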

For our simple template though, the sample code contains the following hard-coded data:

// Each key matches a tag name in the template.
const jsonData = {
    COMPANYNAME: 'Apryse',
    CUSTOMERNAME: 'Huw Dickens',
    CompanyAddressLine1: '838 W Hastings St 5th floor',
    CompanyAddressLine2: 'Vancouver, BC V6C 0A6',
    CustomerAddressLine1: '123 Main Street',
    CustomerAddressLine2: 'Vancouver, BC V6A 2S5',
    // Structured input: the value is html rather than plain text.
    Date: { html: "<span style='color: red'><b>Nov 5th, 2023</b></span>" },
    ExpiryDate: 'Nov 15th, 2023',
    QuoteNumber: '134',
    WEBSITE: 'www.apryse.com',
    // One object per table row, used by {{loop rows}} ... {{endloop}}.
    rows: [
        { item: 'Apples', item_qty: '3', item_price: '$5.00', item_total: '$15.00' },
        { item: 'Oranges', item_qty: '2', item_price: '$5.00', item_total: '$10.00' },
    ],
    days: '30',
    total: '$25.00',
};

Most of the layout is self-explanatory, but let’s look at two specific areas:

Handling Rows in the Document

The recommended method, specifying cell tags in the template, uses the data from the rows key. Its value is an array in which each object holds the values that should be placed into a single row of the table.
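
In a real application the row values are usually calculated and not typed in by hand. As a rough sketch, the TypeScript below builds a rows array and a total in the same shape as the sample data from numeric line items. The LineItem type and the function names are hypothetical; only the output keys (item, item_qty, item_price, item_total, rows and total) come from the sample template.

```typescript
// Hypothetical input type: numeric line items from your own system.
interface LineItem {
  name: string;
  quantity: number;
  unitPrice: number;
}

// Output row shape, matching the tags used in the sample template.
interface QuoteRow {
  item: string;
  item_qty: string;
  item_price: string;
  item_total: string;
}

// Formats a number as a dollar amount with two decimal places.
function formatDollars(amount: number): string {
  return "$" + amount.toFixed(2);
}

// Builds the `rows` array and the grand `total` from the line items.
function buildQuoteData(items: LineItem[]): { rows: QuoteRow[]; total: string } {
  const rows = items.map((entry) => ({
    item: entry.name,
    item_qty: String(entry.quantity),
    item_price: formatDollars(entry.unitPrice),
    item_total: formatDollars(entry.quantity * entry.unitPrice),
  }));
  const sum = items.reduce((acc, entry) => acc + entry.quantity * entry.unitPrice, 0);
  return { rows, total: formatDollars(sum) };
}
```

Calling buildQuoteData with 3 apples and 2 oranges at 5 dollars each gives the same rows and the same total of "$25.00" as the hard-coded sample. For real invoices, working in whole cents would avoid floating-point rounding surprises.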

Figure 6 - Part of the final report - showing the multi-line table after it has been populated with data.

Controlling Style from the Data – Structured Input

Much of the sample data is just text, but the value for “Date” is specified as html. Using html (or markdown) allows some formatting to be applied to the template based on the data and provides the ability to add paragraphs, headings, and styling.
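
If the text inside a structured value comes from users or another system, it is sensible to escape it before wrapping it in markup. The TypeScript sketch below produces a value in the same { html: ... } shape as the sample data, using only the span, color style and b markup that appears in the sample. The helper names are hypothetical, and which other html elements are supported is not covered here.

```typescript
// Shape of a structured html value, as used for the date in the sample data.
interface HtmlValue {
  html: string;
}

// Escapes characters that have special meaning in html.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Wraps text in bold, colored markup. The color must be a plain color name
// or a hex code, so that nothing unexpected ends up inside the style attribute.
function boldColoredText(text: string, color: string): HtmlValue {
  if (!/^([a-zA-Z]+|#[0-9a-fA-F]{3,6})$/.test(color)) {
    throw new Error("Unsupported color value: " + color);
  }
  return {
    html: "<span style='color: " + color + "'><b>" + escapeHtml(text) + "</b></span>",
  };
}
```

For example, boldColoredText("Nov 5th, 2023", "red") returns the same html string that is hard-coded in the sample data.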

Generating the Final Document

Everything that we have seen so far – template generation and JSON format - is platform-independent, and the results will be the same whether the actual conversion occurs within the browser or server side.

The actual process of document generation does have minor, platform specific, variations, however. With document generation supported on UWP, Android, Linux, macOS and Windows, as well as the Web, there are many opportunities for you to use this technology. Please check out the documentation for the specific language that SDK-DocGen supports.

This article is primarily about generating documents in the browser using Angular, and in this case the actual document generation requires nothing more than calling ‘applyTemplateValues’ with the JSON data once the document has loaded.

// Fill the template as soon as the document has loaded.
documentViewer.addEventListener('documentLoaded', () => {
    documentViewer.getDocument().applyTemplateValues(this.jsonData);
});

The Apryse SDK-DocGen system then replaces each of the tags in the template with data from the JSON object, wherever possible, including iterating through tables, and produces a PDF. 

Figure 7 - The generated document.

It really is that simple. One line of code takes the template, merges it with the JSON data, and creates a document.

In the sample code, filling the template is done as soon as the document loads, but it could be done in response to a button click event, or whatever other scenario fits your business needs.
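
As a sketch of the button-click alternative, the TypeScript below wraps the call in a handler that does nothing if no document has loaded yet. TemplateDocument is a hypothetical minimal interface that covers only the applyTemplateValues call used in this article, and createFillHandler is a hypothetical helper, not part of the SDK. What applyTemplateValues returns is left as unknown here, which is an assumption.

```typescript
// Hypothetical minimal view of a loaded document: only the call used in this article.
interface TemplateDocument {
  applyTemplateValues(data: object): unknown;
}

// Returns a handler that fills the template when invoked, for example from a
// button click. It reports whether a document was available to fill.
function createFillHandler(
  getDocument: () => TemplateDocument | null | undefined,
  data: object
): () => boolean {
  return () => {
    const doc = getDocument();
    if (!doc) {
      return false; // nothing has loaded yet, so there is nothing to fill
    }
    doc.applyTemplateValues(data);
    return true;
  };
}

// Possible wiring in the browser (assumes `button`, `documentViewer` and
// `jsonData` already exist in your component):
//   button.addEventListener('click',
//     createFillHandler(() => documentViewer.getDocument(), jsonData));
```

Keeping the handler separate from the viewer makes it easy to test with a stand-in document object.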

Beyond DOCX – Support for Other File Types

The SDK DocGen mechanism always creates a PDF, and it is extremely good at doing that.

While the most commonly used template format is DOCX, the system also works with PowerPoint and Excel files, including the old style DOC, XLS and PPT file types.

In each case the tags that are to be substituted are marked in exactly the same way.

Figure 8 - A PowerPoint template, and the resulting PDF. Note how the formatting of Data is being controlled from the data, over-riding the template styling.

Figure 9 - A multi-sheet Excel template and the resulting PDF.

When Should Apryse SDK-DocGen be Used?

This system is great where the data source is JSON (or can easily be converted into JSON), the required document format is PDF, and its structure is relatively straightforward. This mechanism is also a great solution for use with Appian or Salesforce – with no external libraries being required.

One of the disadvantages of this system, however, is that a change in the data source will probably require help from a developer. For example the data source for a RESTful API might change, or the structure of data coming from a reporting system might need to be updated. In the case of Structured Input (formatting via html and markdown), if the formatting needs to be modified then that would likely also need developer help. As such, if your data source or complex formatting is likely to change then Apryse Fluent may be a better fit.

Similarly, in any situation where your use-case isn’t supported (such as requiring complex conditional formatting or charts), Apryse Fluent will be able to take you much further. Currently DocGen only supports generation of PDFs and PNGs, but direct creation of Word (DOCX) documents will be available soon.

Conclusion

We have seen how we can create templates in a familiar environment that can support text of initially unknown length by behaving like Word documents – adjusting the layout of the document to work around the text. We have also seen how we can add logic to the template so that some parts of the document are only shown if some data condition is true.

And we have seen how this can all be done in the browser without the need for Office to be installed. Furthermore, if you want to extend the document generation in some way – perhaps developing a desktop app, or providing server-side processing, then the information that is included in this article is a great basis for taking those next steps.

When you are ready to get started, see the documentation for the SDK to help you to get up to speed quickly. Don’t forget, you can also reach out to us on Discord if you have any issues.
