Generating Documents and Reports from DOCX Templates and JSON using Apryse and C++

By Roger Dunham | 2023 Nov 17

Summary

Apryse offers two document creation systems: "Fluent," a low-code solution that can produce a range of document formats without Office being installed, and "SDK DocGen," a high-code option that requires JSON data. This article focuses on SDK DocGen within a C++ application. Unlike Fluent, DocGen exclusively uses JSON as its data source. It uses Office documents, rather than PDFs, as templates. Read on for a live example of dynamic data entry based on DOCX templates. The process involves creating a template, generating JSON data, and using the Apryse SDK to replace tags in order to create the document.

Introduction

Apryse has two distinct systems for creating documents and reports from templates. In both cases the templates are Office documents, but in neither case is there a need for Office to be installed to generate the final document from the template.

The first mechanism is a “low-code” solution – “Fluent”, and there have been several recent articles on using this versatile and powerful system. It can be used to create not just PDFs, but also other document formats, for example DOCX, PPTX or HTML. Furthermore, the template designer means that this system is suitable for anyone with moderate knowledge of Word, allowing you to eliminate lengthy development work for your templates.

In this article though, we will look at the “high-code” option, SDK DocGen, which is based directly on the Apryse SDK. It can be used entirely within the browser, but here we will look at how it can be used within a C++ application running on a local machine. The same approach can be extended to document generation on other platforms.

A significant difference between DocGen and Fluent is that DocGen requires the data to be in JSON format, whereas Fluent can access data from a huge range of data sources, including JSON, XML, SQL Server, and OData.

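In this article, we will look at why Office documents make better templates than PDFs, see document generation in action, walk through the C++ sample that ships with the SDK, and consider when SDK DocGen is the right choice.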

Why it is Better to use Office Documents than PDFs as Templates

A quick search of the internet reveals some document generation systems use a PDF as a template, then substitute content within the PDF. There is nothing wrong with generating documents in that way if it does everything that you need. However, when used with real data it is likely that you will soon find limitations.

Figure 1 - A PDF created by substitution of text from a PDF template. There was not enough space, so some text is missing.

You can read more about why using a PDF as a template is problematic here.

Using DOCX files as a template solves this problem. The Apryse SDK provides a mechanism for viewing and editing Word documents without the need for Office to be installed. In fact, it is the library that provides the functionality behind the Xodo online editing tool.

With its ability to reflow text in just the same way that Word does, as well as its support for complex text and paragraph formatting, the Apryse SDK is an ideal tool for substituting text-markers (which are known as tags) in a template, in order to automatically generate high-quality, accurate documents that contain up-to-date data.

Seeing Document Generation in Action

Before we go any further, let’s look at a live example of document generation from a DOCX template. You can choose any template, and the program will query it for tags, and use them to generate a data-entry form, which in turn is used to populate the template.

Figure 2 - The online sample showing a dynamically generated data entry area

So, let’s see how we would do this in practice.

Overview of the Document Generation Process – From DOCX to PDF

  1. Create a template in DOCX format. This can be done using markers (the tags that were mentioned earlier) to indicate where text should be located. In addition to single word replacement, table creation is supported, as is the ability to have text shown, or not shown, depending on the actual data. The method for creating templates is well documented, and can be performed by anyone that is moderately skilled at Office.
  2. Generate a JSON object that contains the data that should be copied into the final document. The origin of that data is up to you – whether it is collected from a file, user input (which is the case in the example just mentioned), a database or even a RESTful API.
  3. Use the Apryse SDK to replace the tags in the template with the data in order to create the final document.
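
To make the three steps above concrete, here is a toy sketch of step 3 that works on plain text only. It is not how the Apryse SDK works internally (the SDK operates on the Office document itself and handles reflow, tables and images), and the function name fill_tags is made up for this illustration.

```cpp
#include <map>
#include <string>

// Toy stand-in for tag replacement: swaps every {{key}} in `text` for its value.
// Illustration only; the real work is done by the SDK on the DOCX template.
std::string fill_tags(const std::string& text,
                      const std::map<std::string, std::string>& data) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t open = text.find("{{", pos);
        if (open == std::string::npos) { out += text.substr(pos); break; }
        std::size_t close = text.find("}}", open + 2);
        if (close == std::string::npos) { out += text.substr(pos); break; }

        out += text.substr(pos, open - pos);  // copy the text before the tag
        std::string key = text.substr(open + 2, close - open - 2);

        auto it = data.find(key);
        // In this sketch, tags with no matching key are left in place
        // so that gaps in the data stay visible.
        out += (it != data.end()) ? it->second
                                  : text.substr(open, close + 2 - open);
        pos = close + 2;
    }
    return out;
}
```

With the keys dest_title and dest_surname set to "Ms." and "Symonds", the text "Dear {{dest_title}} {{dest_surname}}," becomes "Dear Ms. Symonds,".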

In this article, we will look at sample code written in C++ which is included when you download the SDK.

Seeing the C++ Sample in Action

The Apryse SDK is available in many different languages and for macOS, Linux and Windows. In this article we will look primarily at the Windows version. The other platforms have minor differences – if you have problems then please reach out to us on Discord.

Head over to https://dev.apryse.com/windows, scroll down a little and download the bitness that you require.

Figure 3 - How to get the 64-bit version of the Apryse SDK

The downloaded SDK is a zip file on Windows and macOS, and a tarball on Linux. In either case, extract the archive which, on Windows, will create a folder called PDFNetC64.

Figure 4 - The contents of the downloaded archive on Windows

This folder contains not just the SDK itself, but documentation and a folder full of samples that demonstrate the huge range of functionality supported by Apryse. For now, we will look at the OfficeTemplateTest sample.

Figure 5 - Just a few of the samples that are included with the SDK. In this article we are looking at OfficeTemplateTest.

You have two options for running samples. Either you can open the file Samples_VC20XX.sln in the root of the Samples folder (where XX matches your version of Visual Studio), which will load all of the projects.

Alternatively, you can navigate to the C++ folder of the OfficeTemplateTest project and open the .vcxproj file that matches your version of Visual Studio.

Note: currently Visual Studio 2019 is the most recent version of Visual Studio for which a dedicated project is available. It is, however, straightforward to open that project in later versions, but if you run into problems please reach out to support.

Before running the sample, you will need an Apryse Trial key which you can get here. This then needs to be pasted into the file LicenseKey.h.

An easy “gotcha”, which can occur if you open just a single project rather than the entire solution, is a mismatch in bitness. If the selected platform does not match the bitness of the downloaded SDK then linker errors will occur. By default, the platform is 32-bit up to and including Visual Studio 2019.

Figure 6 - Typical errors if the bitness of the project does not match the downloaded SDK.

Thankfully that is an easy fix – just change the bitness in the Solution Platforms dropdown, then run the sample again.

Figure 7 - The corrected bitness
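
If you would rather catch a bitness mismatch before the linker does, one option is a small compile-time check in your own code. The sketch below is not part of the Apryse sample; build_bitness is a hypothetical helper that simply reports the pointer width of the current build.

```cpp
#include <climits>
#include <cstddef>

// Reports the pointer width of the current build: 32 for x86, 64 for x64.
constexpr int build_bitness() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Optional: uncomment when building against the 64-bit SDK (PDFNetC64) so that
// a wrong platform selection fails with a readable message at compile time.
// static_assert(build_bitness() == 64,
//               "Select the x64 platform to match the 64-bit Apryse SDK");
```

The check costs nothing at run time, and the message points straight at the Solution Platforms setting rather than at a list of unresolved symbols.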

Running the project will open a console window, and, after a few seconds, the document SYH_Letter.pdf will be saved.

Figure 8 - The console window indicating that the sample has completed successfully

Let’s open that document.

Figure 9 - The newly generated document, shown here in the online PDF viewer xodo.com

That’s pretty cool – with very little effort we created a new document. What isn’t obvious is that we could create another one whenever the data changes, so that what is produced is always up to date. Let’s look at how we did that.

In the folder PDFNetC64\Samples\TestFiles you will find a file called SYH_Letter.docx.

Figure 10 - The contents of the TestFiles folder that ships with the SDK

This is the template that was used to create the PDF. If you open it in Word (or any other editor that supports DOCX) then you can see that it is just an ordinary DOCX file.

What makes it special is the tags – the pieces of text, such as {{dest_surname}} – which indicate where content should be replaced when generating the final document.

Figure 11 - Part of the report template DOCX file, showing the tags

A tag, for example {{sender_name}}, typically starts, and ends, with double curly braces (sometimes known as “mustache” braces). The tag markers are customizable, so you can use something else if you prefer.
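
As a rough illustration of what "querying a template for tags" means, the sketch below pulls tag names out of a plain string, with the delimiters passed in so that custom markers can be tried. It is an assumption-laden toy: extract_tags is a made-up name, and a real DOCX stores its text inside XML, which is why the SDK, rather than a string search, should be used on actual templates.

```cpp
#include <string>
#include <vector>

// Returns the names found between `open` and `close` markers, in order of
// appearance. Works on plain text only; duplicates are kept.
std::vector<std::string> extract_tags(const std::string& text,
                                      const std::string& open = "{{",
                                      const std::string& close = "}}") {
    std::vector<std::string> tags;
    if (open.empty() || close.empty()) return tags;  // nothing sensible to match

    std::size_t pos = 0;
    while (true) {
        std::size_t start = text.find(open, pos);
        if (start == std::string::npos) break;
        std::size_t name_begin = start + open.size();
        std::size_t end = text.find(close, name_begin);
        if (end == std::string::npos) break;  // unterminated tag: stop scanning
        tags.push_back(text.substr(name_begin, end - name_begin));
        pos = end + close.size();
    }
    return tags;
}
```

Called on "Dear {{dest_title}} {{dest_surname}}", it returns dest_title and dest_surname; passing "<<" and ">>" as the last two arguments would match tags written as <<name>> instead.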

Although this template does not include one, it is possible to include a table in the document. The generated table would automatically expand to contain whatever number of rows are specified in the data, so there is no need to know the number of rows at the time of template creation. 

Since DocGen is supported across a range of platforms and languages, the skills that you develop working in one language are generally transferable to another. As such, you can read more about setting up a multi-row table in the article about document generation using React, and apply the same techniques in C++.
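
The idea that a table grows to fit the data can be illustrated without the SDK. In the toy sketch below, a single row template is repeated once per data row, so the row count is decided entirely by the data. The names expand_rows and replace_all, and the tag names item and amount used in the example, are hypothetical; the actual table syntax for templates is described in the article mentioned above.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Row;

// Replaces every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return text;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip past the inserted value
    }
    return text;
}

// Produces one output line per data row by filling the same row template.
std::vector<std::string> expand_rows(const std::string& row_template,
                                     const std::vector<Row>& rows) {
    std::vector<std::string> lines;
    for (const Row& row : rows) {
        std::string line = row_template;
        for (const auto& cell : row) {
            line = replace_all(line, "{{" + cell.first + "}}", cell.second);
        }
        lines.push_back(line);
    }
    return lines;
}
```

Given the row template "{{item}} | {{amount}}" and two rows of data, the result has two lines; with twenty rows it would have twenty, and the template would not need to change.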

It is also possible to specify conditional text, for example, to show one block of text if the data contains some feature (for example a field for “overdue”) and a different block of text, or none at all, if the data doesn’t contain that feature. You can find information here about how to set up conditional text, but an example is shown below.

Figure 12 - An example of conditional text. The statement about rent arrears will only be shown if the expression 'overdue' is true.
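
The effect of a conditional block can be mimicked on plain text. The sketch below assumes a block written as {{if name}} ... {{endif}}, handles neither nesting nor an else branch, and treats a flag that is absent from the data as false. Both the block syntax and the name apply_conditions are assumptions made for this illustration, so check the conditional text documentation for the syntax the SDK accepts.

```cpp
#include <map>
#include <string>

// Keeps the body of each {{if name}}...{{endif}} block only when flags[name]
// is true. Non-nested blocks only; illustration, not the SDK's implementation.
std::string apply_conditions(const std::string& text,
                             const std::map<std::string, bool>& flags) {
    const std::string if_open = "{{if ";
    const std::string tag_close = "}}";
    const std::string end_tag = "{{endif}}";

    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = text.find(if_open, pos);
        if (start == std::string::npos) break;
        std::size_t name_begin = start + if_open.size();
        std::size_t name_end = text.find(tag_close, name_begin);
        if (name_end == std::string::npos) break;
        std::size_t body_begin = name_end + tag_close.size();
        std::size_t body_end = text.find(end_tag, body_begin);
        if (body_end == std::string::npos) break;

        out += text.substr(pos, start - pos);  // text before the block
        std::string name = text.substr(name_begin, name_end - name_begin);
        auto it = flags.find(name);
        if (it != flags.end() && it->second) {
            out += text.substr(body_begin, body_end - body_begin);
        }
        pos = body_end + end_tag.size();
    }
    out += text.substr(pos);  // remaining text after the last block
    return out;
}
```

With overdue set to true the sentence about arrears is kept; with it set to false, or missing from the data, only the surrounding text remains.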

OK, we will leave the template there, and next look at the data.

The JSON Data Structure

The text substitution API needs to be supplied with a JSON dictionary, where each tag within the template matches a key within the dictionary. The content of the JSON values can be text, images, structured input (HTML and Markdown) or objects.

Please see here for a detailed description of the JSON file requirements.
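
Because every tag needs a matching key, it can be worth checking the data before generating anything. The helper below is a minimal sketch of such a check; missing_keys is a hypothetical name, and it assumes you already have the list of tag names and the set of keys in your data. How the SDK itself treats a tag with no matching key is not covered in this article.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the tags that have no matching key in the data, in template order.
std::vector<std::string> missing_keys(const std::vector<std::string>& tags,
                                      const std::set<std::string>& keys) {
    std::vector<std::string> missing;
    for (const std::string& tag : tags) {
        if (keys.count(tag) == 0) missing.push_back(tag);
    }
    return missing;
}
```

An empty result means the data covers every tag; anything else is a list of names to report to whoever supplies the data.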

While SDK DocGen does require the data to be in JSON format, the way that the JSON is generated is up to you. It could be hard-coded, acquired from user input, a local file, a database, or even from a web call.

For our simple template though, the sample project uses the following hard-coded data:

UString json(
    "{\"dest_given_name\": \"Janice N.\", \"dest_street_address\": \"187 Duizelstraat\","
    "\"dest_surname\": \"Symonds\", \"dest_title\": \"Ms.\", \"land_location\": \"225 Parc St., Rochelle, QC \","
    "\"lease_problem\": \"According to the city records, the lease was initiated in September 2010 and never terminated\","
    "\"logo\": {\"image_url\": \"" + input_path + "logo_red.png\", \"width\" : 64, \"height\" : 64},"
    "\"sender_name\": \"Arnold Smith\"}"
);
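
Hand-escaped string literals like the one above become hard to maintain once the data stops being hard-coded. As a minimal sketch, assuming all values are plain text, the helpers below build the same kind of flat JSON object from a std::map. The names json_escape and to_json_object are made up here, and a production application would more likely use an established JSON library.

```cpp
#include <map>
#include <string>

// Escapes the characters that most commonly break a JSON string.
// Other control characters are not handled in this sketch.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Serializes string fields as a flat JSON object: {"key": "value", ...}.
std::string to_json_object(const std::map<std::string, std::string>& fields) {
    std::string out = "{";
    bool first = true;
    for (const auto& field : fields) {
        if (!first) out += ", ";
        first = false;
        out += "\"" + json_escape(field.first) + "\": \"" +
               json_escape(field.second) + "\"";
    }
    out += "}";
    return out;
}
```

The resulting std::string would still need to be converted to the UString that the sample passes on, and nested values such as the logo object need separate handling.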

Most of the layout is self-explanatory, but the handling of the logo requires a mention.

Handling Images in the Document

The logo is specified as an image_url, including its path, width, and height. When the document is generated, the image will be loaded into the final document.
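
One detail to watch when building the image entry yourself is that a Windows path containing backslashes must be escaped before it is embedded in JSON. The sketch below produces an object with the image_url, width and height fields used by the sample; the function name image_value is hypothetical.

```cpp
#include <string>

// Builds the JSON value for an image tag from a file path and a size.
// Backslashes and double quotes in the path are escaped so the JSON stays valid.
std::string image_value(const std::string& path, int width, int height) {
    std::string escaped;
    for (char c : path) {
        if (c == '\\' || c == '"') escaped += '\\';
        escaped += c;
    }
    return "{\"image_url\": \"" + escaped + "\", \"width\": " +
           std::to_string(width) + ", \"height\": " + std::to_string(height) + "}";
}
```

Paths that use forward slashes pass through unchanged, which is one reason relative paths written that way are convenient in sample code.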

Figure 13 - Prior to the logo tag being replaced

Figure 14 - Part of the final report – showing how the logo tag has been replaced

Controlling Style from the Data – Structured Input

While most of the data is just text, which will ultimately be displayed using the formatting from the template, it is also possible to include styling information in the data – either as HTML or Markdown.

Date: {html:"<span style='color: red'><b>Oct5th, 2023</b></span>"},

In this example, the text that replaces the date tag will be red and bold, rather than inheriting the formatting of the template.

Figure 15 - Prior to the tags being replaced

Figure 16 - Part of a different example template showing two tags – for Date and Expiry Date. In the final document the styling of the {{Date}} has been changed since it is specified within the data.

This is a powerful mechanism, and can be used to add paragraphs, headings, and styling.
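
When the styled text comes from user input or a database, it is sensible to escape it before wrapping it in markup. The sketch below builds a fragment of the same shape as the red, bold date shown earlier. It assumes that the SDK's HTML handling decodes the standard entities, which this article does not confirm, and the names html_escape and red_bold are made up for the example.

```cpp
#include <string>

// Escapes the characters that would otherwise be read as HTML markup.
std::string html_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Wraps plain text in the red, bold styling used in the example above.
std::string red_bold(const std::string& text) {
    return "<span style='color: red'><b>" + html_escape(text) + "</b></span>";
}
```

The returned fragment is what would go into the html field of the data; it would still need JSON escaping if it contained double quotes or backslashes.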

Generating the Final Document

Everything that we have seen so far – template generation and JSON format – is platform-independent. In fact, the results will be the same whether the actual conversion occurs within the browser, in the cloud or on a local machine.

With document generation supported on UWP, Android, Linux, macOS and Windows, as well as the Web, there are many opportunities for you to use this technology. The actual process of document generation does, however, have minor, platform-specific variations – so reach out to us if you need a hand getting started.

In the case of generating documents using C++, the actual document generation requires nothing more than calling FillTemplateJson with the JSON data.

// Create a TemplateDocument object from an input office file.
TemplateDocument template_doc = Convert::CreateOfficeTemplate(input_path + input_filename, NULL);

// Fill the template with data from a JSON string, producing a PDF document.
PDFDoc pdf = template_doc.FillTemplateJson(json);

// Save the PDF to a file.
pdf.Save(output_path + output_filename, SDF::SDFDoc::e_linearized, NULL);

FillTemplateJson causes the Apryse SDK-DocGen system to produce a PDF by substituting each of the tags in the template with data from the JSON object, including iterating through tables. This is done entirely within the Apryse code – there is no need for Word to be installed.

It really is that simple. One line of code takes the template, merges it with the JSON data, and creates a document. All that is still needed is to save the resulting PDF.

Beyond DOCX – Support for Other File Types

The SDK DocGen mechanism creates a PDF (although this can be saved as a PNG, and the ability to save as Office documents is in development), and it is extremely good at doing that.

While the most commonly used template format is DOCX, the system also works with PowerPoint and Excel templates, including the old style DOC, XLS, and PPT file types.

In each case, the tags that are to be substituted are marked in exactly the same way.

Figure 17 - Prior to the date being replaced

Figure 18 - A PowerPoint template and the resulting PDF

Figure 19 - The multi-sheet Excel template

Figure 20 - The resulting PDF
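
Since several template formats are accepted, an application that lets users pick their own template might check the file extension before handing the file to the SDK. The sketch below is one way to do that. The extension list is an assumption derived from the formats named in this section (Word, PowerPoint and Excel, in both current and older variants), and is_office_template is a hypothetical helper rather than an SDK function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true when the file extension looks like an Office template format.
// The comparison is case-insensitive; the list is an assumption, not an SDK API.
bool is_office_template(const std::string& filename) {
    std::size_t dot = filename.find_last_of('.');
    if (dot == std::string::npos) return false;

    std::string ext = filename.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static const char* const kExtensions[] = {"docx", "pptx", "xlsx",
                                              "doc",  "ppt",  "xls"};
    for (const char* known : kExtensions) {
        if (ext == known) return true;
    }
    return false;
}
```

A check like this only filters obvious mistakes, such as a PDF chosen by accident; whether a given file is a usable template is still decided when the SDK opens it.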

When Should Apryse SDK-DocGen be Used?

This system is great where the data source is JSON (or can easily be converted into JSON), the required document format is PDF (although as mentioned above, the ability to save as Office is in development), and the document structure is relatively straightforward. This mechanism is also a great solution for use with Appian or Salesforce – with no external libraries being required.

One of the disadvantages of this system, though, is that it is “high-code”. A change in the data source will probably require help from a developer. For example, the data source for a RESTful API might change, or the structure of data coming from a reporting system might need to be updated. In the case of Structured Input (formatting via HTML and Markdown), a change to the formatting would likely need developer help. As such, if your data source (or Structured Input based formatting) is likely to change then Apryse Fluent may be a better fit.

Similarly, in any situation where your use-case isn’t supported (such as requiring complex conditional formatting or charts) then Apryse Fluent will be able to take you much further. 

Conclusion

We have seen how we can create templates in a familiar environment that support text of initially unknown length by behaving like Word documents – adding new lines or overflowing onto the next page as dictated by the data. We have also seen how we can add logic to the template so that some parts of the document are shown only when a data condition is met.

We have also seen how this can all be done without the need for Office to be installed on the machine where document generation is occurring. Furthermore, if you want to extend the document generation in some way, perhaps by developing a web app where conversion occurs entirely within the browser, then the information in this article is a great basis for taking those next steps.

When you are ready to get started, see the documentation for the SDK to help you to get up to speed quickly. Don’t forget, you can also reach out to us on Discord if you have any issues.
