Generating Documents and Reports from DOCX Templates and JSON using Apryse and Ruby

By Roger Dunham | 2023 Nov 01

Read time: 8 min

Summary

Explore a thorough document generation solution in Ruby that encompasses every aspect, including template creation using DOCX and JSON, configuring multi-row tables, and producing the final document.

Introduction

Automated document generation simplifies and accelerates document creation, producing a wide range of documents accurately and efficiently. It saves time, ensures consistency, and reduces errors, proving valuable for businesses in diverse sectors that require customized and high-volume document production.

Apryse has two distinct systems for creating documents and reports from templates: 

  • Fluent: A versatile system that can create PDFs, DOCX, PPTX, and HTML documents from a single template. It can fetch data from diverse sources like JSON, XML, SQL Server, and OData.
  • SDK DocGen: Built on the Apryse SDK, it operates in browsers, on servers, or as a standalone application. It uses JSON data to populate templates and generate PDFs.

In both cases the templates are Office documents, but neither method uses Office to actually generate the final document from the template.

In this article, we will look at how to use the Apryse DocGen document generation system from Ruby. Although Ruby can be used on Windows, in this article we will run it on Ubuntu 22.04 under WSL (Windows Subsystem for Linux).

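We will look at why Office documents make good templates, run the pre-packaged Ruby sample, add tags and a table to a template, structure the JSON data, and generate the final PDF.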

Why it is Better to use Office Documents than PDFs as Templates

While it is possible to edit PDFs directly, and some document generation systems that you can find on the internet do that, you are likely to hit limitations as soon as you use real data or anything other than a trivial document. A common one is the new text not fitting into the available space.

Figure 1 - A PDF created by substitution of text from a PDF template. There was not enough space, so some text is missing.

Using DOCX files as templates can solve this problem. Word is great at updating the page layout when a line of text suddenly flows onto a second line, or when a new item needs to be appended to a numbered list.

But you don’t need Word to do this - the Apryse SDK is also great at reflowing text, making it an ideal tool for substituting text-markers, which we call 'tags', in a template.

See Document Generation in Action

Before we go any further, let’s look at a live example of document generation from a DOCX template.

The sample is written in JavaScript, but the principles of document generation are language agnostic. You can choose any template, and the program will query it for tags, and then use those to dynamically create a data entry form, which is then used to populate the template.

Figure 2 - The online sample showing a dynamically generated data entry area.

It’s a great solution for creating documents dynamically but, in fact, the data could come from sources other than user input, perhaps from a database, or even from a RESTful API call.

Before we move on, let’s look at the overview of creating a document from a template.

Overview of the Document Generation Process – From DOCX to PDF

  1. Create a template in DOCX format. This is done using markers (which Apryse refers to as tags) to indicate where text should be located. In addition to single word replacement, tables with run-time row creation are supported, as is some conditional logic. The method for creating templates is well documented, and can be performed by anyone who is reasonably familiar with Word.
  2. Gather the replacement data to use in the generated document. This needs to be in JSON format for the template filling to occur, but where you get that data from is up to you.
  3. Replace the tags with the real data using the Ruby Apryse SDK to fill the template, then display, or save, the generated PDF.
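
As a rough illustration of step 2, the sketch below turns a record into the JSON string that the template filling needs. The Recipient struct and the template_data_for helper are hypothetical names invented for this example (standing in for a database row or an API response); only the tag names come from the sample template, and Ruby's standard json library does the serialization.

```ruby
require "json"

# Hypothetical record type, standing in for a database row or API response.
Recipient = Struct.new(:title, :given_name, :surname, :street_address)

# Map a record onto the tag names used in the sample template.
def template_data_for(recipient)
  {
    "dest_title" => recipient.title,
    "dest_given_name" => recipient.given_name,
    "dest_surname" => recipient.surname,
    "dest_street_address" => recipient.street_address
  }
end

recipient = Recipient.new("Ms.", "Janice N.", "Symonds", "187 Duizelstraat")

# The JSON string is what would later be handed to the template filling step.
json = JSON.generate(template_data_for(recipient))
puts json
```

Keeping the mapping in one small method means that a change of data source only touches that method, not the template.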

Try out the Pre-packaged Ruby Sample for Yourself

The Apryse SDK is not directly available for Ruby on Linux, but it is very easy to generate a wrapper library by following the instructions about getting started with Ruby or by reading this article.

Once built, you will have a Ruby Library, along with a wealth of samples that illustrate the SDK functionality.

You will need to get an Apryse Trial key and then update the file LicenseKey.rb.

Figure 3 - Entering your license key information into the file LicenseKey.rb.

Finally, head over to the sample folder OfficeTemplateTest, and within a terminal enter RunTest.sh if you are using Linux or macOS (or RunTest.bat if you are using Windows).

After a few seconds you will see that the processing has completed, and a file called SYH_Letter.pdf will have been saved.

Figure 3 - Typical output from running the document generation sample.

Working with a Template

The template in this example is just a Word DOCX file (although other Office formats can be used). It can contain any formatting, and any amount of text that you want. Furthermore, it can contain as many tags as you want to be filled when the PDF is created.

It also doesn’t have to be used on the machine where it was created, so you are free to use Office on Windows (or any other editor that works with DOCX format), then copy that file across to the Linux machine.

The easiest way to understand how to use a template is to work with one. We will use the file SYH_Letter.docx in the Samples/TestFiles folder.

Figure 4 - The location of the template file that we will work with.

If you open that file, you will see that it contains text (it could also contain images, tables and all of the other things that DOCX supports) and text that is to be substituted (the tags).

Figure 5 - Part of the report template DOCX file.

The tags, for example {{dest_given_name}} and {{dest_surname}}, start and end with two curly braces (sometimes known as “mustache” brackets). In the unlikely event that you need to use mustache brackets as actual text within your document, it is possible to specify a different delimiter. You can find out more by asking on Discord.

During document generation, each tag will be substituted with the value from the JSON data source whose key matches the tag’s name.
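
To make the idea of tag substitution concrete, here is a minimal sketch in plain Ruby that works on an ordinary string. It is an illustration only: the Apryse SDK operates on the DOCX file itself, and the fill_tags helper and TAG_PATTERN constant are names made up for this example. Leaving unknown tags untouched is a choice made in this sketch, not a statement about how the SDK behaves.

```ruby
# Illustration only: mimics the idea of tag filling on a plain string.
# Matches tags such as {{dest_surname}} and captures the name inside.
TAG_PATTERN = /\{\{\s*(\w+)\s*\}\}/

# Replace each tag with the value stored under the same key in data.
def fill_tags(text, data)
  text.gsub(TAG_PATTERN) do |tag|
    key = Regexp.last_match(1)
    # In this sketch, a tag with no matching key is left as it is.
    data.key?(key) ? data[key].to_s : tag
  end
end

template_text = "Dear {{dest_title}} {{dest_given_name}} {{dest_surname}},"
data = { "dest_title" => "Ms.", "dest_given_name" => "Janice N.", "dest_surname" => "Symonds" }

puts fill_tags(template_text, data)
```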

Specifying Tables Within the Template

While we could just use the template as it is, let’s add a table to it, as that illustrates an important feature of SDK DocGen: the ability to add multiple items to the document when the number of items is not known at the time of template creation.

There are two ways to specify tables. While the table structure can be defined within the data, this is not recommended, since it results in more complicated data structures, has less flexibility, and increases the risk of producing a document where data and column headers are out of sync. The recommended method is to specify the table within the template using the Loops syntax.

As an example of this, create a new empty table in the document with three rows and four columns.

The top row will be used for column titles. The middle row will be used for the actual data, so add the following tags to the cells:

  • {{loop rows}}{{year}} 
  • {{rent}} 
  • {{tax}}
  • {{year_total}}{{endloop}}

In the last cell of the bottom row, add the tag {{total}} and, as an example of formatting, set its text color to purple.

The {{loop}} and {{endloop}} tags indicate that there may be multiple rows of data. The name ‘rows’ (in this example) identifies where the data should come from in the JSON file, and the other tags indicate what should go into each cell within the row.

Figure 6 - A table specified with the loop syntax.

A benefit of this syntax is that the table can have cells that are not populated from the same JSON data item – perhaps from another object, or with static text. There is also a visible mapping of which data will be in which column, and it is easy to verify that it agrees with the column title. That is extremely useful if there is a need to change the order of columns, since it is easy to check that the columns in the table would still contain data that matched the titles.

Note that only one row in the table needs to contain tags. We will see, shortly, how multi-line data is stored within the JSON data object, and how that controls the creation of the number of data rows within the document generation mechanism.
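
One way to picture what the loop does is to model the tagged row as a list of cells and repeat it once per item in the data. The sketch below does that in plain Ruby; it is not how the SDK is implemented, and ROW_TEMPLATE and expand_rows are hypothetical names. The loop and endloop markers are left out of the model, since their only job is to mark which row repeats.

```ruby
# Illustration only: one template row, modelled as an array of cell strings.
ROW_TEMPLATE = ["{{year}}", "{{rent}}", "{{tax}}", "{{year_total}}"].freeze

# Produce one filled row per item, so the table grows with the data.
def expand_rows(row_template, items)
  items.map do |item|
    row_template.map do |cell|
      # Missing keys become empty cells in this sketch.
      cell.gsub(/\{\{(\w+)\}\}/) { item.fetch(Regexp.last_match(1), "") }
    end
  end
end

rows = [
  { "year" => "2021", "rent" => "$3,000", "tax" => "$500", "year_total" => "$3500" },
  { "year" => "2022", "rent" => "$3,200", "tax" => "$550", "year_total" => "$3750" }
]

expand_rows(ROW_TEMPLATE, rows).each { |cells| puts cells.join(" | ") }
```

Two items in the data give two table rows; a third item would give a third row without any change to the template.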

Apryse DocGen also supports the ability to show or hide text based on some condition within the data (for example, if a value is present). You can read more about how to do that in the article about Document Generation using React.

OK, we will leave the template there, and move onto looking at the data.

The JSON Data Structure

The text substitution API requires a JSON dictionary, where each tag within the template matches a key within the dictionary. The JSON values can be text, images, structured input (html and markdown), or objects.

Please see here for a detailed description of the JSON file requirements.
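
Because each tag needs a matching key, it can be handy to compare the two before generating anything. The sketch below is one possible check, assuming that you already have the list of tag names (here it is typed in by hand; the sketch does not read them from the DOCX file). The missing_keys helper is a hypothetical name.

```ruby
require "json"

# Return the tag names that have no top-level key in the JSON data.
def missing_keys(tags, json_string)
  keys = JSON.parse(json_string).keys
  tags.uniq - keys
end

# Tag names copied by hand from the template for this example.
tags = %w[dest_title dest_given_name dest_surname sender_name]
json = '{"dest_title": "Ms.", "dest_given_name": "Janice N.", "dest_surname": "Symonds"}'

missing = missing_keys(tags, json)
puts "Missing keys: #{missing.join(', ')}" unless missing.empty?
```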

For our simple template, the sample code already contains some hard-coded data, and we will add a little more to populate the table.

  {
      "dest_given_name": "Janice N.",
      "dest_street_address": "187 Duizelstraat",
      "dest_surname": {"html": "<span style=\"color: red\"><b>Symonds</b></span>"},
      "dest_title": "Ms.",
      "land_location": "225 Parc St., Rochelle, QC ",
      "lease_problem": "According to the city records, the lease was initiated in September 2010 and never terminated",
      "logo": {"image_url": "%slogo_red.png", "width": 64, "height": 64},
      "sender_name": "Arnold Smith",
      "rent_increase": "200",
      "rows": [
          {"year": "2021", "rent": "$3,000", "tax": "$500", "year_total": "$3500"},
          {"year": "2022", "rent": "$3,200", "tax": "$550", "year_total": "$3750"}
      ],
      "total": "$7250"
  }

Most of the layout is self-explanatory, but let’s look at two specific areas:

Handling Rows in the Document

The recommended method for table creation, specifying cell tags in the template, takes its data from the rows key. Its value is an array that contains one object per table row, and each object names the value for each column individually.
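
If the figures start out as numbers, the rows array and the total can be built in code rather than typed by hand. The following is a minimal sketch under that assumption; the leases data, the dollars helper and the lease_rows helper are all hypothetical. Note that this helper always adds thousands separators, so it prints $3,500 where the hand-written sample above has $3500.

```ruby
require "json"

# Format a whole-dollar amount with thousands separators.
def dollars(amount)
  "$" + amount.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Build one object per table row, with a key for each column tag.
def lease_rows(leases)
  leases.map do |lease|
    {
      "year" => lease[:year].to_s,
      "rent" => dollars(lease[:rent]),
      "tax" => dollars(lease[:tax]),
      "year_total" => dollars(lease[:rent] + lease[:tax])
    }
  end
end

# Hypothetical numeric source data.
leases = [
  { year: 2021, rent: 3000, tax: 500 },
  { year: 2022, rent: 3200, tax: 550 }
]

grand_total = leases.sum { |lease| lease[:rent] + lease[:tax] }
data = { "rows" => lease_rows(leases), "total" => dollars(grand_total) }
puts JSON.pretty_generate(data)
```

Computing the totals from the same numbers that fill the rows avoids the table and its total drifting apart.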

Controlling Style from the Data – Structured Input

Much of the sample data is just text, but the value for “dest_surname” is specified as html.

  "dest_surname":  {"html":"<span style=\"color: red\"><b>Symonds</b></span>"}, 

This is an example of how document formatting can be controlled from the data, using either html or markdown.

Figure 7 - Part of the generated document. The formatting for the surname was defined in the data, not in the template.

This mechanism can be used to alter the look of the document via the data – and offers the ability to add paragraphs, headings, and styling.
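
When the text being styled comes from user input or a database, it is worth escaping it before wrapping it in markup. The sketch below builds the same html value as the sample data; the red_bold and escape_html helpers are hypothetical names, and only the "html" key and the span markup are taken from the sample.

```ruby
require "json"

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape characters that would otherwise be read as markup.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Wrap plain text in the same red, bold markup used in the sample data.
def red_bold(text)
  { "html" => "<span style=\"color: red\"><b>#{escape_html(text)}</b></span>" }
end

data = {
  "dest_given_name" => "Janice N.",
  "dest_surname" => red_bold("Symonds")
}

puts JSON.generate(data)
```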

Generating the Final Document

Everything that we have seen so far – template generation and JSON format - is platform-independent, and the results will be the same whether the actual conversion occurs within a browser or server side.

The actual process of document generation does have minor, platform specific, variations, however. With SDK DocGen supported on UWP, Android, Linux, macOS and Windows, as well as the Web, there are many opportunities for you to use this technology. Please check out the documentation for the specific language that SDK-DocGen supports.

This article is about generating documents using Ruby, and in this case the actual code needed to generate the document is just the following:

  # Create a template document object from the input Office file.
  templateDoc = Convert.CreateOfficeTemplate(inputFile, nil)

  # Fill the template with data from a JSON string, producing a PDF document.
  pdfdoc = templateDoc.FillTemplateJson(json)

  # Save the PDF to a file.
  outputFile = $outputPath + outputFilename
  pdfdoc.Save(outputFile, SDFDoc::E_linearized)

The Apryse SDK-DocGen system then replaces each of the tags in the template with data from the JSON object wherever possible, including iterating through tables, and produces a PDF. 
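
The same few calls can be repeated for many recipients, which is where high-volume generation comes in. The sketch below only prepares the inputs for each run (a JSON string and an output path) and does not call the SDK itself, so it runs without it. The build_job helper, the Output folder name, the file naming scheme and the second recipient are assumptions made for this example.

```ruby
require "json"

# Turn one recipient's data into the two inputs the generation code needs:
# a JSON string for filling the template, and a path for saving the PDF.
def build_job(recipient, output_dir)
  # Keep only characters that are safe in a file name.
  safe_name = recipient.fetch("dest_surname").gsub(/[^A-Za-z0-9_-]/, "_")
  {
    json: JSON.generate(recipient),
    output_file: File.join(output_dir, "SYH_Letter_#{safe_name}.pdf")
  }
end

# The second recipient is made up for this example.
recipients = [
  { "dest_title" => "Ms.", "dest_given_name" => "Janice N.", "dest_surname" => "Symonds" },
  { "dest_title" => "Mr.", "dest_given_name" => "Sam", "dest_surname" => "Lee" }
]

jobs = recipients.map { |recipient| build_job(recipient, "Output") }
jobs.each { |job| puts job[:output_file] }
```

Each job's json and output_file values would then take the place of json and outputFile in the generation code shown earlier.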

Figure 8 - The generated document – the tags have been replaced, the surname is in red (as specified in the data) and the table has been populated, with the total value shown in purple (as specified in the template).

It really is that simple. Four lines of code take the template, merge it with the JSON data, and create a document, all without the need for Office to be installed.

Beyond DOCX – Support For Other File Types

The SDK Doc-Gen mechanism always creates a PDF, and it is extremely good at doing that.

While the most used template format is DOCX, the system also works with PowerPoint and Excel files (including the DOC, XLS and PPT file types).

In each case the tags that are to be substituted are marked in exactly the same way as we have already seen.

Figure 9 - A PowerPoint template, and the resulting PDF.

Figure 10 - A multi-sheet Excel template and the resulting PDF

When Should Apryse SDK-DocGen be Used?

This system is great where the data source is JSON (or can easily be converted into JSON), the required document format is PDF, and the document’s structure is relatively straightforward. This mechanism is also a great solution for use with Appian or Salesforce, with no external libraries being required.

One of the disadvantages of this system, however, is that a change in the data source will probably require help from a developer. For example, in the case of Structured Input (formatting via html and markdown), if the formatting needs to be modified then that would need developer help. As such, if your data source or complex formatting is likely to change then Apryse Fluent may be a better fit.

Similarly, in any situation where your use case isn’t supported (such as requiring complex conditional formatting, dynamically generated charts, etc.) then Apryse Fluent will be able to take you much further.

Currently DocGen only supports generation of PDFs, but direct creation of Word (DOCX) documents will be available soon.

Conclusion

We have seen how we can create templates in a familiar environment that can support text of initially unknown length, adjusting the layout of the document to work around the text. We have also seen how multi-row data can be added into tables.

While the example used in this article is fairly simple, if you want to extend the document generation in some way – perhaps developing a desktop app, or providing browser-side processing, then what you have learnt here will be a great basis for taking those next steps.

When you are ready to take the next steps, see the documentation for the SDK to get started quickly. Don’t forget, you can also reach out to us on Discord if you have any issues.
