By Roger Dunham | 2023 Nov 01
8 min
Tags
JSON
template
docx
report
ruby
Explore a thorough document generation solution in Ruby that encompasses every aspect, including template creation using DOCX and JSON, configuring multi-row tables, and producing the final document.
Automated document generation simplifies and accelerates document creation, producing a wide range of documents accurately and efficiently. It saves time, ensures consistency, and reduces errors, proving valuable for businesses in diverse sectors that require customized and high-volume document production.
Apryse has two distinct systems for creating documents and reports from templates:
In both cases the templates are Office documents, but neither method uses Office to actually generate the final document from the template.
In this article, we will look at how to use the Apryse DocGen document generation system from Ruby. Although Ruby can be used on Windows, in this article it will be used on Ubuntu 22.04 running under WSL (Windows Subsystem for Linux).
We will:
While it is possible to edit PDFs directly, and some document generation systems that you can find on the internet do that, you will soon run into limitations when working with real data or anything other than a trivial document, such as the new text not fitting into the available space.
Figure 1 - A PDF created by substitution of text from a PDF template. There was not enough space, so some text is missing.
Using DOCX files as a template can solve this problem. Word is great at updating the layout of the page when a line of text suddenly flows onto a second line, or when a new item needs to be appended to a numbered list.
But you don’t need Word to do this - the Apryse SDK is also great at reflowing text, making it an ideal tool for substituting text-markers, which we call 'tags', in a template.
Before we go any further, let’s look at a live example of document generation from a DOCX template.
The sample is written in JavaScript, but the principles of document generation are language agnostic. You can choose any template, and the program will query it for tags, and then use those to dynamically create a data entry form, which is then used to populate the template.
Figure 2 - The online sample showing a dynamically generated data entry area.
It’s a great solution to creating documents dynamically, but, in fact, the data could come from sources other than user input – perhaps from a database, or even from a RESTful API call.
Before we move on, let’s look at the overview of creating a document from a template.
The Apryse SDK is not directly available for Ruby on Linux, but it is very easy to generate a wrapper library by following the instructions about getting started with Ruby or by reading this article.
Once built, you will have a Ruby Library, along with a wealth of samples that illustrate the SDK functionality.
You will need to get an Apryse Trial key and then update the file LicenseKey.rb.
Figure 3 - Entering your license key information into the file LicenseKey.rb.
Finally, head over to the sample folder OfficeTemplateTest, and in a terminal run RunTest.sh if you are using Linux or macOS (or RunTest.bat if you are using Windows).
After a few seconds you will see that the processing has completed, and a file called SYH_Letter.pdf will have been saved.
Figure 3 - Typical output from running the document generation sample.
Learn how to generate PDFs using Ruby on Rails.
The template in this example is just a Word DOCX file (although other Office formats can be used). It can contain any formatting, and any amount of text that you want. Furthermore, it can contain as many tags as you want to be filled when the PDF is created.
It also doesn’t have to be used on the machine where it was created, so you are free to use Office on Windows (or any other editor that works with DOCX format), then copy that file across to the Linux machine.
The easiest way to understand how to use a template is to work with one. We will use the file SYH_Letter.docx in the Samples/TestFiles folder.
Figure 4 - The location of the template file that we will work with.
If you open that file, you will see that it contains text (it could also contain images, tables and all of the other things that DOCX supports) and text that is to be substituted (the tags).
Figure 5 - Part of the report template DOCX file.
The tags, for example {{dest_given_name}} and {{dest_surname}}, start and end with two curly braces (sometimes known as “mustache” brackets). In the unlikely event that you need to use mustache brackets as actual text within your document, it is possible to specify a different delimiter. You can find out more by asking on Discord.
During document generation, each tag will be substituted with the value from the JSON data source whose key matches the tag name.
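To make the idea of tag substitution concrete, here is a minimal plain-Ruby sketch that imitates it on an ordinary string. It is not how the Apryse SDK works internally, and it does not touch a DOCX file. The names fill_tags and TAG_PATTERN are hypothetical, and leaving unmatched tags untouched is an assumption made for this illustration only.

```ruby
# Illustrative only: a plain-string imitation of tag substitution.
# The real substitution is performed by the Apryse SDK on the DOCX template.

# Matches a tag such as {{dest_surname}} and captures its name.
TAG_PATTERN = /\{\{\s*(\w+)\s*\}\}/

# Replace each {{tag}} in text with the matching value from data.
# Tags with no matching key are left unchanged (an assumption of this sketch).
def fill_tags(text, data)
  text.gsub(TAG_PATTERN) do |tag|
    key = Regexp.last_match(1)
    data.key?(key) ? data[key].to_s : tag
  end
end

template_text = "Dear {{dest_title}} {{dest_given_name}} {{dest_surname}},"
data = {
  "dest_title" => "Ms.",
  "dest_given_name" => "Janice N.",
  "dest_surname" => "Symonds"
}

puts fill_tags(template_text, data)
```

The point of the sketch is the mapping: the text between the braces is the key, and the value stored under that key is what ends up in the document.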
While we could just use the template as it is, let’s add a table to it, as that illustrates an important feature of SDK-DocGen: the ability to add multiple items to the document when the number of items is not known at the time of template creation.
There are two ways to specify tables. While the table structure can be defined within the data, this is not recommended, since it results in more complicated data structures, has less flexibility, and increases the risk of producing a document where data and column headers are out of sync. The recommended method is to specify the table within the template using the Loops syntax.
As an example of this, create a new empty table in the document with three rows of four columns.
The top row will be used for column titles. The middle row will be used for the actual data, so add the following tags to the cells:
and in the last cell of the bottom row add the tag {{total}}, and as an example of formatting, set the text color to purple.
The {{loop}}…{{endloop}} syntax indicates that there may be multiple rows of data. The name ‘rows’ (in this example) identifies where the data should come from in the JSON file, and the other tags indicate what should go into each cell within the row.
Figure 6 - A table specified with the loop syntax.
A benefit of this syntax is that the table can have cells that are not populated from the same JSON data item – perhaps from another object, or with static text. There is also a visible mapping of which data will be in which column, and it is easy to verify that it agrees with the column title. That is extremely useful if there is a need to change the order of columns, since it is easy to check that the columns in the table would still contain data that matched the titles.
Note that only one row in the table needs to contain tags. We will see, shortly, how multi-line data is stored within the JSON data object, and how that controls the creation of the number of data rows within the document generation mechanism.
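The way a single tagged row turns into several data rows can be sketched in plain Ruby. This is only a model of the behavior described above, not the SDK's implementation. The function expand_rows is hypothetical, and the cell tags are assumed to be named after the keys used in the sample data (year, rent, tax, year_total).

```ruby
# Illustrative only: models how one tagged template row is repeated
# once per item in the "rows" array of the data.

# Assumed cell layout for the single tagged row of the table.
ROW_TEMPLATE = ["{{year}}", "{{rent}}", "{{tax}}", "{{year_total}}"].freeze

# Produce one output row per data item by filling the row template.
# A tag with no value in the item becomes an empty cell (an assumption of this sketch).
def expand_rows(row_template, items)
  items.map do |item|
    row_template.map do |cell|
      cell.gsub(/\{\{(\w+)\}\}/) { item.fetch(Regexp.last_match(1), "").to_s }
    end
  end
end

items = [
  { "year" => "2021", "rent" => "$3,000", "tax" => "$500", "year_total" => "$3500" },
  { "year" => "2022", "rent" => "$3,200", "tax" => "$550", "year_total" => "$3750" }
]

expand_rows(ROW_TEMPLATE, items).each { |row| puts row.join(" | ") }
```

Adding a third item to the array would produce a third row without any change to the template, which is the property that makes the loop syntax useful when the row count is unknown in advance.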
Apryse DocGen also supports showing or hiding text based on some condition within the data (for example, if a value is present). You can read more about how to do that in the article about Document Generation using React.
OK, we will leave the template there, and move onto looking at the data.
The text substitution API requires a JSON dictionary in which each tag within the template matches a key within the dictionary. The JSON values can be text, images, structured input (HTML and Markdown), or objects.
Please see here for a detailed description of the JSON file requirements.
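Because every tag needs a matching key, it can be handy to check the data before generating anything. The following is a minimal sketch using only Ruby's standard json library. The function missing_keys is hypothetical, and the list of tag names is assumed to be maintained by hand; the sketch does not read tags out of the DOCX file.

```ruby
require "json"

# Return the tag names that have no matching top-level key in the JSON data.
# tag_names is a hand-maintained list here (an assumption of this sketch).
def missing_keys(tag_names, json_string)
  data = JSON.parse(json_string)
  tag_names.reject { |name| data.key?(name) }
end

tags = %w[dest_title dest_given_name dest_surname sender_name]
json = '{"dest_title": "Ms.", "dest_given_name": "Janice N.", "dest_surname": "Symonds"}'

missing = missing_keys(tags, json)
puts missing.empty? ? "All tags have data" : "No data for: #{missing.join(', ')}"
```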
For our simple template, the sample code already contains some hard-coded data, and we will add a little more to populate the table.
{
  "dest_given_name": "Janice N.",
  "dest_street_address": "187 Duizelstraat",
  "dest_surname": {"html":"<span style=\"color: red\"><b>Symonds</b></span>"},
  "dest_title": "Ms.",
  "land_location": "225 Parc St., Rochelle, QC ",
  "lease_problem": "According to the city records, the lease was initiated in September 2010 and never terminated",
  "logo": { "image_url": "%slogo_red.png", "width" : 64, "height": 64 },
  "sender_name": "Arnold Smith",
  "rent_increase":"200",
  "rows": [
    {"year":"2021","rent":"$3,000","tax":"$500","year_total":"$3500"},
    {"year":"2022","rent":"$3,200","tax":"$550","year_total":"$3750"}
  ],
  "total":"$7250"
}
Most of the layout is self-explanatory, but let’s look at two specific areas:
The recommended method for table creation, specifying cell tags in the template, takes its data from the key rows. The value of rows is an array that contains one object per table row, with the value for each column individually named.
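In practice the rows are unlikely to be typed by hand. As a sketch of how they might be built from numeric records (for example, the result of a database query), the following uses Ruby's standard json library. The helpers dollars and build_table_data and the shape of the input records are assumptions made for this example. Note that the helper always inserts thousands separators, so its strings differ slightly from the hand-written values above (for example "$3,500" rather than "$3500").

```ruby
require "json"

# Format a non-negative integer dollar amount with thousands separators.
def dollars(amount)
  "$" + amount.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Build the "rows" array and the "total" value from numeric records.
# Each record is assumed to be a hash with :year, :rent and :tax keys.
def build_table_data(records)
  rows = records.map do |record|
    {
      "year" => record[:year].to_s,
      "rent" => dollars(record[:rent]),
      "tax" => dollars(record[:tax]),
      "year_total" => dollars(record[:rent] + record[:tax])
    }
  end
  total = records.sum { |record| record[:rent] + record[:tax] }
  { "rows" => rows, "total" => dollars(total) }
end

records = [
  { year: 2021, rent: 3000, tax: 500 },
  { year: 2022, rent: 3200, tax: 550 }
]

puts JSON.pretty_generate(build_table_data(records))
```

Computing year_total and total from the same numbers that fill the other cells avoids the row values and the totals drifting apart.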
Much of the sample data is just text, but the value for “dest_surname” is specified as html.
"dest_surname": {"html":"<span style=\"color: red\"><b>Symonds</b></span>"},
This is an example of how document formatting can be controlled from the data, using either html or markdown.
Figure 7 - Part of the generated document. The formatting for the surname was defined in the data, not in the template.
This mechanism can be used to alter the look of the document via the data – and offers the ability to add paragraphs, headings, and styling.
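If the styled value comes from user or database input, it is worth escaping it before wrapping it in markup. The sketch below builds the same kind of "html" value as the sample data, using Ruby's standard cgi and json libraries. The helper html_value and its color and bold parameters are hypothetical, and the sketch assumes that the HTML handling accepts standard character entities.

```ruby
require "cgi"
require "json"

# Wrap text in the {"html": ...} structure used for structured input.
# The text is HTML-escaped first so that characters such as & and < are safe.
def html_value(text, color: "red", bold: true)
  escaped = CGI.escapeHTML(text)
  inner = bold ? "<b>#{escaped}</b>" : escaped
  { "html" => "<span style=\"color: #{color}\">#{inner}</span>" }
end

data = { "dest_surname" => html_value("Symonds") }
puts JSON.generate(data)
```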
Everything that we have seen so far – template generation and JSON format - is platform-independent, and the results will be the same whether the actual conversion occurs within a browser or server side.
The actual process of document generation does, however, have minor platform-specific variations. With SDK DocGen supported on UWP, Android, Linux, macOS and Windows, as well as the Web, there are many opportunities for you to use this technology. Please check out the documentation for the specific language that you are using.
This article is about generating documents using Ruby, and in this case the actual code needed to generate the document is just the following:
# Create a template document object from the input Office file.
templateDoc = Convert.CreateOfficeTemplate(inputFile, nil)
# Fill the template with data from a JSON string, producing a PDF document.
pdfdoc = templateDoc.FillTemplateJson(json)
# Save the PDF to a file.
outputFile = $outputPath + outputFilename
pdfdoc.Save(outputFile, SDFDoc::E_linearized)
The Apryse SDK-DocGen system then replaces each of the tags in the template with data from the JSON object wherever possible, including iterating through tables, and produces a PDF.
Figure 8 - The generated document – the tags have been replaced, the surname is in red (as specified in the data) and the table has been populated, with the total value shown in purple (as specified in the template).
It really is that simple. Four lines of code take the template, merge it with the JSON data, and create a document, all without the need for Office to be installed.
The SDK DocGen mechanism always creates a PDF, and it is extremely good at doing that.
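In the sample the JSON is hard-coded, but in a real application it may come from a file or another system. As a minimal sketch, the helper below reads a JSON payload from disk and confirms that it parses to an object before it is handed to the SDK. The function load_payload and the file name in the usage comment are hypothetical; only FillTemplateJson comes from the code shown above.

```ruby
require "json"

# Read a JSON payload from disk and check that it is a JSON object.
# Returns the original text, ready to be passed on as a JSON string.
def load_payload(path)
  text = File.read(path)
  parsed = JSON.parse(text) # raises JSON::ParserError on malformed input
  raise ArgumentError, "top-level JSON value must be an object" unless parsed.is_a?(Hash)
  text
end

# Hypothetical usage with the code from this article:
#   json = load_payload("letter_data.json")
#   pdfdoc = templateDoc.FillTemplateJson(json)
```

Failing early with a clear parse error is usually easier to diagnose than a generated PDF with unfilled tags.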
While the most used template format is DOCX, the system also works with PowerPoint and Excel files (including the DOC, XLS and PPT file types).
In each case the tags that are to be substituted are marked in exactly the same way as we have already seen.
Figure 9 - A PowerPoint template, and the resulting PDF.
Figure 10 - A multi-sheet Excel template and the resulting PDF
This system is great where the data source is JSON (or can easily be converted into JSON), the required document format is PDF, and its structure is relatively straightforward. This mechanism is also a great solution for use with Appian or Salesforce, with no external libraries being required.
One of the disadvantages of this system, however, is that a change in the data source will probably require help from a developer. For example, in the case of Structured Input (formatting via HTML and Markdown), modifying the formatting would need developer help. As such, if your data source or complex formatting is likely to change, then Apryse Fluent may be a better fit.
Similarly, in any situation where your use case isn’t supported (such as requiring complex conditional formatting, dynamically generated charts, etc.), Apryse Fluent will be able to take you much further.
Currently DocGen only supports generation of PDFs, but direct creation of Word (DOCX) documents will be available soon.
We have seen how we can create templates in a familiar environment that can support text of initially unknown length, adjusting the layout of the document to work around the text. We have also seen how multi-row data can be added into tables.
While the example used in this article is fairly simple, if you want to extend the document generation in some way – perhaps developing a desktop app, or providing browser-side processing, then what you have learnt here will be a great basis for taking those next steps.
When you are ready to take the next steps, see the documentation for the SDK to get started quickly. Don’t forget, you can also reach out to us on Discord if you have any issues.