Open Source or Proprietary — What PDF Viewer Engine is Right for My Application?

By Adam Pez | 2022 Oct 31

When building a viewer application, organizations seek a PDF rendering library to parse, render, and display PDFs in an application. Ideally, you select a component to give you a stable platform on which to build a successful, long-term solution, whether for a commercial product, a quick PoC, or an internal enterprise application.

But of the several developer strategies and components available, which one is best for your project?

This is your guide to selecting a PDF rendering and display engine. We’ll lay out your options, then answer the most frequently asked questions that arise when choosing a backbone for your PDF viewer project.

Defining PDF Rendering Library Options

A PDF rendering engine or “core” is the foundational piece in your viewer application architecture; it makes PDF files accessible for viewing and manipulation in an interactive workflow. Choosing the right one for your needs is therefore key, because the core serves as the cornerstone of everything an application does.

This guide discusses a few different PDF engines: some offered by commercial vendors, others available as free downloads. We’ll use Wikipedia to lay down ground terminology.

Open source is source code made available to everybody to modify, enhance, and distribute.

"Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized software development model that encourages open collaboration."

Wikipedia

Proprietary software, on the other hand, is non-free software whose source code only the owning organization, person, or team can create, edit, inspect, or change.

"Proprietary software, also known as non-free software or closed-source software, is computer software for which the software's publisher or another person reserves some licensing rights to use, modify, share modifications, or share the software, restricting user freedom with the software they lease. It is the opposite of open-source or free software. Non-free software sometimes includes patent rights."

Wikipedia

With definitions out of the way, let’s create some library categories to consider:

  1. Unmodified Open Source
  2. Modified Open Source
  3. Full Proprietary Engine

1. Unmodified Open Source — For Everyday PDF Reading

When it comes to your PDF technology needs, open source can serve as a practical and proven starting point.

A popular option is PDFium, Google’s fork of Foxit’s PDF viewer in C++. PDFium is used today in Chrome and Microsoft Edge; SaaS vendors such as Dropbox use it for PDF previews, with a server-side renderer.

Related: Apryse's Director of Product, Andrey, compares JavaScript PDF viewer libraries, with samples on GitHub.

PDF.js is another popular open-source option. Created by Mozilla and now maintained by its community, PDF.js loads, renders and displays PDFs using JavaScript. It is used by many companies, especially startups, to add interactive PDF viewing to a web application or website.

Benefits of Unmodified Open-Source Libraries

The benefits of these two engines are not hard to see: both PDFium and PDF.js are distributed under permissive licenses (PDF.js under Apache 2.0, PDFium under a similarly permissive BSD-style or Apache 2.0 license, depending on the version). So, they make PDF rendering freely accessible to developers.

Using these libraries directly from the repo, organizations also gain the benefits of open source:

  • Community feedback for enhancing and improving the code base for PDF rendering and viewing
  • Community resources for developing and testing PDF rendering and viewing

When is Unmodified Open Source Not Enough?

Unmodified open source is a great entry-level option. But when your users start requesting more features or improved rendering performance, you’ll need to consider other options that offer more than just the ability to read PDFs.

Consider other options when you need:

  • Additional file formats such as MS Office Excel, PowerPoint, and Word
  • Additional platforms (e.g., Android, iOS, and Windows devices)
  • Professional workflow capabilities, like annotations, comparison, template generation, editing, signing, redaction, and so on
  • Ability to control the UX, including the ability of users to download their PDFs

A Note on Bad PDFs and User Experience

Unmodified open source also leaves you exposed to bad PDFs, which introduce problems that interrupt user productivity.

"[Parsing & extracting] is relatively straightforward until you get to bad PDFs. There are a lot of bad PDFs out there that don't follow the specification. A lot of our code is going back to handle these strange cases."

Former Mozilla and PDF.js Developer Brendan Dahl

"PDFs are an incredibly complex file format; this is especially so given that a PDF can be generated a hundred different ways, all of which a renderer needs to handle gracefully."

LinkedIn Learning developer working with PDF.js

PDFs may be malformed, corrupted, or memory intensive — especially on mobile devices and in a web browser. A few problems include:

  • Incorrect fonts and rotations
  • Imprecise vector lines
  • Color errors in branded materials
  • Text selection and highlight accuracy errors
  • Performance problems when scrolling, panning, or zooming on a page
  • Crashes or freezes due to memory-intensive files

Customers in high-pressure industries need a professional, commercial rendering engine that will sidestep these issues, and they need engineers to make fixes quickly when issues arise. In contrast, open-source communities do not treat bugs with the same urgency, and you can wait months for fixes and years for requested features.
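To make the "malformed or corrupted" case above a little more concrete, here is a minimal sketch of a pre-flight check an application could run before handing a file to any rendering engine. It only looks for three structural markers every well-formed PDF carries: the `%PDF-x.y` header, the `startxref` keyword, and the `%%EOF` trailer. The function name `triage_pdf` and the 1024-byte search windows are assumptions made for illustration (the windows mirror the leniency many readers are known to apply); this is not how PDF.js, PDFium, or any particular SDK handles repair.

```python
import re

# Matches a version header such as "%PDF-1.7" or "%PDF-2.0".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")

# Assumption: search windows at the start and end of the file, in bytes.
WINDOW = 1024


def triage_pdf(data: bytes) -> list[str]:
    """Return structural warnings for a PDF byte string.

    An empty list only means the basic markers were found; it does not
    prove the file will render correctly.
    """
    warnings = []

    # The header should sit at byte 0; tolerant readers accept it a bit later.
    match = HEADER_RE.search(data[:WINDOW])
    if match is None:
        warnings.append("missing %PDF-x.y header")
    elif match.start() != 0:
        warnings.append("junk bytes before header")

    # A truncated download typically loses the trailer at the end of the file.
    tail = data[-WINDOW:]
    if b"startxref" not in tail:
        warnings.append("missing startxref")
    if b"%%EOF" not in tail:
        warnings.append("missing %%EOF marker")

    return warnings
```

A check like this can only flag the most obvious damage, such as truncated uploads or files with leading junk. The harder problems listed above (fonts, color, vector precision) only show up once an engine actually parses and renders the content.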

2. Modified Open Source: Should You Commit or Fork?

To mature their solutions, organizations can choose to continue to work with open source. You can modify open source, adding features, fixing bugs, and tuning performance.

There are two broad pathways to modify the engine source code.

  1. Commit to the original open-source project and wait for your changes to be approved.
  2. Fork the project and take ownership.

Each path has its perks — and tradeoffs. For commercial software developers, both present a Catch-22.

On the one hand, contributing to the community ensures you continue to benefit from community feedback and testing on what you add to the engine. But as a business, you’d be giving away a competitive advantage if you invest in PDF specialization.

The other option is to fork the library and add your additions privately, or as part of a new community. Both PDF.js and PDFium, distributed under permissive licenses, allow this.

The Challenge with Forking

After forking, you need to change the name of your library. And forking fragments the community.

"There is a strong social pressure against forking projects. It does not happen except under plea of dire necessity, with much public self-justification, and requires re-naming."

Eric Raymond, "Homesteading the Noosphere," in "The Cathedral & the Bazaar"

Forking weakens the value proposition of open source:

  • Improvements you make no longer benefit from the security and testing provided by the browser vendors and community — you get fewer eyes on your code for feedback and testing.
  • It adds technical debt if you continue to draw from the original repo, since merging from upstream is more complicated.
  • Your changes might break when the community updates.
  • Changes might introduce rendering regressions or other bugs that will weaken the solution’s stability.

3. A Proprietary PDF SDK Engine

Which brings us to our third and final category — a proprietary PDF SDK engine. This is a huge investment when built from the ground up, as it takes years of development and continuous spending on testing and audits to ensure security and UX. Be prepared for the sticker price of such an engine to reflect the significant development investment.

A Proprietary PDF SDK Engine Means No Shortcuts

A developer team can build up document format expertise, which they can then pass on to clients in the form of:

  1. Rendering performance on the most demanding documents, including highly technical, vector-based documents and on all platforms, including mobile and web. 
  2. Accuracy when dealing with complicated color models, including CMYK in RGB-based browsers. 
  3. Configurability and control – anything from specifying color transforms, specific font substitution behavior, caching, and more.
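To illustrate why CMYK accuracy is listed as a differentiator, here is the textbook device-independent CMYK-to-RGB conversion, written as a small sketch. The function name `naive_cmyk_to_rgb` is hypothetical, and this is deliberately the simple formula, not what any particular engine ships: a color-managed pipeline would instead map through ICC profiles, and the gap between the two approaches is one source of the color shifts seen in branded materials.

```python
def naive_cmyk_to_rgb(c: float, m: float, y: float, k: float) -> tuple[int, int, int]:
    """Convert CMYK components in [0, 1] to 8-bit RGB with the naive formula.

    Assumption: no ICC profile or rendering intent is applied, so the
    result ignores how real inks and real displays behave.
    """
    for name, value in (("c", c), ("m", m), ("y", y), ("k", k)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Each RGB channel is reduced by its complementary ink and by black.
    r = round(255 * (1 - c) * (1 - k))
    g = round(255 * (1 - m) * (1 - k))
    b = round(255 * (1 - y) * (1 - k))
    return (r, g, b)
```

With this formula, 100% cyan ink (`naive_cmyk_to_rgb(1, 0, 0, 0)`) becomes the fully saturated screen color `(0, 255, 255)`, which is noticeably more vivid than printed cyan. That mismatch is the kind of error a color-managed conversion is meant to avoid.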

Meeting Your Most Advanced Document Processing Needs

In addition, a proprietary PDF SDK can support your most advanced document processing needs, scaling as your needs grow. For example, with the Apryse SDK, we’re able to provide customers:

  • The ability to dynamically load a variety of documents, including PDF, MS Office files, and images in a web or mobile app viewer, no conversion servers required
  • Powerful client-side rendering and viewing for your most demanding PDFs, including huge vector drawings sourced from desktop CAD programs
  • Support for all mobile platforms (Android, iOS, Windows), JS frameworks, and cross-platform frameworks (React Native, Flutter, and Xamarin) to streamline development
  • Advanced document processing right in a web or mobile app client, for extra security (true PDF redaction, true PDF editing, and much, much more)
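For a sense of why huge vector drawings are demanding on the client side, a back-of-envelope estimate helps. The sketch below computes the memory needed to rasterize a whole page into one uncompressed bitmap. It relies on the fact that PDF page dimensions are expressed in points (1/72 inch); the function name `raster_bytes` and the default of 4 bytes per pixel (RGBA) are assumptions for illustration, not a description of how any specific engine allocates memory.

```python
import math

POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def raster_bytes(width_pt: float, height_pt: float, dpi: int,
                 bytes_per_pixel: int = 4) -> int:
    """Estimate bytes needed to hold one full-page bitmap at the given DPI."""
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px * height_px * bytes_per_pixel


# US Letter (8.5 x 11 in) at a typical screen resolution of 96 DPI.
letter = raster_bytes(612, 792, dpi=96)

# A 36 x 48 inch architectural sheet at 300 DPI.
drawing = raster_bytes(36 * 72, 48 * 72, dpi=300)

print(f"Letter page:   {letter / 2**20:.1f} MiB")
print(f"Large drawing: {drawing / 2**20:.1f} MiB")
```

Under these assumptions the letter page needs about 3.3 MiB, while the large drawing needs roughly 593 MiB for a single bitmap. This is one reason renderers commonly draw only the visible region, or work in tiles, rather than rasterizing a whole sheet at once, particularly on mobile devices and in browsers.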

The Bottom Line - Which is Best for Your Project?

Throughout this guide, you've probably been asking: which is better for me, a proprietary or open-source engine? It’s a tough question, as no one size fits all. Whether you go with a proprietary PDF SDK engine or open-source library depends on the magnitude and longevity of your project, as well as your specific requirements.

At Apryse, we deal with the full spectrum of document management professionals. Our products include iText, an open-source PDF processing library that we distribute under a dual-license arrangement (AGPLv3 or a commercial license), and a proprietary PDF SDK. We see the benefits of open source in our own iText product. We also understand when customers need to reap all the benefits of a proprietary platform: code base stability and responsive support, developer experience, and cutting-edge features.

We (and thousands of customers) are huge fans and advocates of our own cross-platform PDF SDK, built from the ground up and refined over the last 20+ years. I could go on and rattle off a brag sheet of its feature specs and logos. However, results speak for themselves, and our customers also speak for us.

Visit our WebViewer showcase to see what a world-class PDF SDK can do for your project. 

And when you’re done, please drop us a line and we'd be happy to discuss your project and document technology needs.
