Pre-Purchase Insights: Everything you need to know before you buy.
By Adam Pez | 2022 Oct 31
When building a viewer application, organizations seek a PDF rendering library to parse, render, and display PDFs in an application. Ideally, you select a component to give you a stable platform on which to build a successful, long-term solution, whether for a commercial product, a quick PoC, or an internal enterprise application.
But of the several developer strategies and components available, which one is best for your project?
This is your guide to selecting a PDF rendering and display engine. We’ll lay out your options, then answer the most frequently asked questions that arise when choosing a backbone for your PDF viewer project.
A PDF rendering engine or “core” is the foundational piece in your viewer application architecture; it makes PDF files accessible for viewing and manipulation in an interactive workflow. Choosing the right one for your needs is therefore key, because the core serves as the cornerstone of everything an application does.
This guide discusses a few different PDF engines: some offered by commercial vendors, others available as free downloads. We’ll use Wikipedia to establish some basic terminology.
Open source is source code made available to everybody to modify, enhance, and distribute.
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized software development model that encourages open collaboration.
Proprietary software, on the other hand, is non-free source code that only an organization, person, or team can create, edit, inspect, or change.
Proprietary software, also known as non-free software or closed-source software, is computer software for which the software's publisher or another person reserves some licensing rights to use, modify, share modifications, or share the software, restricting user freedom with the software they lease. It is the opposite of open-source or free software. Non-free software sometimes includes patent rights.
With definitions out of the way, let’s sort the available libraries into three categories to consider: unmodified open source, modified open source, and proprietary PDF SDKs.
When it comes to your PDF technology needs, open source can serve as a practical and proven starting point.
A popular option is PDFium, Google’s open-source C++ PDF engine, which is based on Foxit’s PDF technology. PDFium is used today in Chrome and Microsoft Edge; SaaS vendors such as Dropbox use it as a server-side renderer for PDF previews. Another popular option is PDF.js, Mozilla’s JavaScript PDF renderer.
The benefits of these two engines are not hard to see: both PDFium and PDF.js are distributed under permissive open-source licenses (PDF.js under Apache 2.0), so they make PDF rendering freely accessible to developers.
Using these libraries directly from the repo, organizations also gain the benefits of open source:
Unmodified open source is a great entry-level option. But when your users start requesting more features or improved rendering performance, you’ll need to consider other options that offer more than just the ability to read PDFs.
Consider other options when you need:
Unmodified open source also leaves you exposed to bad PDFs: files that the engine cannot handle cleanly and that interrupt user productivity when they fail to open or render correctly.
[Parsing & extracting] is relatively straightforward until you get to bad PDFs. There are a lot of bad PDFs out there that don't follow the specification. A lot of our code is going back to handle these strange cases.
Former Mozilla and PDF.js Developer Brendan Dahl
PDFs are an incredibly complex file format; this is especially so given that a PDF can be generated a hundred different ways, all of which a renderer needs to handle gracefully.
LinkedIn Learning developer working with PDF.js
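To make "bad PDFs" a little more concrete, here is a minimal Python sketch of a pre-flight check that an application might run before handing a file to its rendering engine. The helper name `quick_pdf_checks` and the 1024-byte search window are assumptions made for illustration; they are not part of PDFium, PDF.js, or any vendor SDK. The check only looks for three structural markers that a well-formed PDF normally has (the `%PDF-` header, the `startxref` pointer, and the `%%EOF` marker), so a file that passes can still be broken in many other ways.

```python
from typing import List


def quick_pdf_checks(data: bytes) -> List[str]:
    """Return structural warnings for a PDF byte string.

    Hypothetical helper for illustration only: it inspects a few
    well-known markers and does not parse or validate the document.
    """
    problems: List[str] = []
    head = data[:1024]   # assumed search window for the header
    tail = data[-1024:]  # assumed search window for the trailer

    # A PDF normally begins with a "%PDF-x.y" header line.
    if b"%PDF-" not in head:
        problems.append("missing %PDF- header")
    elif not data.startswith(b"%PDF-"):
        problems.append("junk bytes before %PDF- header")

    # The end of the file normally holds a startxref pointer and %%EOF.
    if b"startxref" not in tail:
        problems.append("missing startxref pointer")
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker")

    return problems


if __name__ == "__main__":
    truncated = b"%PDF-1.7\n1 0 obj"  # e.g. an interrupted download
    print(quick_pdf_checks(truncated))
```

A check like this can flag truncated uploads early, but it is no substitute for an engine that recovers gracefully from the stranger cases the developers quoted above describe.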
PDFs may be malformed, corrupted, or memory intensive — especially on mobile devices and in a web browser. A few problems include:
Customers in high-pressure industries need a professional, commercial rendering engine that will sidestep these issues, and they need engineers to make fixes quickly when issues arise. In contrast, open-source communities are under no obligation to treat your bugs with the same urgency, and you may wait months for fixes and years for requested features.
To mature their solutions, organizations can choose to continue to work with open source. You can modify open source, adding features, fixing bugs, and tuning performance.
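There are two broad pathways to modify the engine source code: contribute your changes back to the community project, or fork the library and maintain your changes separately.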
Each path has its perks — and tradeoffs. For commercial software developers, both present a Catch-22.
On the one hand, contributing to the community ensures you continue to benefit from community feedback and testing on what you add to the engine. But as a business, you’d be giving away your competitive advantage if you invest in PDF specialization and then share the results.
The other option is to fork the library and add your additions privately, or as part of a new community. Both PDF.js and PDFium, distributed under permissive licenses, allow this.
After forking, however, you are expected to rename your library, and the fork fragments the community.
There is a strong social pressure against forking projects. It does not happen except under plea of dire necessity, with much public self-justification, and requires re-naming.
Eric Raymond, “Homesteading the Noosphere,” in “The Cathedral & the Bazaar”
Forking weakens the value proposition of open source:
Which brings us to our third and final category — a proprietary PDF SDK engine. This is a huge investment when built from the ground up, as it takes years of development and continuous spending on testing and audits to ensure security and UX. Be prepared for the sticker price of such an engine to reflect the significant development investment.
A developer team can build up document format expertise, which they can then pass on to clients in the form of:
In addition, a proprietary PDF SDK can support your most advanced document processing needs, scaling as your needs grow. For example, with the Apryse SDK, we’re able to provide customers:
Throughout this guide, you've probably been asking: which is better for me, a proprietary or an open-source engine? It’s a tough question, as no one size fits all. Whether you go with a proprietary PDF SDK engine or an open-source library depends on the magnitude and longevity of your project, as well as your specific requirements.
At Apryse, we deal with the full spectrum of document management professionals. Our products include iText, an open-source PDF processing library that we distribute under a dual-license arrangement (AGPLv3 or a commercial license), and a proprietary PDF SDK. We see the benefits of open source in our own iText product. We also understand when customers need to reap all the benefits of a proprietary platform: code base stability and responsive support, developer experience, and cutting-edge features.
We (and thousands of customers) are huge fans and advocates of our own cross-platform PDF SDK, built from the ground up and refined over the last 20+ years. I could go on and rattle off a brag sheet of its feature specs and logos. However, results speak for themselves, and our customers also speak for us.
Visit our WebViewer showcase to see what a world-class PDF SDK can do for your project.
And when you’re done, please drop us a line and we'd be happy to discuss your project and document technology needs.