By James Borthwick | 2013 Aug 08
Tags
tutorial
javascript
view
HTML5 apps offer many advantages over native ones. Web apps are:
But web apps suffer one big problem, and that’s the user experience.
Today, in 2013, even the best-crafted mobile web apps come nowhere near the quality of experience of the best native apps. In fact, with but a few exceptions, the best mobile web apps today still don’t approach the quality of the first batch of native iPhone apps from 2007.
— John Gruber, Daring Fireball
One area of the user experience where HTML5 apps have historically been weak is their ability to display a PDF within the app. For a long time, “viewing” a PDF on the web meant downloading it and opening it in a different program. Next came browser PDF plugins, which would take over the browser screen in order to display the PDF. This was a small improvement, but still not integrated and certainly not a good user experience.
So, if the goal is to add an embedded PDF viewer using HTML5 into a web app, how can that be done? There are a number of approaches, each with pros and cons. Keep reading to see what techniques exist, and which might be best for your app.
This is the simplest way to get a “PDF” onto the web: take the PDF, convert it to images with a command-line tool, and serve the images. The result is compatible with all browsers on all operating systems. However, there are some issues:
While converting to images may be a good solution for some applications, it is unlikely to be an optimal one. So what can we do?
The idea here is to use the browser’s native text rendering and layer it on top of an image that contains all of the non-text data. (This technique is implemented by Apryse in pdftron.PDF.Convert.ToHtml().) While it sounds like an incremental change from full rasterization, there are some significant advantages:
So while this is a step up from full rasterization, problems remain:
The W3C recognized the need to bring high-quality vector graphics to the web, and proposed SVG (Scalable Vector Graphics). At first, this technology seemed very promising: it would deliver the vector data and precise positioning we want, with fonts, gradients, masks and more. Some predicted it would be a “PDF killer.” Apryse took action and developed the first PDF to SVG converter in 2001. However, widespread adoption of SVG and the supplanting of PDF never came to pass. Why not? Here are a few reasons:
SVG had some built-in technical limitations, but its biggest problem was (and still is) a lack of complete and correct implementations within browsers. Ultimately it has found success in certain niches, but it has not experienced widespread adoption for general use cases.
So where does that leave us? Not surprisingly, we are going to take a close look at “HTML5,” specifically the canvas. Does this technology finally deliver the ability to view a PDF inline? Will it succeed where others have come up short?
The HTML5 Canvas gives us 2D drawing capabilities similar to a system-level library like GDI and Direct2D on Windows, and Quartz on OS X and iOS. This means that shapes, curves, text, and opacities can be represented mathematically, and rendered by the canvas at any resolution. So the big question is whether we can “translate” the mathematical representation of content in a PDF into a series of JavaScript commands that draw it onto the HTML5 Canvas. Let’s take a look.
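To make the idea of “translation” concrete, here is a minimal sketch, not how any particular viewer does it. It maps a handful of PDF path operators (`m`, `l`, `c`, `h`, `re`, `S`, `f`) onto standard Canvas 2D context methods. The function names `drawPdfPath` and `createRecordingContext`, and the `{ op, args }` input format, are assumptions made up for illustration. A real content stream contains many more operators (text, images, graphics state, color) that are not handled here.

```javascript
// Hypothetical mini-translator from a few PDF path operators to Canvas 2D calls.
// `ops` is an assumed, pre-parsed list of { op, args } objects.
function drawPdfPath(ctx, ops) {
  ctx.beginPath();
  for (const { op, args = [] } of ops) {
    switch (op) {
      case "m": ctx.moveTo(args[0], args[1]); break;                  // begin a new subpath
      case "l": ctx.lineTo(args[0], args[1]); break;                  // straight line segment
      case "c": ctx.bezierCurveTo(...args); break;                    // cubic Bezier, six numbers
      case "h": ctx.closePath(); break;                               // close the current subpath
      case "re": ctx.rect(args[0], args[1], args[2], args[3]); break; // rectangle x y w h
      case "S": ctx.stroke(); ctx.beginPath(); break;                 // stroke, then start a fresh path
      case "f": ctx.fill(); ctx.beginPath(); break;                   // fill with the nonzero winding rule
      default: throw new Error(`Unsupported operator: ${op}`);
    }
  }
}

// Stand-in for a real canvas context that just records the calls made on it,
// so the sketch can run outside a browser.
function createRecordingContext() {
  const calls = [];
  const ctx = { calls };
  const names = ["beginPath", "moveTo", "lineTo", "bezierCurveTo",
                 "closePath", "rect", "stroke", "fill"];
  for (const name of names) {
    ctx[name] = (...args) => { calls.push([name, ...args]); };
  }
  return ctx;
}

// A filled triangle, as it might appear in a content stream:
//   10 10 m  110 10 l  60 90 l  h  f
const triangle = [
  { op: "m", args: [10, 10] },
  { op: "l", args: [110, 10] },
  { op: "l", args: [60, 90] },
  { op: "h" },
  { op: "f" },
];
const recordingCtx = createRecordingContext();
drawPdfPath(recordingCtx, triangle);
console.log(recordingCtx.calls.map((call) => call[0]).join(" "));
```

In a browser, the same function could be handed the object returned by `canvas.getContext("2d")`. Note that PDF's default coordinate system has its origin at the bottom-left with y pointing up, while the canvas has y pointing down, so a real viewer would first apply a flipping transform such as `ctx.setTransform(1, 0, 0, -1, 0, pageHeightInPixels)`.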
The “holy grail” would be to use JavaScript to directly read a PDF and draw it onto an HTML5 canvas. This would offer a number of benefits:
Building such a system would seem a significant task, but it has in fact been attempted by the Mozilla Foundation in pdf.js. Pdf.js is an impressive technical achievement, but close examination leads one to conclude that it unfortunately suffers from many usability and quality issues (read our complete guide to evaluating PDF.js). This is not a reflection of pdf.js per se, but rather a technical limitation that would be inherent in any product that attempted to use JavaScript/HTML5 to render a PDF. Some of the problems we encountered:
From the get-go, pdf.js faced issues on the rendering side. For example, standard HTML5 Canvas does not support paths with dashes, the even-odd fill rule, or PDF blend modes. Since Mozilla developers were in control of their own browser, they were able to bandage Firefox with custom extensions (prefixed with moz-…). Unfortunately, these extensions are not part of the HTML5 standard and are not supported by all browsers, including the dominant mobile browsers. Even with all of the custom moz extensions, pdf.js can’t deal with some transparency groups, overprint, some soft masks, non-RGB color spaces, etc. Perhaps one day all browsers will add every extension required to accurately render a PDF. However, the project clearly showed some limitations of implementing a complex graphics system in JS (read our updated guide on PDF.js rendering accuracy).
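For readers unfamiliar with the fill rules mentioned above, the following sketch shows how the even-odd and nonzero winding rules can disagree on the same path. It is a plain point-in-polygon test written for illustration only, not code from pdf.js or any browser. The function name `isInside` and the array-of-points input format are assumptions.

```javascript
// Decide whether point (px, py) is inside a path made of closed polygonal
// subpaths, under either the "nonzero" or the "evenodd" fill rule.
// Each subpath is an array of [x, y] points; the last point connects to the first.
function isInside(subpaths, px, py, rule = "nonzero") {
  let winding = 0;   // signed count of edges crossing a ray cast toward +x
  let crossings = 0; // unsigned count of the same edges
  for (const points of subpaths) {
    for (let i = 0; i < points.length; i++) {
      const [x1, y1] = points[i];
      const [x2, y2] = points[(i + 1) % points.length];
      // Positive when the point lies to the left of the directed edge.
      const side = (x2 - x1) * (py - y1) - (px - x1) * (y2 - y1);
      if (y1 <= py && py < y2 && side > 0) {
        winding++;   // upward edge crossing the ray
        crossings++;
      } else if (y2 <= py && py < y1 && side < 0) {
        winding--;   // downward edge crossing the ray
        crossings++;
      }
    }
  }
  return rule === "evenodd" ? crossings % 2 === 1 : winding !== 0;
}

// Two nested squares drawn in the same direction.
const outerSquare = [[0, 0], [10, 0], [10, 10], [0, 10]];
const innerSquare = [[3, 3], [7, 3], [7, 7], [3, 7]];
const nested = [outerSquare, innerSquare];

console.log(isInside(nested, 5, 5, "nonzero")); // true: the inner square is filled
console.log(isInside(nested, 5, 5, "evenodd")); // false: the inner square is a hole
console.log(isInside(nested, 1, 5, "evenodd")); // true: between the two squares
```

A renderer that only offers the nonzero rule will therefore paint over holes that a PDF author specified with the even-odd rule. The current Canvas 2D API does accept a fill rule argument (`ctx.fill("evenodd")`) and dash patterns (`ctx.setLineDash([...])`).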
pdf.js Rendering (left) & Correct Rendering (right)
pdf.js Rendering (left) & Correct PDF rendering (right)
JavaScript is much slower than native code. Despite using GPU accelerated canvas rendering, viewing PDFs in pdf.js is slower than native viewers/plug-ins that do not use hardware acceleration. Native viewers will always be able to stay one step ahead of JavaScript viewers in terms of performance.
On mobile browsers, PDF viewers using HTML5 do not respond well when they run out of memory: they simply exit, i.e. crash. Because PDF documents can be large and use complex resources, it is not difficult to exceed the limit. (The same issues exist on the desktop, but thanks to large amounts of RAM and virtual memory, they are less critical.) For more information, see our recently published PDF.js reliability benchmark, where we opened 1,663 PDF files in PDF.js.
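A back-of-the-envelope calculation gives a feel for how quickly rendering can consume memory. The sketch below assumes an uncompressed bitmap at 4 bytes per pixel (RGBA). The function name and parameters are hypothetical, and actual browser memory use depends on the implementation and on everything else the viewer keeps in memory (fonts, images, parsed objects).

```javascript
// Rough size in bytes of an uncompressed RGBA bitmap for one rendered page.
// widthPt/heightPt are the page size in PDF points (1/72 inch); `scale` is
// the zoom factor; `devicePixelRatio` accounts for high-density screens.
function estimatePageBitmapBytes(widthPt, heightPt, scale, devicePixelRatio = 1) {
  const widthPx = Math.ceil(widthPt * scale * devicePixelRatio);
  const heightPx = Math.ceil(heightPt * scale * devicePixelRatio);
  return widthPx * heightPx * 4; // 4 bytes per pixel: R, G, B, A
}

// US Letter is 612 x 792 points.
const atOneToOne = estimatePageBitmapBytes(612, 792, 1);
const zoomedOnHiDpi = estimatePageBitmapBytes(612, 792, 2, 2);

console.log(atOneToOne);    // 1938816 bytes, roughly 1.8 MiB
console.log(zoomedOnHiDpi); // 31021056 bytes, roughly 29.6 MiB
```

Because the pixel count grows with the square of the zoom factor, a single zoomed page on a high-density display can already take tens of megabytes under these assumptions, before any document data is counted.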
Because pdf.js uses PDF documents ‘as is,’ it is likely that the documents have not been “linearized,” that is, saved in a format that is streamable over the web. This means that the entire document must be downloaded (and stored in memory) before it can be rendered, leaving the user waiting. Although this issue is not specific to a JavaScript viewer, it is a drawback to using PDF documents that have not been processed for online viewing.
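Whether a given file has been linearized can be guessed from its first bytes: the PDF specification places the linearization parameter dictionary, which carries a `/Linearized` key, within the first 1024 bytes of the file. The sketch below is a heuristic only. The function name `looksLinearized` is an assumption, and a thorough check would parse the dictionary and compare its recorded file length against the actual file size.

```javascript
// Heuristic: does the head of the file look like a linearized PDF?
// `headBytes` is a Uint8Array holding (at least) the start of the file.
function looksLinearized(headBytes) {
  // Decode the first 1024 bytes one byte per character; PDF syntax is ASCII.
  const head = Array.from(headBytes.subarray(0, 1024), (b) => String.fromCharCode(b)).join("");
  return head.includes("%PDF-") && /\/Linearized\s+[0-9]/.test(head);
}

// Tiny hand-made samples (not complete, valid PDF files).
const encoder = new TextEncoder();
const linearizedHead = encoder.encode(
  "%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
);
const plainHead = encoder.encode(
  "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
);

console.log(looksLinearized(linearizedHead)); // true
console.log(looksLinearized(plainHead));      // false
```

A viewer could fetch just the first kilobyte of a document and run a check like this before deciding whether to stream pages or wait for the full download.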
What can be done to resolve these shortcomings? The source of the problems is that PDF documents can simply be too big and complicated to be competently handled by a pure JavaScript/HTML5 Canvas solution. So, perhaps with some pre-processing, a PDF can be normalized to a format that can be properly handled by a pure JavaScript/HTML5 Canvas viewer. What needs to be done?
So, how well does this work? After 3+ years of implementing these optimizations for WebViewer, we are able to say that it works very well. Once the PDF has been optimized for web viewing, all of pdf.js’s shortcomings melt away, and viewing is:
These optimized documents have also served as a good basis for implementing PDF features, such as interactive forms and annotations.
Displaying a PDF in a PDF viewer using HTML5 is by no means trivial. What is clear is that for accurate and reliable viewing in a web browser, the PDF needs to be “normalized” to a web-friendly representation. Some normalization methods, such as converting to images, do work, but with limitations. Sophisticated normalization, such as what is done for WebViewer, offers an experience that approaches that of a native PDF viewer.
What a difference 18 months makes. Most of the article above holds. However, new technology and an innovative approach have allowed us to provide reliable and correct in-browser PDF rendering without the need to pre-process. (And no, not by using pdf.js; its problems remain.) Check out the newly released WebViewer 2.0, and our post on PDFNetJS.