Streaming a PDF From the Web to a Mobile or Desktop App

By James Borthwick | 2019 Feb 20

Users want to work quickly wherever they are, including when working remotely. Yet viewing remote PDFs is not always pleasant. When a PDF is opened from a server in a browser or app, it often has to download completely before it can be displayed. That causes frustrating delays, not to mention higher data plan costs. For customers who constantly deal with large documents such as construction drawing sets, either issue can be a deal breaker.

Wouldn’t it be nice instead to view remote documents the way you watch online video, with partial content streamed in and displayed within seconds regardless of file size? PDF streaming is in fact possible on virtually any device using linearized PDFs and Apryse SDK’s Open URL method. On iOS, Android, Xamarin, and UWP, there is even an option to download only the pages displayed in the viewer.

PDF Streaming – Beyond 10 MB, a Clear Winner

Ideally, with document streaming, a remote document should open within a few seconds at most. When a user jumps to the middle of it, the pages in view should be fetched first and appear quickly. Linearization restructures PDF content so it can be streamed into a PDF viewer in bite-sized chunks. The difference this makes when opening documents larger than 10 MB is striking.

To quantify this, we filmed 36 PDFs loading on an Android device over a 4G mobile network. These made up 18 side-by-side comparisons of open times for non-linearized content versus streamed linearized content. The larger the file, the bigger the difference streaming makes. Linearized documents opened in about 6.5 seconds on average across all file sizes. Non-linearized files, by contrast, took longer to open as file size grew, with the largest taking minutes.

Figure: PDF open times, linearized vs. non-linearized, on an Android device over a 4G network.

Restructuring PDFs for Document Streaming

To understand why linearization makes such a large difference for files over 10 MB, you first need to know a little about PDF document structure and the linearization process. Unfortunately, many PDF documents generated today are not linearized. In such files, the resources needed for page one, for example, may be scattered throughout the file. In addition, the cross-reference table that tells a reader where each object lives is normally stored at the end. To open and display such a file, a viewer therefore typically has to download the entire document first, adding seconds if not minutes of delay.

Linearizing a PDF corrects these issues by reordering the document’s content and adding lookup information at the start of the file that lets programs quickly understand what’s inside. As the name suggests, linearization arranges pages in order from first to last and groups each page’s data together in one place. A “linearization dictionary” and “hint tables” are also added near the beginning of the document. The linearization dictionary records details such as the file length, the number of pages, and where the first page’s data ends. The hint tables give the location and size of each page’s objects and of objects shared between pages. A PDF viewer designed to handle partial content then needs to parse only this initial information to identify what it must request first to display the first page. The surrounding pages fill in as they load, in order of priority, for a more responsive and seamless user experience.
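One consequence of this layout is easy to check programmatically. The PDF specification requires the linearization dictionary to lie within the first 1,024 bytes of a linearized file, so scanning that window for the /Linearized key makes a quick heuristic test. The Java sketch below assumes Java 11 or later. The class name LinearizationCheck and the default file name out.pdf are illustrative choices, not part of Apryse SDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LinearizationCheck {
    // The PDF specification requires the linearization dictionary to lie
    // entirely within the first 1024 bytes of a linearized file.
    static final int HEADER_WINDOW = 1024;

    /** Heuristic: true if the "/Linearized" key occurs in the first 1024 bytes. */
    static boolean looksLinearized(byte[] head) {
        int n = Math.min(head.length, HEADER_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so binary content is safe to scan.
        String window = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        return window.contains("/Linearized");
    }

    /** Reads at most the first 1024 bytes of the file and applies the heuristic. */
    static boolean looksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(HEADER_WINDOW));
        }
    }

    public static void main(String[] args) throws IOException {
        // "out.pdf" is only a default; pass another path as the first argument.
        Path pdf = Path.of(args.length > 0 ? args[0] : "out.pdf");
        System.out.println(pdf + (looksLinearized(pdf)
                ? " appears to be linearized"
                : " does not appear to be linearized"));
    }
}
```

A file that was linearized and later modified with an incremental update still contains the key even though its hint data may no longer match. Treat a positive result as a hint rather than proof. A stricter check would also compare the dictionary’s /L entry with the actual file length.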

How to Stream PDFs via Apryse SDK

When streaming PDF files into your client, whether on desktop, mobile or a browser, you need the following:

  1. A linearized PDF document
  2. A web server that supports byte-range requests (otherwise known as byte serving)
  3. A PDF SDK that supports the Open URL method

1. Linearizing your PDFs with Apryse

Apryse SDK can linearize documents when they are saved. Once you have a PDFDoc object (either by opening an existing PDF or by creating a new one), save it with the e_linearized flag. For example, in C#:

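using pdftron.PDF;  // PDFDoc
using pdftron.SDF;  // SDFDoc.SaveOptions

using (PDFDoc doc = new PDFDoc("in.pdf")) {  // assumes PDFNet has already been initialized
    doc.Save("out.pdf", SDFDoc.SaveOptions.e_linearized);  // write a linearized copy
}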

2. Byte-Range Requests Explained

The second requirement is a web server that supports byte-range requests. A byte-range request asks the server to send only a specific set of bytes from a file. The range need not start at byte zero or cover the entire file. For example, suppose an HTTP GET request includes the following header:

Range: bytes=1495454-1594723

A server that supports byte ranges will then send only bytes 1495454 through 1594723 of the requested file. Both ends of the range are inclusive, so that is 99,270 bytes, or about 97 KB.
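Because both ends of a Range value are inclusive, its length is last - first + 1. The small Java helper below parses the single-range form used above and reproduces that arithmetic. It is a hypothetical ByteRange class for Java 11 or later. It deliberately rejects open-ended forms such as bytes=16- and does not handle multiple ranges.

```java
final class ByteRange {
    final long first; // offset of the first byte requested
    final long last;  // offset of the last byte requested (inclusive)

    ByteRange(long first, long last) {
        if (first < 0 || last < first)
            throw new IllegalArgumentException("invalid range: " + first + "-" + last);
        this.first = first;
        this.last = last;
    }

    /** Parses "bytes=FIRST-LAST"; open-ended, suffix and multi-range forms are rejected. */
    static ByteRange parse(String headerValue) {
        String spec = headerValue.trim();
        if (!spec.startsWith("bytes="))
            throw new IllegalArgumentException("not a byte range: " + headerValue);
        String[] parts = spec.substring("bytes=".length()).split("-", 2);
        if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank())
            throw new IllegalArgumentException("expected bytes=FIRST-LAST: " + headerValue);
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on junk.
        return new ByteRange(Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim()));
    }

    /** Both ends are inclusive, so the byte count is last - first + 1. */
    long length() {
        return last - first + 1;
    }

    String toHeaderValue() {
        return "bytes=" + first + "-" + last;
    }

    public static void main(String[] args) {
        ByteRange r = ByteRange.parse("bytes=1495454-1594723");
        // Prints "99270 bytes (~97 KB)", taking 1 KB as 1024 bytes.
        System.out.println(r.length() + " bytes (~" + Math.round(r.length() / 1024.0) + " KB)");
    }
}
```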

The good news is that your web server probably already supports this feature. To test whether it does, use cURL on your favourite *nix system (or the native Windows build) as follows:

curl -I -H "Range: bytes=16-" http://pdftron.com/index.html

If the server responds with a 206 status (for example, “HTTP/1.1 206 Partial Content”), it supports byte ranges. If it responds with “HTTP/1.1 200 OK”, it ignored the Range header and does not support byte ranges.
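The same check can be scripted. The Java 11+ sketch below sends a HEAD request carrying a Range header, as the cURL command does, and treats a 206 status as evidence of byte serving. It forces HTTP/1.1 to match the cURL output described above. The class name ByteRangeProbe and the placeholder URL http://example.com/sample.pdf are illustrative. Pointing the probe at an actual PDF on your own server gives the most meaningful answer.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ByteRangeProbe {
    /** 206 Partial Content means the Range header was honoured; 200 OK means it was ignored. */
    static boolean indicatesByteServing(int statusCode) {
        return statusCode == 206;
    }

    /** Sends a HEAD request asking for the first byte only and inspects the status code. */
    static boolean supportsByteRanges(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Range", "bytes=0-0")
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        // HEAD responses carry no body, so discard it and keep only the status.
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return indicatesByteServing(response.statusCode());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder URL; pass the address of a PDF on your own server instead.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://example.com/sample.pdf");
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        System.out.println(uri + (supportsByteRanges(client, uri)
                ? " supports byte-range requests"
                : " does not appear to support byte-range requests"));
    }
}
```

Some servers treat HEAD and GET differently. If the result looks wrong, repeating the test with a small GET range such as bytes=0-0 is a reasonable cross-check.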

A couple of small notes:

  1. If you’re storing documents on SharePoint, caching needs to be enabled to allow byte-range requests.
  2. If your documents are stored in a database and delivered through dynamically generated web pages, that delivery path may not support byte serving. One workaround is to temporarily save the file to a static URL that the web server can byte-serve.

3. Opening a Remote Linearized PDF Document

Now that you know how to linearize a document and how to check that your web server supports byte serving, the third and final step is to open the document.

This is done with PDFViewCtrl’s OpenURLAsync API, which is available for Windows (C++, .NET), Android, iOS, and WinRT/Windows Phone.

Instead of calling SetDoc:

PDFViewCtrl.SetDoc(PDFDoc doc);

call OpenURLAsync with the document’s URL:

PDFViewCtrl.OpenURLAsync(string url);

What Happens After the Call to OpenURLAsync

Once OpenURLAsync is called, there is a short pause while the control contacts the server and downloads the preliminary data describing the document, such as the total number of pages and where each page’s resources are stored.

Once this information is obtained (typically within 0.5 to 3 seconds), blank placeholder pages are laid out for the entire document, and the content for the current page is downloaded and rendered. If the user does not scroll, the control continues downloading content for the surrounding pages. If the user scrolls to a page that has not been downloaded yet, the control fetches that page’s content first, ahead of the other pages still waiting to download. This keeps viewing responsive and, depending on network speed, close to the experience of viewing a local PDF.
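The prioritization described above can be modeled in a few lines. The Java sketch below is a conceptual illustration, not PDFViewCtrl’s actual scheduler. It assumes downloads are tracked per page and orders pages by distance from the one in view, breaking ties in favor of the earlier page. Calling nextToFetch again after the user jumps to a new page reproduces the behavior of fetching that page ahead of the rest. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PagePriority {
    /**
     * Orders 1-based page numbers for fetching: the page in view first, then its
     * neighbours by increasing distance, taking the earlier page first on ties.
     */
    static List<Integer> fetchOrder(int currentPage, int pageCount) {
        if (pageCount < 1 || currentPage < 1 || currentPage > pageCount)
            throw new IllegalArgumentException("page out of range");
        List<Integer> order = new ArrayList<>(pageCount);
        order.add(currentPage);
        for (int d = 1; order.size() < pageCount; d++) {
            if (currentPage - d >= 1) order.add(currentPage - d);
            if (currentPage + d <= pageCount) order.add(currentPage + d);
        }
        return order;
    }

    /** Pages still to download, re-prioritized around the page the user is now viewing. */
    static List<Integer> nextToFetch(int currentPage, int pageCount, Set<Integer> downloaded) {
        List<Integer> pending = new ArrayList<>();
        for (int page : fetchOrder(currentPage, pageCount)) {
            if (!downloaded.contains(page)) pending.add(page);
        }
        return pending;
    }

    public static void main(String[] args) {
        System.out.println(fetchOrder(5, 8));                  // [5, 4, 6, 3, 7, 2, 8, 1]
        System.out.println(nextToFetch(2, 5, Set.of(1, 2)));   // [3, 4, 5]
    }
}
```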

Restricting Data Usage in Android, iOS, Xamarin, and UWP

By default, a streamed PDF keeps downloading until the entire file has loaded. On Android, iOS, Xamarin, and UWP, however, Apryse SDK can be configured to download only the pages the user actually views, keeping data usage to a minimum.

To restrict your viewer to downloading only those parts of a document that are currently on screen, use the following:

Android

PDFViewCtrl.HTTPRequestOptions httpRequestOptions = new PDFViewCtrl.HTTPRequestOptions();
httpRequestOptions.restrictDownloadUsage(true);
mPdfViewCtrl.openUrlAsync("http://example.com/sample.pdf", "cache_path", "password", httpRequestOptions);

iOS

let options = PTHTTPRequestOptions()
options?.restrictDownloadUsage(true)
pdfViewCtrl.openUrlAsync("http://example.com/sample.pdf", withPDFPassword: nil, withCacheFile: nil, with: options)

Xamarin

var httpRequestOptions = new PDFViewCtrl.HTTPRequestOptions();
httpRequestOptions.RestrictDownloadUsage(true);
mPdfViewCtrl.OpenUrlAsync("http://example.com/sample.pdf", "cache_path", "password", httpRequestOptions);

UWP

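HTTPRequestOptions httpRequestOptions = new HTTPRequestOptions();
httpRequestOptions.RestrictDownloadUsage(true);
// The options take effect only when passed to the viewer: use the OpenURLAsync
// overload that accepts HTTPRequestOptions (see the UWP API reference for its parameters).
await _PDFViewCtrl.OpenURLAsync(url);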

Conclusion

As more users expect seamless and responsive access to cloud data, it is important that remotely stored information can be accessed efficiently. Combining linearized PDF documents with Apryse SDK’s OpenURL method delivers a fast, responsive remote PDF viewing experience while reducing network costs.

If you have any questions about Apryse's PDF SDK, please feel free to get in touch!