By James Borthwick | 2019 Feb 20
5 min
Tags
linearization
view
tutorial
Users want to work quickly wherever they are, even when working remotely. Yet viewing PDFs remotely is not always pleasant. To stream PDFs to a browser, the PDFs must often first load fully before they can open, leading to frustrating delays -- not to mention higher data plan costs. And for customers dealing constantly with large documents like construction drawing sets, either of these issues could be deal breakers.
Wouldn’t it be nice instead to view remote documents the same way one enjoys online video? That is to say, with partial content streamed in and opened in mere seconds regardless of file size. Well, PDF streaming is in fact possible on virtually any device using linearized PDFs and Apryse SDK’s Open URL method -- including an option in iOS, Android, Xamarin, and UWP to restrict downloads to only those pages displayed in the viewer.
Ideally, with document streaming, remote documents should open within a few seconds at most. And when a user scrolls to the middle of the content, that should be prioritized and loaded very quickly. Linearization restructures PDF content so it can be streamed into a PDF viewer in bite-sized chunks. The difference this makes when opening documents above 10 megabytes is striking.
In fact, in this PDF stream example, we filmed 36 PDFs loading on an Android mobile device via a 4G mobile network to compare open times between non-linearized content and streamed linearized content. As you can see from viewing all 18 comparisons here, the larger the file size, the bigger the difference streaming makes. Linearized documents opened in about 6.5 seconds on average across all file sizes; whereas non-linearized files took longer to open the larger the file, with the largest files taking minutes to open.
A comparison of PDFs loaded on an Android device via a 4G network
To understand how linearization can make such a huge difference when opening files larger than 10 MB, you first need to know a little about PDF document structure and the linearization process. Unfortunately, many PDF documents generated today are not linearized; that means they are structured with resources for page one, for example, often scattered across the file. To open and display such a file, a viewer therefore needs to download the entire document, adding extra seconds if not minutes of delay.
Linearizing a PDF file corrects these issues by reordering document content and adding resources to the start of the file that let programs quickly understand what’s inside. As the term linearization suggests, linearization reorders pages by page number from beginning to end while grouping page data together in the same place. A “linearization dictionary” and “hint tables” are also added to the document’s beginning. These list the location and size of all internal objects. A PDF viewer designed to handle partial content then need only parse this initial information to identify the data it must request first to display the first page, with surrounding pages filling out as they load in order of priority, thus allowing for a more responsive and seamless user experience.
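To make the idea of a linearization dictionary concrete, here is a minimal sketch of how a program might guess whether a file is linearized before trying to stream it. It relies on one detail of the PDF specification: the linearization parameter dictionary sits within the first 1024 bytes of the file and carries a /Linearized key. The class and method names are hypothetical, the sample bytes are a simplified stand-in for a real PDF header, and this is only a heuristic. A file that was linearized and then incrementally updated can still contain the key without being usable for streaming.

```java
import java.nio.charset.StandardCharsets;

/** Heuristic sketch (hypothetical helper, not part of any SDK). */
final class LinearizedCheck {
    // The PDF specification requires the linearization parameter dictionary
    // to be contained within the first 1024 bytes of the file.
    static final int HEADER_WINDOW = 1024;

    /** Returns true if the start of the file contains a /Linearized key. */
    static boolean looksLinearized(byte[] fileStart) {
        int n = Math.min(fileStart.length, HEADER_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe to scan.
        String head = new String(fileStart, 0, n, StandardCharsets.ISO_8859_1);
        return head.contains("/Linearized");
    }

    public static void main(String[] args) {
        // Simplified sample header; a real dictionary carries more keys.
        byte[] sample = "%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 99270 /N 12 >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLinearized(sample)); // prints "true"
    }
}
```

In practice you would read only the first kilobyte of the file (or request it with a byte range) and pass those bytes in, rather than loading the whole document.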
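When streaming PDF files into your client, whether on desktop, mobile or a browser, you need three things: a linearized PDF, a web server that supports byte-range requests, and a viewer that can open the document from a URL.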
Apryse SDK can linearize documents when they are saved. Once you have a PDFDoc object (by either opening an existing PDF or creating a new one) you save it with the e_linearized flag. For example:
using (PDFDoc doc = new PDFDoc("in.pdf")) {
    // Save with the linearized flag so the output can be streamed
    doc.Save("out.pdf", SDFDoc.SaveOptions.e_linearized);
}
The second requirement is a web server that supports byte-range requests. A byte-range request asks the server to send a certain set of bytes from a file. This range doesn’t necessarily start from byte zero or comprise the entire file. For example, if the HTTP GET headers include the following key-value pair:
Range: bytes=1495454-1594723
then a server that supports byte ranges will send 99,270 bytes (about 97 KB) of the requested file, starting at byte offset 1495454. Both ends of the range are inclusive.
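The arithmetic behind that figure is easy to get wrong by one, because both the first and last byte positions are included in the range. The following sketch computes the size of a closed byte range from a Range header value. The ByteRange class is a hypothetical helper written for this illustration, and it deliberately rejects open-ended forms such as bytes=16-, whose size depends on the file length.

```java
/** Hypothetical helper: size of a closed HTTP byte range such as "bytes=0-499". */
final class ByteRange {
    static long length(String rangeHeaderValue) {
        String spec = rangeHeaderValue.trim();
        if (!spec.startsWith("bytes=")) {
            throw new IllegalArgumentException("Only the 'bytes' unit is handled: " + spec);
        }
        String[] parts = spec.substring("bytes=".length()).split("-", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected a closed range first-last: " + spec);
        }
        long first = Long.parseLong(parts[0].trim());
        long last = Long.parseLong(parts[1].trim());
        if (last < first) {
            throw new IllegalArgumentException("Last byte precedes first byte: " + spec);
        }
        // Both ends are inclusive, hence the +1.
        return last - first + 1;
    }

    public static void main(String[] args) {
        System.out.println(ByteRange.length("bytes=1495454-1594723")); // prints "99270"
    }
}
```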
The good news is your web server probably already supports this feature. To test if it does, use cURL on your favourite *nix system (or the native Windows version) as follows:
curl -H Range:bytes=16- -I http://pdftron.com/index.html
If the server responds with “HTTP/1.1 206 Partial Content” then it supports byte ranges. (If it responds with “HTTP/1.1 200 OK”, then it does not support byte ranges.)
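If you would rather run the same check from code, the sketch below mirrors the cURL command: it sends a HEAD request with a Range header and treats a 206 status as support for byte ranges. It uses only the standard java.net classes. The RangeProbe class and its method names are hypothetical, and the URL you pass in is up to you.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical probe mirroring "curl -H Range:bytes=16- -I <url>". */
final class RangeProbe {
    /** 206 Partial Content means the server honored the Range header. */
    static boolean isPartialContent(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_PARTIAL;
    }

    /** Sends a HEAD request with a Range header and inspects the status code. */
    static boolean supportsByteRanges(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setRequestProperty("Range", "bytes=16-");
            return isPartialContent(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(supportsByteRanges(args[0]));
        } else {
            System.out.println("usage: java RangeProbe <url>");
        }
    }
}
```

A 200 response here means the server ignored the Range header and would send the whole file, which is exactly the case streaming is meant to avoid.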
A couple of small notes:
Now that you understand how to linearize a document and how to check that your web server supports byte serving, the third and last step is to actually open the document.
This is done with PDFViewCtrl’s OpenURLAsync API, which is available for Windows (C++, .NET), Android, iOS, and WinRT/Windows Phone.
Instead of calling SetDoc:
PDFViewCtrl.SetDoc(PDFDoc doc);
The call is replaced with a call to OpenURLAsync:
PDFViewCtrl.OpenURLAsync(string url);
Once the call to OpenURLAsync is made, there will be a slight pause while the control contacts the server and downloads the preliminary data describing the document, such as the total number of pages and where resources for each page are kept.
Once this information is obtained (typically within 0.5 to 3 seconds), blank pages for the entire document load, and content for the current page downloads and renders. If the user does not scroll the document, the control will continue downloading content for the surrounding pages. If the user scrolls to a non-downloaded page, the control will then load that content first before other pages that still need downloading. This ensures a responsive viewing experience that (depending on network connection speed) will not differ tremendously from viewing a local PDF.
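The prioritization described above can be pictured as a simple ordering problem: fetch the page on screen first, then work outward to its neighbours, skipping anything already downloaded. The sketch below is only an illustration of that idea and is not how the SDK schedules downloads internally. The PagePriority class, its method name, and the tie-break (the following page before the preceding one) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: orders pages by distance from the page on screen. */
final class PagePriority {
    /**
     * @param pageCount   total pages in the document (1-based numbering)
     * @param currentPage page currently shown in the viewer
     * @param downloaded  pages whose content has already arrived
     * @return pages still to fetch, nearest to the current page first
     */
    static List<Integer> downloadOrder(int pageCount, int currentPage, Set<Integer> downloaded) {
        if (pageCount < 1 || currentPage < 1 || currentPage > pageCount) {
            throw new IllegalArgumentException("currentPage must be within 1..pageCount");
        }
        List<Integer> order = new ArrayList<>();
        for (int distance = 0; distance < pageCount; distance++) {
            int after = currentPage + distance;
            int before = currentPage - distance;
            if (after <= pageCount && !downloaded.contains(after)) {
                order.add(after);
            }
            // distance 0 would add the current page twice, so skip it here
            if (distance > 0 && before >= 1 && !downloaded.contains(before)) {
                order.add(before);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Six-page document, user jumped to page 4, page 1 already downloaded.
        System.out.println(downloadOrder(6, 4, Set.of(1))); // prints "[4, 5, 3, 6, 2]"
    }
}
```

Recomputing the order whenever the user scrolls gives the behaviour described: the newly visible page jumps to the front of the queue.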
By default, the streamed PDF file will download continuously until the entire file finishes loading. However, with Apryse SDK in Android, iOS, Xamarin, and UWP, it is possible to configure the solution to download only those pages the user actually views to keep data usage to a minimum.
To restrict your viewer to downloading only those parts of a document that are currently on screen, use the following:
Android
PDFViewCtrl.HTTPRequestOptions httpRequestOptions = new PDFViewCtrl.HTTPRequestOptions();
httpRequestOptions.restrictDownloadUsage(true);
mPdfViewCtrl.openUrlAsync("http://example.com/sample.pdf", "cache_path", "password", httpRequestOptions);
iOS
let options = PTHTTPRequestOptions()
options?.restrictDownloadUsage(true)
pdfViewCtrl.openUrlAsync("http://example.com/sample.pdf", withPDFPassword: nil, withCacheFile: nil, with: options)
Xamarin
var httpRequestOptions = new PDFViewCtrl.HTTPRequestOptions();
httpRequestOptions.RestrictDownloadUsage(true);
mPdfViewCtrl.OpenUrlAsync("http://example.com/sample.pdf", "cache_path", "password", httpRequestOptions);
UWP
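HTTPRequestOptions httpRequestOptions = new HTTPRequestOptions();
httpRequestOptions.RestrictDownloadUsage(true);
// Note: the options object only takes effect if it is passed to OpenURLAsync.
// The call below omits it; check the UWP API reference for the overload that
// accepts HTTPRequestOptions.
await _PDFViewCtrl.OpenURLAsync(url);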
As more users expect seamless and responsive access to cloud data, it is important that information stored remotely can be accessed in an efficient manner. Using linearized PDF documents with Apryse SDK’s OpenURL method is a way to ensure a top-notch remote PDF viewing experience and reduced network service costs.
If you have any questions about Apryse's PDF SDK, please feel free to get in touch!