PDFTron is now Apryse. Same great products, new name.

Intelligent Data Extraction

Unlock information stored in PDFs through structured extraction of text, data, tables, and articles into JSON output. The Apryse SDK preserves accurate structural information, so you can process the data and reconstruct document elements exactly as they were meant to appear.

JSON-ify your PDFs: extract content as well as structure into text-based data

Unlock information trapped in PDFs and reuse the content as JSON data anywhere else

Reduce Overhead Cost

Accurate and dependable data extraction technology that lets you automate processes with peace of mind. Free up your users to work with JSONified data instead of spending hours customizing and fixing PDF data extracts.

Unlock Efficient Workstreams

Automate and leverage the data you already own but can't easily access. Text, content, and structural information become reusable data, available to any other application that accepts JSON input.

Easy to Implement and Low-Maintenance

The Apryse IDP's Intelligent Data Extraction delivers high accuracy and recognizes a wide variety of document layouts, freeing developers from customizing countless document parameters and monitoring for inaccurate output.

Apryse SDK

Feature Highlights

Text Extraction

Convert PDF text to JSON data, or readable Unicode text, regardless of language or font. Extract characters, words, fonts, and form fields. Populate a full-text search engine to search across a set of documents.
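Populating a full-text search engine from extracted text usually starts with an inverted index that maps each word to where it occurs. The following is a minimal, SDK-independent sketch: the `docs` input (document name mapped to a list of per-page texts) and the `build_inverted_index` helper are hypothetical stand-ins for whatever the text extraction step actually returns.

```python
import json
import re
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, list[list]]:
    """Map each lowercase word to the sorted [document, page] pairs containing it.

    `docs` is a hypothetical input: document name -> extracted text per page.
    """
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for name, pages in docs.items():
        for page_no, text in enumerate(pages, start=1):
            # \w+ matches Unicode word characters, so non-English text also splits.
            for word in re.findall(r"\w+", text.lower()):
                index[word].add((name, page_no))
    # Sorted lists keep the output deterministic and JSON-serializable.
    return {word: [list(loc) for loc in sorted(locs)] for word, locs in sorted(index.items())}


docs = {
    "invoice.pdf": ["Invoice total: 120 EUR", "Payment due in 30 days"],
    "contract.pdf": ["Payment terms apply"],
}
index = build_inverted_index(docs)
print(json.dumps(index["payment"]))  # prints [["contract.pdf", 1], ["invoice.pdf", 2]]
```

A production search engine would add stemming, stop words, and ranking; the sketch only shows how page-level text becomes searchable JSON data.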

Data Extraction from Tables

Detect tables, and programmatically extract the information as JSON, XML or HTML.
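Detected tables are easiest to consume as JSON when each row is keyed by its column header. This minimal sketch assumes a hypothetical intermediate form (a list of cell strings per row, header row first); it is not the Apryse SDK's own table output format.

```python
import json


def table_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn a table given as rows of cells (header first) into JSON-ready records."""
    if not rows:
        return []
    header, *body = rows
    # Pad short rows with empty strings so every record has every column key.
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in body]


table = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.99"],
    ["Gadget", "1"],  # a row with a missing trailing cell
]
records = table_to_records(table)
print(json.dumps(records))
# prints [{"Item": "Widget", "Qty": "2", "Price": "9.99"}, {"Item": "Gadget", "Qty": "1", "Price": ""}]
```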

Form Field Extraction

Serialize forms into JSON or into the industry-standard XFDF format to extract, edit, or insert form field data.
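Because XFDF is XML, exported form data can be converted to JSON with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` to flatten the `<field>`/`<value>` elements of an XFDF document into fully qualified field names. The sample document and the `xfdf_fields_to_dict` helper are illustrative assumptions; real exports may contain additional elements that this sketch ignores.

```python
import json
import xml.etree.ElementTree as ET

# XFDF's default namespace; ElementTree needs it in every lookup.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}


def xfdf_fields_to_dict(xfdf: str) -> dict[str, str]:
    """Flatten XFDF <field> elements into {"fully.qualified.name": value}."""
    root = ET.fromstring(xfdf)
    result: dict[str, str] = {}

    def walk(field: ET.Element, prefix: str) -> None:
        name = prefix + field.get("name", "")
        value = field.find("x:value", XFDF_NS)
        if value is not None:
            result[name] = value.text or ""
        # Nested <field> children form hierarchical names such as "address.city".
        for child in field.findall("x:field", XFDF_NS):
            walk(child, name + ".")

    fields = root.find("x:fields", XFDF_NS)
    if fields is not None:
        for field in fields.findall("x:field", XFDF_NS):
            walk(field, "")
    return result


sample = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="name"><value>Jane Doe</value></field>
    <field name="address">
      <field name="city"><value>Vancouver</value></field>
    </field>
  </fields>
</xfdf>"""

print(json.dumps(xfdf_fields_to_dict(sample)))
# prints {"name": "Jane Doe", "address.city": "Vancouver"}
```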

Image Extraction

Extract individual images or graphics embedded within a PDF, or convert pages into images.

Annotation Extraction

Serialize annotations into the industry-standard XFDF format (compatible with most PDF viewers). Enable users to edit annotations without modifying the underlying document, and even share annotations with other users to enable real-time collaboration.
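Since annotations live in the `<annots>` block of an XFDF file, separate from the PDF itself, they can be inspected or shared without touching the document. This minimal sketch, again using only Python's standard library, summarizes each annotation's type, page (zero-based in XFDF), author (stored in the `title` attribute), and comment text. The `summarize_annots` helper and the sample markup are hypothetical and cover only a few common attributes.

```python
import json
import xml.etree.ElementTree as ET

XFDF = "{http://ns.adobe.com/xfdf/}"  # ElementTree's "{namespace}tag" prefix


def summarize_annots(xfdf: str) -> list[dict]:
    """List each annotation in an XFDF <annots> block as a JSON-ready dict."""
    root = ET.fromstring(xfdf)
    annots = root.find(f"{XFDF}annots")
    summary: list[dict] = []
    if annots is None:
        return summary
    for annot in annots:
        contents = annot.find(f"{XFDF}contents")
        summary.append({
            "type": annot.tag.split("}")[-1],     # element name, e.g. "highlight"
            "page": int(annot.get("page", "0")),  # XFDF page indices start at 0
            "author": annot.get("title", ""),     # XFDF stores the author in "title"
            "contents": (contents.text or "") if contents is not None else "",
        })
    return summary


sample = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <annots>
    <highlight page="0" name="a1" title="Jane" rect="10,10,100,30" color="#FFFF00"/>
    <text page="2" name="a2" title="Sam" rect="50,50,70,70"><contents>Please verify this total.</contents></text>
  </annots>
</xfdf>"""

print(json.dumps(summarize_annots(sample), indent=2))
```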

Metadata Extraction

Analyze PDFs at a low level. Grab the PDF version, author information, timestamps, and anything else hidden away in the file.
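The simplest piece of low-level metadata is the version declared in the file header (for example `%PDF-1.7`). The following sketch reads it with the standard library only; the `pdf_header_version` helper is hypothetical. It ignores the document catalog's `/Version` entry, which can override the header in PDF 1.4 and later, and it does not read author or timestamp information, which lives in the document information dictionary or XMP metadata and needs a full PDF parser such as the SDK.

```python
import io
import re
from typing import BinaryIO, Optional

# Header line such as "%PDF-1.7" or "%PDF-2.0".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(stream: BinaryIO) -> Optional[str]:
    """Return the version declared in the PDF header, or None if none is found.

    Scans the first 1024 bytes, since many readers tolerate a few stray
    bytes before the header.
    """
    head = stream.read(1024)
    match = HEADER_RE.search(head)
    return match.group(1).decode("ascii") if match else None


with io.BytesIO(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n") as fake_pdf:
    print(pdf_header_version(fake_pdf))  # prints 1.7
```

For a real file, pass a handle opened in binary mode, e.g. `open("file.pdf", "rb")`.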

POWERED BY APRYSE SDK

Client-Side Processes

Scale easily without server-side dependencies such as Microsoft Office or LibreOffice for rendering, converting, or editing PDFs, Microsoft Office documents, images, videos, and HTML.

Unparalleled Rendering Quality

Bring fast rendering and industry-leading conversion accuracy for Office documents to any web, mobile, or desktop application.

Secure By Design

With no outside dependencies, you can deploy on your own infrastructure so that data never leaves your platform, reducing exposure to vulnerabilities.

Expert and Reliable Support

Accelerate projects with our team of experienced SDK developers, who support you from your unlimited trial to the finish line and beyond.

Time to make work better and life simpler.