By Isaac Maw | 2025 Jul 23
Tags
Smart Data Extraction
form
pdf extraction
Summary: Streamline your document workflows by automating PDF form field detection with Apryse’s AI-powered Smart Data Extraction add-on. This feature enables developers to build applications that intelligently identify and process static PDF forms, reducing manual setup and accelerating data capture.
Forms are an essential tool for collecting data from individuals, but gathering and organizing that data manually across a large number of forms is time-consuming, error-prone, and costly. To identify and extract data from filled forms and export it in a usable format such as JSON, you can use the Smart Data Extraction add-on in the Apryse Server SDK.
This add-on lets developers automatically unlock information trapped in PDFs and reuse that content as JSON in downstream applications.
With our AI-powered Form Field Detector, part of Apryse Smart Data Extraction, developers can choose between two engines: form field detection for basic layout-based detection, and form field key-value extraction for semantic mapping (label to input).
If you’re looking for an SDK solution for supporting fillable PDF forms, try our JavaScript PDF form filling library.
No template is required. The new form field detection accurately detects and classifies PDF form fields, including radio buttons, checkboxes, signature fields, and more. Each detected field includes its field type (such as text or checkbox), its bounding box coordinates, and a confidence score. Developers can then use the JSON output to generate fully interactive e-forms from informal forms, whether the source is a digitally born PDF or a scanned copy of a paper form.
If you're keen to try form field detection, visit the live demo in our showcase and check out the samples and documentation. No trial key is required to get started, though you will need one to run the samples.
Many other current PDF form field detectors involve a third-party application, which carries disadvantages:
By contrast, Apryse form field detection works within a standalone solution. With Apryse SDK, you can:
Form field detection uses Apryse AI technology to detect form elements in 'informal' PDF forms.
Here, an 'informal form' can be anything: a customer or patient intake form, an equipment order form, and so on. What such forms lack is true fillable PDF form functionality. The files may be scanned images of paper forms, or documents created in an application such as Microsoft Word and saved to PDF without interactive fields. Or they may contain non-fillable tables. As a result, users must fill in table cells or rows manually using a third-party PDF editor, or print, fill, and scan the document back in.
The Apryse form field detector automates PDF form field detection, changing how these informal forms are processed and removing much of the manual work involved.
Apryse Form Field Detection automatically surveys the layout of an informal form and then determines the most probable arrangement of the individual fields. For example, it can tell the difference between a table and a form. It then classifies the identified fields by type, including:
PDF form fields identified via the Automated Detector
Whether starting with a scanned form or a non-interactive, informal one, the process for AI-enabled field detection then looks like this:
At the end, you’ll have a proper PDF e-form with fillable fields.
You can extract form fields to an external JSON file:
await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);
Alternatively, you can use the Form Key-Value Extraction engine:
await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_FormKeyValue);
You can also extract the result as an in-memory JSON string, which is useful if you want to parse the JSON right away:
const json = await PDFNet.DataExtractionModule.extractDataAsString('formfields.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);
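The three calls above differ only in the engine and in where the JSON ends up. A minimal sketch, assuming you want to switch between the two engines in one place and parse the string result immediately: the hypothetical helper `extractFormJson` (and its `keyValue` option) is not part of the SDK, and it takes the `PDFNet` object as a parameter so it can be exercised with a stub instead of a licensed SDK.

```javascript
// Hypothetical helper: choose an extraction engine and parse the in-memory JSON.
// `PDFNet` is passed in rather than required here, so the function can be
// tested with a stub; in an application, pass the real SDK object after
// your usual PDFNet initialization.
async function extractFormJson(PDFNet, pdfPath, { keyValue = false } = {}) {
  const { DataExtractionModule } = PDFNet;
  // e_Form: layout-based field detection.
  // e_FormKeyValue: semantic label-to-input mapping.
  const engine = keyValue
    ? DataExtractionModule.DataExtractionEngine.e_FormKeyValue
    : DataExtractionModule.DataExtractionEngine.e_Form;
  const json = await DataExtractionModule.extractDataAsString(pdfPath, engine);
  return JSON.parse(json);
}
```

Passing `{ keyValue: true }` selects the key-value engine without changing the call site.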
The JSON contains a list of all the detected fields in the document. Each field consists of a type, a confidence value, and bounding box coordinates, with the following structure:
{
  "type": string,
  "confidence": double,
  "rect": [x1, y1, x2, y2]
}
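The exact top-level layout of the file is not shown above. The sketch below assumes (verify this against your own output) that fields are grouped per page under a `pages` array, with each page carrying `properties.pageNumber` and a `formFields` array of `{ type, confidence, rect }` records. It flattens them into one list and rejects records whose `rect` is not four numbers. `flattenFields` is a hypothetical name.

```javascript
// ASSUMED shape (check against real output):
// { pages: [ { properties: { pageNumber }, formFields: [ { type, confidence, rect } ] } ] }
function flattenFields(doc) {
  const out = [];
  for (const [i, page] of (doc.pages ?? []).entries()) {
    // Fall back to the 1-based array position if no page number is present.
    const pageNumber = page.properties?.pageNumber ?? i + 1;
    for (const field of page.formFields ?? []) {
      const { type, confidence, rect } = field;
      const validRect =
        Array.isArray(rect) && rect.length === 4 && rect.every(Number.isFinite);
      if (typeof type !== "string" || typeof confidence !== "number" || !validRect) {
        throw new Error(`Malformed field on page ${pageNumber}: ${JSON.stringify(field)}`);
      }
      // Copy the rect so later edits do not mutate the parsed document.
      out.push({ pageNumber, type, confidence, rect: [...rect] });
    }
  }
  return out;
}
```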
With this list, you can:
The JSON output mirrors the fields detected on the informal PDF, and the advantage of JSON is that you can adjust the detected form fields as you wish: you can append content, delete form fields, or adjust the bounding rectangle coordinates as required.
In low-stakes situations, you could decide to trust the confidence scores reported by the Apryse AI and skip the review.
Optional step: Use the ElementBuilder and ElementWriter APIs to draw the detected bounding boxes onto the PDF, then review this output to verify that the fields were detected properly.
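Because the output is plain JSON, those edits are ordinary array and object operations. A minimal sketch, assuming a flat array of `{ type, confidence, rect }` records with `rect` as `[x1, y1, x2, y2]` and `x1 < x2`, `y1 < y2`; the helper names and the `source: "manual"` marker are hypothetical. Each function returns a new array so the original detector output stays available for comparison.

```javascript
// Remove the field at `index` without mutating the input array.
function deleteField(fields, index) {
  return fields.filter((_, i) => i !== index);
}

// Grow (positive pad) or shrink (negative pad) one field's rectangle on every side.
// Assumes x1 < x2 and y1 < y2.
function padRect(fields, index, pad) {
  return fields.map((f, i) => {
    if (i !== index) return f; // untouched fields keep their original object
    const [x1, y1, x2, y2] = f.rect;
    return { ...f, rect: [x1 - pad, y1 - pad, x2 + pad, y2 + pad] };
  });
}

// Append a field the detector missed; confidence 1 marks it as human-entered.
function appendField(fields, type, rect) {
  return [...fields, { type, confidence: 1, rect: [...rect], source: "manual" }];
}
```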
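One way to act on that choice is to accept high-confidence fields automatically and queue only the rest for review. A minimal sketch, assuming confidence is a number between 0 and 1; the default threshold of 0.85 is an arbitrary placeholder, not an Apryse recommendation, so tune it against forms you have checked by hand.

```javascript
// Split detected fields into auto-accepted and needs-review groups.
// The 0.85 default is a placeholder threshold, not a vendor value.
function triageByConfidence(fields, threshold = 0.85) {
  const accepted = [];
  const review = [];
  for (const field of fields) {
    (field.confidence >= threshold ? accepted : review).push(field);
  }
  return { accepted, review };
}
```

Raising the threshold sends more fields to review; in higher-stakes workflows you can pass `1` (or higher) to review everything.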
Once satisfied with the JSON output, you can build a new e-form from the contents. The new form looks something like this, with form field boundaries indicated in color.
New PDF e-form with interactive and fillable fields indicated here in red.
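Before creating widgets with the SDK, it can help to turn the reviewed JSON into a creation plan: one entry per field with a unique name and a target field kind. A minimal sketch; the detector type strings in the mapping table are assumptions (check the `type` values in your own JSON), the naming scheme is hypothetical, and the step that actually creates the fields with the SDK is left out.

```javascript
// HYPOTHETICAL mapping from detected type strings to e-form field kinds.
// Replace the keys with the type values that appear in your JSON output.
const KIND_BY_TYPE = {
  formTextField: "text",
  formCheckBox: "checkbox",
  formRadioButton: "radio",
  formSignature: "signature",
};

// Build one plan entry per field with a unique name such as "text_p1_2".
// Unrecognized types are labeled "unknown" so they can be handled explicitly.
function buildFieldPlan(fields) {
  const counters = new Map();
  return fields.map((f) => {
    const kind = KIND_BY_TYPE[f.type] ?? "unknown";
    const page = f.pageNumber ?? 1; // default to page 1 if not tracked
    const key = `${kind}_p${page}`;
    const n = (counters.get(key) ?? 0) + 1;
    counters.set(key, n);
    return { name: `${key}_${n}`, kind, pageNumber: page, rect: f.rect };
  });
}
```

Generating names up front matters because fields in a PDF form need distinct names; otherwise, widgets sharing a name behave as one field.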
As part of the Smart Data Extraction add-on to the Apryse Server SDK, the form field detector runs efficiently on premises, in your application, instead of consuming costly cloud resources or requiring a third-party app. You own the entire workflow and lock down documents and data in the viewer, which improves security. We’d love to see the forms you create using the field detector and Apryse AI. If you have any issues or questions during your trial, don’t hesitate to drop us a line or leave us a note in the support forum.
When you’re ready to add Smart Data Extraction and Form Field Detection to your existing Apryse Server SDK license, contact Sales.
What is Apryse’s AI-powered PDF Form Field Detection?
Apryse’s AI-powered PDF Form Field Detection is a feature of the Smart Data Extraction add-on in the Apryse Server SDK. It automatically detects and classifies form fields in informal or scanned PDFs and outputs the data in JSON format, eliminating manual or ad-hoc form data extraction processes.
What types of form fields can Apryse detect?
Apryse can detect and classify a wide range of form fields, including:
Do I need a template to detect form fields in a PDF?
No, Apryse’s form field detection does not require a template. It uses AI to analyze the layout and semantics of a static PDF, including scanned documents and non-interactive forms.
What is the output format of the detected form fields?
The detected form fields are exported in JSON format.
Can I use Apryse for batch processing of PDFs?
Yes, Apryse supports batch extraction of PDF form data to JSON, making it ideal for high-volume environments where multiple forms need to be processed efficiently.
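A batch job can reuse the `extractData` call from the earlier example for each file. A minimal sketch, assuming Node.js, that one failing file should not stop the batch, and that the SDK allows overlapping `extractData` calls in one process (if it does not, set `concurrency` to 1). `batchExtract` and its `concurrency` option are hypothetical, and `PDFNet` is passed in so the loop can be tested with a stub.

```javascript
const path = require("path");

// Run extractData over many PDFs with a small concurrency limit.
// Returns one result object per input, in input order.
async function batchExtract(PDFNet, pdfPaths, outDir, { concurrency = 2 } = {}) {
  const engine = PDFNet.DataExtractionModule.DataExtractionEngine.e_Form;
  const results = new Array(pdfPaths.length);
  let next = 0;

  async function worker() {
    while (next < pdfPaths.length) {
      const i = next++; // no await between the check and the increment, so no races
      const input = pdfPaths[i];
      const output = path.join(outDir, path.basename(input, ".pdf") + ".json");
      try {
        await PDFNet.DataExtractionModule.extractData(input, output, engine);
        results[i] = { input, output, ok: true };
      } catch (err) {
        // Record the failure and continue with the remaining files.
        results[i] = { input, ok: false, error: String(err) };
      }
    }
  }

  const workers = Math.max(1, Math.min(concurrency, pdfPaths.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}
```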
How does Apryse’s solution compare to third-party PDF form detectors?
Unlike many third-party tools, Apryse:
Isaac Maw
Technical Content Creator