How to Automate PDF Form Field Detection with Apryse Smart Data Extraction

By Isaac Maw | 2025 Jul 23

5 min

How to Get Data from Form Fields and Export to JSON

Many current PDF form field detectors run as a separate third-party application, which carries several disadvantages:

Additional licensing costs
Having to download and then re-upload a PDF, which introduces security concerns and adds extra steps that break the workflow
Detection and form building occur one PDF at a time, which rules out batch processes

By contrast, Apryse form field detection runs inside your own application as part of the Apryse SDK. With it, you can:

Automate form field detection right from within your application
Combine with Apryse form building APIs to create a complete, end-to-end, and customizable e-form creator
Use for batch conversion to JSON to create e-forms in high-volume environments

How Does Apryse PDF Form Field Detection Work?

Form field detection uses Apryse AI technology to detect form elements in 'informal' PDF forms.

Here, an 'informal form' can be anything: a customer or patient intake form, a PDF form to order equipment, and so on. What such forms lack is true fillable PDF form functionality. The files may be scanned images of paper forms, or documents created in Microsoft Word and saved to PDF, or they may contain non-fillable tables. As a result, users must fill table cells or rows manually using a third-party PDF editor, or by printing, filling, and scanning the document back in.

The Apryse form field detector automates this step: it finds the fields in an informal form so that they can be turned into fillable fields without manual markup.

Apryse Form Field Detection automatically surveys the layout of an informal form and then determines the most probable arrangement of the individual fields. For example, it understands the difference between a table and a form. It then classifies each identified field by type, including:

Text fields
Radio buttons
Checkboxes
Combo boxes
Buttons
Digital signature fields

PDF form fields identified via the Automated Detector

How to Use the PDF Field Detection and Form Creation Process

Whether starting with a scanned form or a non-interactive, informal one, the process for AI-enabled field detection then looks like this:

Use the Apryse Smart Data Extraction Form Field Detection API to detect all fields and produce JSON output that lists them.
Optionally, produce a review copy of the PDF with a bounding box drawn around each detected field, based on the JSON, using the ElementBuilder and ElementWriter APIs. Then, review the output to verify that fields were detected properly.
Insert form fields at the coordinates given in the JSON, using the detected field types.
Save the modified PDF back into your database.

At the end, you’ll have a proper PDF e-form with fillable fields.

Using and Reviewing Form Field Output in JSON Format

You can extract form fields to an external JSON file:

await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);

Alternatively, using the Form Key-Value Extraction engine:

await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_FormKeyValue);

You can also return the result as an in-memory JSON string, which is useful if you want to parse the JSON right away:

const json = await PDFNet.DataExtractionModule.extractDataAsString('formfields.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);

The JSON contains a list of all the detected fields in the document. Each field is made up of a type, confidence value, and bounding box coordinates. For example:

"type": string, 
"confidence": double, 
"rect": [x1, y1, x2, y2]

With this list, you can:

Highlight where each field was detected to confirm that the detection is correct.
Programmatically insert each field based on the detected output.
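
Before doing either, it can help to check that each entry really has the three properties described above. The following is a minimal sketch, not part of the Apryse API: `isDetectedField` and `countFieldsByType` are hypothetical helpers, and the code assumes the entries have already been collected into a flat array, since the exact nesting of the output JSON is not shown here.

```javascript
// Hypothetical helper: true when an entry has the three properties
// described above (type, confidence, rect with four numbers).
function isDetectedField(entry) {
  return (
    entry !== null &&
    typeof entry === 'object' &&
    typeof entry.type === 'string' &&
    typeof entry.confidence === 'number' &&
    Array.isArray(entry.rect) &&
    entry.rect.length === 4 &&
    entry.rect.every((n) => Number.isFinite(n))
  );
}

// Assumption: `fields` is a flat array of entries taken from the parsed JSON.
// Malformed entries are skipped; the rest are counted per field type.
function countFieldsByType(fields) {
  const counts = {};
  for (const field of fields.filter(isDetectedField)) {
    counts[field.type] = (counts[field.type] || 0) + 1;
  }
  return counts;
}
```

A quick count per type is an easy sanity check against what you see on the page, for example a form with three checkboxes that reports only two.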

The JSON output reflects the fields detected on the informal PDF, and because it is JSON, you can adjust it before building the form: add entries, delete form fields, or change the bounding rectangle coordinates as required.

In low-stakes situations, you could decide to trust the confidence value that Apryse AI reports for each field and skip the review.
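
A middle ground is to review only the fields with low confidence. The sketch below assumes a flat array of detected fields and a cutoff you choose yourself; `splitByConfidence` is a hypothetical helper, and the threshold must be on the same scale as the confidence values in your output.

```javascript
// Hypothetical helper: separates fields that meet the cutoff from those
// that should be checked by a person. The input array is not modified.
function splitByConfidence(fields, threshold) {
  const accepted = [];
  const needsReview = [];
  for (const field of fields) {
    if (field.confidence >= threshold) {
      accepted.push(field);
    } else {
      needsReview.push(field);
    }
  }
  return { accepted, needsReview };
}
```

For example, `splitByConfidence(fields, 0.8)` would send anything below 0.8 to review; the value 0.8 is only an illustration, not a recommended setting.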

Produce a Copy PDF with Bounding Boxes

Optional step: Use the ElementBuilder and ElementWriter APIs to draw a bounding box around each detected field on a copy of the PDF. Then, review this copy to verify that the fields were detected properly.
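
The JSON gives each rectangle as two corners, `[x1, y1, x2, y2]`, while rectangle-drawing calls commonly take an origin plus a width and height. A small conversion step, sketched here as a hypothetical `rectToBox` helper, bridges the two. The optional `padding` parameter is an assumption added for illustration so the outline does not sit directly on top of the field, and the code makes no claim about which page corner the coordinates are measured from.

```javascript
// Hypothetical helper: converts [x1, y1, x2, y2] into an origin plus size.
// Corners may be given in either order; padding grows the box on all sides.
function rectToBox([x1, y1, x2, y2], padding = 0) {
  return {
    x: Math.min(x1, x2) - padding,
    y: Math.min(y1, y2) - padding,
    width: Math.abs(x2 - x1) + 2 * padding,
    height: Math.abs(y2 - y1) + 2 * padding,
  };
}
```

The resulting `x`, `y`, `width`, and `height` values can then be passed to whichever drawing call you use to outline the field.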

Building a New E-form from Form Fields

Once satisfied with the JSON output, you can build a new e-form from the contents. The new form looks something like this, with form field boundaries indicated in color.

New PDF e-form with interactive and fillable fields indicated here in red.
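
Each field you insert needs its own name, because PDF form fields that share a name also share a value. One possible way to prepare for that, assuming a flat array of detected fields, is to derive a name from the detected type plus a running counter. `planFields` and the naming pattern are hypothetical, and the actual field creation with the Apryse form building APIs is left out.

```javascript
// Hypothetical helper: assigns a unique name per detected field,
// e.g. "text_1", "text_2", "checkbox_1", and keeps type and rect.
function planFields(fields) {
  const seen = {};
  return fields.map((field) => {
    seen[field.type] = (seen[field.type] || 0) + 1;
    return {
      name: `${field.type}_${seen[field.type]}`,
      type: field.type,
      rect: field.rect,
    };
  });
}
```

Generated names like these are placeholders. In practice you would likely rename them to something meaningful, such as the label printed next to the field.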

What’s Next with Apryse Form Field Detection?

As part of the Smart Data Extraction add-on to the Apryse Server SDK, the form field detector runs efficiently on premises, in your application, instead of consuming costly cloud resources or requiring a third-party app. You own the entire workflow and lock down documents and data in the viewer, which improves security. We’d love to see the forms you create using the field detector and Apryse AI. If you have any issues or questions during your trial, don’t hesitate to drop us a line or leave us a note in the support forum.

When you’re ready to add Smart Data Extraction and Form Field Detection to your existing Apryse Server SDK license, contact Sales.

FAQ

What is Apryse’s AI-powered PDF Form Field Detection?

Apryse’s AI-powered PDF Form Field Detection is a feature of the Smart Data Extraction add-on in the Apryse Server SDK. It automatically detects and classifies form fields in informal or scanned PDFs and outputs the data in JSON format, eliminating manual or ad-hoc form data extraction processes.

What types of form fields can Apryse detect?

Apryse can detect and classify a wide range of form fields, including:

Text fields
Checkboxes
Radio buttons
Buttons
Digital signature fields

Do I need a template to detect form fields in a PDF?

No, Apryse’s form field detection does not require a template. It uses AI to analyze the layout and semantics of a static PDF, including scanned documents and non-interactive forms.

What is the output format of the detected form fields?

The detected form fields are exported in JSON format.

Can I use Apryse for batch processing of PDFs?

Yes, Apryse supports batch extraction of PDF form data to JSON, making it ideal for high-volume environments where multiple forms need to be processed efficiently.
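
As a rough sketch of what a batch run could look like, the loop below processes a list of PDFs one after another and records which ones failed instead of stopping at the first error. `extractBatch` is a hypothetical helper, and the `extractOne` callback is an assumption that keeps the loop independent of the SDK; it could wrap the `extractData` call shown earlier.

```javascript
// Hypothetical helper: runs extractOne(inputPath, outputPath) for each PDF
// in order and collects one result record per file.
async function extractBatch(inputPaths, extractOne) {
  const results = [];
  for (const inputPath of inputPaths) {
    // Write the JSON next to the PDF: "form.pdf" becomes "form.json".
    const outputPath = inputPath.replace(/\.pdf$/i, '') + '.json';
    try {
      await extractOne(inputPath, outputPath);
      results.push({ inputPath, outputPath, ok: true });
    } catch (err) {
      results.push({ inputPath, outputPath, ok: false, error: String(err) });
    }
  }
  return results;
}

// Possible callback, using the call shown earlier in this article:
// (input, output) => PDFNet.DataExtractionModule.extractData(
//   input, output, PDFNet.DataExtractionModule.DataExtractionEngine.e_Form)
```

Processing files sequentially keeps the example simple. Whether running several extractions in parallel is appropriate depends on your environment and is not covered here.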

How does Apryse’s solution compare to third-party PDF form detectors?

Unlike many third-party tools, Apryse: