Automating PDF Form Field Detection with Apryse IDP

By John Chow | 2023 Feb 28

Read time: 5 min

Add Apryse’s AI-powered PDF Form Field Detection into your workflow and application, then leverage the JSON output to auto-generate fully interactive PDF e-forms.

Part 1 and Part 2 of our winter release series introduced the new Apryse intelligent data extraction capability, part of the IDP add-on in the Apryse Server SDK. This lets organizations automatically unlock information trapped in PDFs and use that content as JSON.

This blog, Part 3, introduces you to the AI-powered Form Field Detector, part of the new Apryse IDP data extraction capability. The new form field detection accurately detects and classifies PDF form fields — including radio buttons, checkboxes, signature fields, and more. Developers can then leverage the JSON output to generate fully interactive e-forms from any informal forms, whether the file is a digitally born PDF file or a scanned copy of a form.

If you're keen to try form field detection, visit the live demo and check out the samples and documentation. No trial key is required to get started, though you do need a key to run the samples.

Breaking New Ground in PDF Form Field Detection

Current PDF form field detectors involve a third-party application, which carries disadvantages:

  • Additional licensing costs.
  • Having to download and then re-upload a PDF, which introduces security concerns and adds extra steps that break the workflow.
  • Detection and form building occur one PDF at a time, which rules out batch processes.

By contrast, Apryse form field detection works as a standalone solution. With it, you can:

  • Automate form field detection right from within your application.
  • Combine with Apryse form building APIs to create a complete, end-to-end, and customizable e-form creator.
  • Use for batch conversion to JSON to create e-forms in high-volume environments.

How Does Apryse PDF Form Field Detection Work?

Form field detection uses Apryse AI technology to detect form elements in "informal" PDF forms.

Here, an "informal form" can be anything: a customer or patient intake form, a PDF form to order equipment, and so on. What such forms lack is true fillable PDF form functionality. That may be because the files are scanned images of forms, because they were created in Microsoft Word and saved to PDF, or because they contain non-fillable tables. As a result, users must fill in table cells or rows manually using a third-party PDF editor, or by printing, filling in, and scanning the document back in.

The Apryse form field detector introduces automatic PDF form field detection to change the landscape of how PDF forms are processed, making work better and life simpler.

It does so by automatically surveying the layout of an informal form and then determining the most probable arrangement of the individual fields. For example, it understands the difference between a table and a form. It then classifies the type of identified fields, including:

  • Text fields
  • Radio buttons
  • Checkboxes
  • Combo boxes
  • Buttons
  • Digital signature fields

[Image: PDF form fields identified via the Automated Detector]

The PDF Field Detection and Form Creation Process

Whether starting with a scanned form or a non-interactive, informal one, the process for AI-enabled field detection then looks like this:

  1. Use the Apryse IDP Form Field Detection API to detect all fields and produce JSON output that lists them.
  2. Optionally, produce a copy of the PDF with bounding boxes drawn around the detected fields, based on the JSON and using the ElementBuilder and ElementWriter APIs. Then, review the copy to verify that fields were detected properly.
  3. Insert interactive fields at the coordinates given in the JSON output, matching each detected field type.
  4. Save the modified PDF back into your database.

At the end, you’ll have a proper PDF e-form with fillable fields. Let’s take a closer look:

1. Using and Reviewing Form Field Output in JSON Format

You can extract form fields to an:

  • External JSON file, or
  • In-memory JSON string, which is useful if you want to parse the JSON right away.

The JSON contains a list of all the detected fields in the document. Each field is made up of a type, confidence value, and bounding box coordinates. For example:

{
  "type": string,
  "confidence": double,
  "rect": [x1, y1, x2, y2]
}
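
As a minimal sketch of working with an in-memory JSON string, the snippet below parses field objects of the shape shown above into typed records. It assumes the fields arrive as a flat JSON array, and the type names in the sample ("text", "checkbox") are hypothetical placeholders; check the actual detector output for the real top-level layout and type strings.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedField:
    type: str
    confidence: float
    rect: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def parse_fields(json_text: str) -> List[DetectedField]:
    """Parse a JSON array of field objects into DetectedField records."""
    parsed = []
    for item in json.loads(json_text):
        x1, y1, x2, y2 = item["rect"]
        parsed.append(
            DetectedField(item["type"], float(item["confidence"]), (x1, y1, x2, y2))
        )
    return parsed


# Hypothetical sample data; not real detector output.
sample = """[
  {"type": "text", "confidence": 0.97, "rect": [72, 700, 300, 720]},
  {"type": "checkbox", "confidence": 0.62, "rect": [72, 650, 84, 662]}
]"""

fields = parse_fields(sample)
for field in fields:
    print(field.type, field.confidence, field.rect)
```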

With this list, you can:

  • Highlight where each field was detected to verify that the detection is correct.
  • Programmatically insert the field based on the detected output.

The JSON output reflects the exact fields on the informal PDF — and the advantage of JSON is you can tweak the detected form fields as you wish. You can append content, delete form fields, or adjust the bounding rectangle coordinates in the JSON as required.
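
The kinds of tweaks described here can be sketched in a few lines. This example assumes the detected fields are available as a flat JSON array of objects with "type", "confidence", and "rect" keys; the type names, the 2-unit padding, and the added signature field are all illustrative choices, not detector output.

```python
import json


def pad_rect(rect, padding):
    """Grow an [x1, y1, x2, y2] rectangle by `padding` on every side."""
    x1, y1, x2, y2 = rect
    return [x1 - padding, y1 - padding, x2 + padding, y2 + padding]


# Hypothetical detected fields (assumed flat-array layout).
detected_json = """[
  {"type": "text", "confidence": 0.97, "rect": [72, 700, 300, 720]},
  {"type": "button", "confidence": 0.55, "rect": [400, 100, 460, 120]},
  {"type": "checkbox", "confidence": 0.91, "rect": [72, 650, 84, 662]}
]"""
fields = json.loads(detected_json)

# Delete: drop a field type you do not want in the final form.
fields = [f for f in fields if f["type"] != "button"]

# Adjust: enlarge every remaining bounding rectangle slightly.
for f in fields:
    f["rect"] = pad_rect(f["rect"], 2)

# Append: add a field the detector did not report.
fields.append({"type": "signature", "confidence": 1.0, "rect": [300, 80, 500, 120]})

edited_json = json.dumps(fields, indent=2)
print(edited_json)
```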

In low-stakes situations, you could decide to trust the confidence values reported by the Apryse AI and skip the review.
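
A middle ground is to review only the fields the detector is less sure about. The sketch below splits fields by their confidence value; the 0.9 threshold and the sample data are arbitrary assumptions that you would tune for your own documents.

```python
def triage(fields, threshold=0.9):
    """Split fields into (accepted, needs_review) by confidence value."""
    accepted = [f for f in fields if f["confidence"] >= threshold]
    needs_review = [f for f in fields if f["confidence"] < threshold]
    return accepted, needs_review


# Hypothetical detected fields; type names are placeholders.
fields = [
    {"type": "text", "confidence": 0.97, "rect": [72, 700, 300, 720]},
    {"type": "checkbox", "confidence": 0.62, "rect": [72, 650, 84, 662]},
    {"type": "text", "confidence": 0.90, "rect": [72, 600, 300, 620]},
]

accepted, needs_review = triage(fields)
print(len(accepted), "accepted;", len(needs_review), "flagged for review")
```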

2. Produce a Copy PDF with Bounding Boxes

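Optional step: Use the ElementBuilder and ElementWriter APIs to draw a bounding box around each detected field on a copy of the PDF, based on the JSON coordinates. Then, review this output to verify that the fields were detected properly.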
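
Before drawing, it can help to convert each "rect" into the form your drawing call expects. The helper below is a sketch that assumes the drawing routine takes an origin plus a width and height rather than two corners; it also normalizes the corners in case they arrive in a different order. It does not call any Apryse API itself.

```python
def rect_to_origin_size(rect):
    """Convert [x1, y1, x2, y2] to (x, y, width, height) with a normalized origin."""
    x1, y1, x2, y2 = rect
    x, y = min(x1, x2), min(y1, y2)
    return x, y, abs(x2 - x1), abs(y2 - y1)


# Hypothetical field rectangle taken from the JSON output.
x, y, width, height = rect_to_origin_size([72, 700, 300, 720])
print(x, y, width, height)
```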

3. Building a New E-form from Form Fields

Once satisfied with the JSON output, you can build a new e-form from the contents. The new form looks something like this, with form field boundaries indicated in color:

[Image: New PDF e-form with interactive and fillable fields, indicated here in red]
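
One detail to plan for when inserting fields: form fields in a PDF that share a name also share a value, so each inserted field generally needs its own name. The sketch below assigns names from the detected type plus a running counter, ordered top to bottom and then left to right. It assumes a flat list of field objects, hypothetical type names, and a coordinate system in which y increases upward.

```python
from collections import Counter


def assign_names(fields):
    """Return copies of the fields with a unique 'name' such as 'text_1'."""
    counts = Counter()
    named = []
    # Sort by top edge (descending y2), then by left edge (ascending x1).
    for f in sorted(fields, key=lambda f: (-f["rect"][3], f["rect"][0])):
        counts[f["type"]] += 1
        named.append({**f, "name": f"{f['type']}_{counts[f['type']]}"})
    return named


# Hypothetical detected fields; not real detector output.
fields = [
    {"type": "text", "confidence": 0.97, "rect": [72, 700, 300, 720]},
    {"type": "text", "confidence": 0.95, "rect": [72, 740, 300, 760]},
    {"type": "checkbox", "confidence": 0.91, "rect": [72, 650, 84, 662]},
]

named = assign_names(fields)
for f in named:
    print(f["name"], f["rect"])
```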

What’s Next with Apryse Form Field Detection?

As part of the IDP add-on to the Apryse Server SDK, the new form field detector runs efficiently on premises, in your application, instead of consuming costly cloud resources or requiring a third-party app. You own the entire workflow and lock down documents and data in the viewer, which improves security.

We’d love to see the forms you create using the new field detector and Apryse AI. If you have any issues or questions during your free trial, don’t hesitate to drop us a line or leave us a note in the support forum.

When you’re ready to add IDP and Form Field Detection to your existing Apryse Server SDK license, contact Sales.

John Chow

Product Manager