Apryse Announces Acquisition of AI-Powered Document Toolkit Provider LEAD Technologies
By Ian Morris | 2024 Jan 30
We are thrilled to kick off a new year by announcing our latest data extraction release - pdf2Data 4.4. This version is focused on two major areas: enhancing user experience and refining data extraction capabilities. It is key to eliminating data silos and enabling users of all skill levels to make data-driven decisions.
Many businesses need to access and reuse data trapped inside PDFs, such as invoices, statements, or contracts. Recent years have seen the rise of Intelligent Document Processing (IDP) solutions, typically relying on AI or machine learning to recognize and process documents. However, such approaches require extensive training to accurately identify documents, which can be time-consuming and expensive.
In contrast, our pdf2Data solution uses a template-based approach which requires only a single example PDF to get started. Since documents such as invoices from a common supplier will have a standardized layout with only the content changing, pdf2Data allows you to collaboratively build, manage, and reuse extraction templates for specific document types.
With pdf2Data, you can easily automate the extraction of content from PDFs and transform it into reusable, structured data. Using the wide range of selectors, you can quickly build a parsing pipeline to find and extract useful data in documents. Selectors are available to intelligently identify specific text, barcodes, dates, and even multi-page tables.
Thanks to pdf2Data’s flexible and on-premises deployment, integration into existing document workflows can be seamlessly and securely achieved. With convenient Docker deployment and a RESTful API in addition to native Java and .NET libraries, pdf2Data has comprehensive cross-platform compatibility, whatever your infrastructure.
In short, pdf2Data can save your business precious time and boost the productivity of modern IDP-focused workflows by easily allowing data in PDFs to be accessed and repurposed.
Our goal has always been to make data extraction as seamless and code-free as possible so that any authorized user can use pdf2Data with little to no IT requirements. With this release, we are doubling down on that commitment.
Users can now set up even more complex extraction pipelines in the pdf2Data Editor by using new predefined rules and selectors which require minimal to no coding skills to make full use of. Templates can then be used by the pdf2Data Parsing Engine to process your documents, and enable more efficient automated IDP workflows.
Mixing and matching selectors is especially beneficial for extracting data from documents with complex structures or formatting, and as with all pdf2Data 4.x releases, we’ve focused on further improving the access to selector functionality previously only available in expert mode. Let’s take a closer look.
With pdf2Data 4.4, we’ve taken a significant leap in data extraction technology. The new release offers:
The new Search area gives greater control over where pdf2Data applies the parsing pipeline in documents. It replaces the previous the Page and Boundary selectors which have now been deprecated. You can restrict the area by specifying a page, page range, or by selecting a specific part of the document. You can then include or exclude parts of the page by clicking on them, as shown below:
You can easily include adjacent parts of the document by clicking on them.
You can also watch a tutorial video demonstrating its usage if you prefer.
The new Crop content selector allows users to define specific areas of interest in a document based on its content. It's a significant improvement for processing multi-section documents and non-static forms.
Using the Crop content selector in conjunction with the Table selector to find and recognize specific table data.
The Filter selector is a powerful addition that validates and filters extracted values. You can use specific search conditions to exclude unwanted content and ensure only relevant data is captured.
Using the Filter selector to extract a needed clause from a contract.
Together, these selectors improve the parsing process, making it more efficient and user-friendly than ever before.
If you are looking to enhance your data extraction processes with minimal coding and maximum efficiency, pdf2Data 4.4 is ready for an upgrade today. If you are new to pdf2Data or just want to learn more about how data extraction makes your work easier, faster, and more accurate, reach out today.
We have big plans for upcoming releases. Coming soon will be a great new feature where pdf2Data will be able to recognize and extract content from images – not just PDF documents. This will significantly extend pdf2Data’s capabilities and strengthen its position as an essential part of cutting-edge, efficient IDP workflows.
Download pdf2Data 4.4 today and experience improved data extraction and overall user experience. For a full list of pdf2Data improvements, please see the changelog and other documentation on our website. You can also reach out for a full demonstration if you are new to data extraction solutions.
Alternatively, if you're looking for more code-based solutions you can check out our suite of developer-focused Intelligent Data Extraction capabilities.
Share this post