By Apryse | 2025 Jun 25
5 min
Tags
ocr
C#
In this post, we'll look at how searchable PDFs are created using OCR and how to identify, in bulk, the files that need this process.
Receiving a mix of raster and searchable PDFs can be a pain, especially for teams that get these files from clients daily. The Apryse OCR and PDF SDK makes it easy for developers to check whether a file needs to be converted to a searchable PDF using OCR.
PDF pages are made up of a few different object types, such as text, images, and vector paths.
This blog post will showcase how to check all object types in stored PDFs, then calculate whether each PDF needs to be converted to a searchable PDF using the OCR SDK. Here's how it works:
using System;
using System.IO;
using pdftron;
using pdftron.PDF;
using pdftron.SDF;

namespace ConsoleApp1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // PDFTronLicense is assumed to be a project-specific helper that holds
            // the license key and the path to the OCR module.
            PDFNet.Initialize(PDFTronLicense.License);
            PDFNet.AddResourceSearchPath(PDFTronLicense.ModulePath);

            string inputDir = @"path_to_input_dir";
            string outputDir = @"path_to_output_dir";

            foreach (string filePath in Directory.GetFiles(inputDir, "*.pdf"))
            {
                using (PDFDoc doc = new PDFDoc(filePath))
                using (ElementReader reader = new ElementReader())
                {
                    doc.InitSecurityHandler();

                    int textCount = 0;
                    int nonTextCount = 0;

                    // Walk every page and count text elements versus everything else.
                    for (int i = 1; i <= doc.GetPageCount(); i++)
                    {
                        pdftron.PDF.Page page = doc.GetPage(i);
                        reader.Begin(page);

                        Element element;
                        while ((element = reader.Next()) != null)
                        {
                            switch (element.GetType())
                            {
                                case Element.Type.e_text:
                                    textCount++;
                                    break;
                                default:
                                    nonTextCount++;
                                    break;
                            }
                        }

                        reader.End();
                    }

                    int totalElements = textCount + nonTextCount;

                    // Treat a document with no page content as 0% non-text
                    // instead of dividing by zero.
                    double nonTextPercentage = totalElements == 0
                        ? 0
                        : (double)nonTextCount / totalElements * 100;

                    // More than 10% non-text content: run OCR and save a searchable copy.
                    if (nonTextPercentage > 10)
                    {
                        OCRModule.ProcessPDF(doc, null);
                        string outputFilePath = Path.Combine(outputDir, Path.GetFileName(filePath));
                        doc.Save(outputFilePath, SDFDoc.SaveOptions.e_linearized);
                        Console.WriteLine($"OCR performed on {filePath} and saved to {outputFilePath}");
                    }
                    else
                    {
                        Console.WriteLine($"No OCR needed for {filePath}");
                    }
                }
            }
        }
    }
}
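If you want to tune or unit test the 10% cutoff without opening any PDFs, the decision step can be pulled into a small helper. The sketch below is one way to do that. The `OcrHeuristic` class and its method names are hypothetical and not part of the Apryse SDK; it only mirrors the percentage calculation from the program above and treats a document with no elements as not needing OCR.

```csharp
using System;

// Hypothetical helper (not part of the Apryse SDK): isolates the
// "is this PDF mostly non-text?" decision so it can be tested on its own.
public static class OcrHeuristic
{
    // Percentage of counted elements that are not text, in the range 0..100.
    // Returns 0 when there are no elements, which avoids a division by zero.
    public static double NonTextPercentage(int textCount, int nonTextCount)
    {
        if (textCount < 0)
            throw new ArgumentOutOfRangeException(nameof(textCount));
        if (nonTextCount < 0)
            throw new ArgumentOutOfRangeException(nameof(nonTextCount));

        long total = (long)textCount + nonTextCount;
        return total == 0 ? 0.0 : 100.0 * nonTextCount / total;
    }

    // True when the non-text share is strictly above the threshold
    // (10% by default, matching the cutoff used in this post).
    public static bool NeedsOcr(int textCount, int nonTextCount, double thresholdPercent = 10.0)
    {
        return NonTextPercentage(textCount, nonTextCount) > thresholdPercent;
    }
}
```

Inside the per-file loop, the check would then read `if (OcrHeuristic.NeedsOcr(textCount, nonTextCount))`, and the threshold can be adjusted per batch by passing a third argument.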
Now you have a starting point to check if OCR is needed for a folder of PDF files. Simple as that!
To learn more about Apryse OCR, visit our documentation. If you have any questions or are ready to get started, contact sales or check out the Server SDK trial.