Directory Word Search for Images and Documents

By Apryse | 2025 Jun 23

This post shows how to build an app that searches every file in a folder for a specified string and reports each file that contains a match.

Let’s jump right in.

Step 1: Set up

First, we initialize the SDK with a license key and tell it where to find any additional libraries, using the following code. We also specify the folder to search and the string to search for. The last line calls the RunSearch function.

C# code

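// Namespaces used by the code in this post (placed at the top of the file)
using System;
using System.Diagnostics;
using System.IO;
using pdftron;
using pdftron.PDF;

static void Main(string[] args)
{
  PDFNet.Initialize(PDFTronLicense.Key);
  // Path for the OCR Module or any other additional Apryse libraries
  PDFNet.AddResourceSearchPath(@"C:\PDFNetC64\Lib");

  string dir = @"PATH TO DIRECTORY TO BE SEARCHED";
  string pattern = "WORD TO SEARCH FOR";
  RunSearch(dir, pattern);
}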

Note: Before using the app, you will need to set your license appropriately in the LicenseKey.cs file.
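Hard-coding the folder and the search term is fine for a first run, but the same two values could just as well come from the command line. The sketch below shows one way to validate them before calling RunSearch. The SearchArgs class and its TryParse method are hypothetical helpers written for this post, not part of the Apryse SDK, and the two-argument convention is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Apryse SDK: validates the two values
// that Main passes to RunSearch.
public static class SearchArgs
{
    // Returns true and fills dir/pattern when args holds an existing folder
    // and a non-empty search term; otherwise returns false and sets error.
    public static bool TryParse(string[] args, out string dir, out string pattern, out string error)
    {
        dir = "";
        pattern = "";
        error = "";

        if (args == null || args.Length != 2)
        {
            error = "Expected two arguments: <directory> <word>";
            return false;
        }
        if (!Directory.Exists(args[0]))
        {
            error = $"Directory not found: {args[0]}";
            return false;
        }
        if (string.IsNullOrWhiteSpace(args[1]))
        {
            error = "The search term must not be empty.";
            return false;
        }

        dir = Path.GetFullPath(args[0]);
        pattern = args[1];
        return true;
    }
}
```

In Main, the two hard-coded strings could then be replaced by a call to SearchArgs.TryParse(args, out var dir, out var pattern, out var error), printing error and returning early when it reports false.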

Step 2: RunSearch

Now that the setup is in place, we need to look at the RunSearch function. The following code takes the folder and search string we set previously and searches every file in the folder, including its subfolders. If the file is:

  • A PDF, the code just runs the SearchFile function on it.
  • A DOCX/DOC file, the code converts it to PDF and then runs the search.
  • An image, the code uses OCR to convert the image to a PDF and then runs the search.

Note: The case labels in the switch statement list the supported file types. For any other file type, the app prints a message saying that the type isn't supported.
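The three cases above amount to a lookup from file extension to a handling strategy. As a minimal sketch of that idea, the snippet below keeps the mapping in a table so that supporting another extension means adding one entry. The FileKind enum and the FileKinds class are hypothetical names introduced for illustration; the extensions are the ones handled in this post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical names, used only for illustration.
public enum FileKind { Pdf, Office, Image, Unsupported }

public static class FileKinds
{
    // Case-insensitive lookup, so ".PDF" and ".pdf" are treated the same.
    private static readonly Dictionary<string, FileKind> ByExtension =
        new Dictionary<string, FileKind>(StringComparer.OrdinalIgnoreCase)
        {
            [".pdf"] = FileKind.Pdf,
            [".docx"] = FileKind.Office,
            [".doc"] = FileKind.Office,
            [".jpg"] = FileKind.Image,
            [".jpeg"] = FileKind.Image,
            [".png"] = FileKind.Image,
            [".bmp"] = FileKind.Image,
            [".gif"] = FileKind.Image,
            [".tif"] = FileKind.Image,
            [".tiff"] = FileKind.Image,
        };

    // Maps a file path to the way it should be handled before searching.
    public static FileKind Classify(string path)
    {
        // Files without an extension yield an empty string, which is not in the table.
        string ext = Path.GetExtension(path) ?? "";
        return ByExtension.TryGetValue(ext, out FileKind kind) ? kind : FileKind.Unsupported;
    }
}
```

For example, FileKinds.Classify("scan.TIFF") yields FileKind.Image, while a file such as notes.txt falls through to FileKind.Unsupported.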

static void RunSearch(string directory, string pattern)
{
  // TextSearchObject is assumed to be declared at class level
  // (static TextSearch TextSearchObject;) so that SearchFile can reuse it.
  TextSearchObject = new TextSearch();

  foreach (var file in Directory.GetFiles(directory, "*.*", SearchOption.AllDirectories))
  {
    try
    {
      Console.WriteLine($"Now processing {Path.GetFileName(file)}");
      switch (Path.GetExtension(file).ToLowerInvariant())
      {
        case ".pdf":
          PDFDoc doc = new PDFDoc(file);
          SearchFile(doc, pattern, Path.GetFileName(file));
          break;
        case ".docx":
        case ".doc":
          PDFDoc officeDoc = new PDFDoc();
          // Fully qualified to avoid a name clash with System.Convert
          pdftron.PDF.Convert.OfficeToPDF(officeDoc, file, null);
          SearchFile(officeDoc, pattern, Path.GetFileName(file));
          break;
        case ".jpg":
        case ".jpeg":
        case ".png":
        case ".bmp":
        case ".gif":
        case ".tif":
        case ".tiff":
          PDFDoc imageDoc = new PDFDoc();
          OCRModule.ImageToPDF(imageDoc, file, null);
          SearchFile(imageDoc, pattern, Path.GetFileName(file));
          break;
        default:
          Console.WriteLine($"File type not supported for {Path.GetFileName(file)}");
          break;
      }
      Console.WriteLine();
    }
    catch (Exception e)
    {
      Console.WriteLine("Error processing file; check debug output for more information\n");
      Debug.WriteLine(e.ToString());
    }
  }
}
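One thing to be aware of in RunSearch: the call to Directory.GetFiles sits outside the try block, so a single subfolder that cannot be read stops the whole run. The sketch below is one possible alternative that walks the tree folder by folder and skips any folder it cannot read. SafeWalker is a hypothetical helper, not part of the Apryse SDK, and silently skipping unreadable folders is an assumption about the desired behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: lists all files under a root folder, skipping
// folders that cannot be read instead of aborting the whole walk.
public static class SafeWalker
{
    public static IEnumerable<string> EnumerateFiles(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(current);
                subdirs = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // no permission to read this folder
            }
            catch (IOException)
            {
                continue; // folder missing or otherwise unreadable
            }

            foreach (string file in files)
            {
                yield return file;
            }
            foreach (string dir in subdirs)
            {
                pending.Push(dir);
            }
        }
    }
}
```

With this helper, the foreach loop in RunSearch could iterate over SafeWalker.EnumerateFiles(directory) instead of the Directory.GetFiles call. The order in which files are returned may differ from that of Directory.GetFiles.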

Step 3: SearchFile

The last function to add is SearchFile, which searches the document passed to it for the specified term. It uses the TextSearch object that was created in the RunSearch function.

static void SearchFile(PDFDoc doc, string pattern, string fileName)
{
  try
  {
    int page = 0;
    string result_str = "", ambient_str = "";
    // Match whole words only
    Int32 mode = (Int32)(TextSearch.SearchMode.e_whole_word);
    Highlights hlts = new Highlights();

    TextSearchObject.Begin(doc, pattern, mode, -1, -1);
    while (true)
    {
      TextSearch.ResultCode code = TextSearchObject.Run(ref page, ref result_str, ref ambient_str, hlts);

      if (code == TextSearch.ResultCode.e_found)
      {
        Console.WriteLine($"Found {result_str} on page {page} in file {fileName}");
      }
      else
      {
        // Any other result code ends the search of this document
        break;
      }
    }
  }
  catch (Exception e) { Console.WriteLine(e.Message); }
}
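SearchFile prints each match as soon as it is found. If you would rather end the run with a list of the files that matched, the hits can be collected and summarized afterwards. The following is a minimal sketch of that idea. SearchHit and SearchReport are hypothetical names introduced here, not Apryse types, and the summary format is only a suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of a single match: the file, the page, and the matched text.
public sealed class SearchHit
{
    public string FileName { get; }
    public int Page { get; }
    public string Text { get; }

    public SearchHit(string fileName, int page, string text)
    {
        FileName = fileName;
        Page = page;
        Text = text;
    }
}

public static class SearchReport
{
    // Builds one line per matching file: the file name, the number of hits,
    // and the distinct page numbers in ascending order.
    public static List<string> Summarize(IEnumerable<SearchHit> hits)
    {
        return hits
            .GroupBy(h => h.FileName)
            .OrderBy(g => g.Key, StringComparer.OrdinalIgnoreCase)
            .Select(g => $"{g.Key}: {g.Count()} match(es) on page(s) " +
                         string.Join(", ", g.Select(h => h.Page).Distinct().OrderBy(p => p)))
            .ToList();
    }
}
```

To use it, the e_found branch of SearchFile would add new SearchHit(fileName, page, result_str) to a shared list, and Main would print the lines returned by SearchReport.Summarize once RunSearch has finished.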

And there you have it. We’ve just created an app to search for text within a directory full of images or documents. As you can see, the process is pretty straightforward.

If you have any questions, feel free to contact sales or reach out to us on Discord!
