By Apryse | 2025 Jun 23
Tags
image
document parsing
This blog shows how to create an app that searches all the files in a folder (including its subfolders) for a specified string and reports any files that match the search criteria.
Let’s jump right in.
First, we initialize PDFNet with the license key and add the resource search path for any additional Apryse libraries, such as the OCR Module. We also specify the folder to search and the string to search for. Finally, the code calls the RunSearch function.
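// Namespaces used by the code in this post (place these at the top of the file)
using System;
using System.Diagnostics;
using System.IO;
using pdftron;
using pdftron.PDF;

static void Main(string[] args)
{
    // Initialize PDFNet with your license key (see LicenseKey.cs)
    PDFNet.Initialize(PDFTronLicense.Key);
    // Path for the OCR Module or any other additional Apryse libraries
    PDFNet.AddResourceSearchPath(@"C:\PDFNetC64\Lib");
    string dir = @"PATH TO DIRECTORY TO BE SEARCHED";
    string pattern = "WORD TO SEARCH FOR";
    RunSearch(dir, pattern);
}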
Note: Before using the app, you will need to set your license appropriately in the LicenseKey.cs file.
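Main above hard-codes the folder and the search term. If you would rather pass them on the command line, the following is a minimal sketch of one way to do it. The class and method names (SearchArgs, TryResolve) are hypothetical and not part of the Apryse SDK; the sketch only checks that the folder exists and the term is not blank, and it leaves out the PDFNet calls so it runs on its own.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Apryse SDK): resolves the folder and
// search term from the command line instead of hard-coding them in Main.
static class SearchArgs
{
    // Returns true and sets dir/pattern when args holds an existing folder
    // followed by a non-blank search term; otherwise returns false.
    public static bool TryResolve(string[] args, out string dir, out string pattern)
    {
        dir = null;
        pattern = null;
        if (args == null || args.Length < 2)
            return false;
        if (!Directory.Exists(args[0]) || string.IsNullOrWhiteSpace(args[1]))
            return false;
        dir = args[0];
        pattern = args[1];
        return true;
    }

    static void Main(string[] args)
    {
        if (!TryResolve(args, out string dir, out string pattern))
        {
            Console.WriteLine("Usage: FolderSearch <directory> <search term>");
            return;
        }
        // In the real app, PDFNet.Initialize(...) and RunSearch(dir, pattern) go here.
        Console.WriteLine($"Searching {dir} for \"{pattern}\"");
    }
}
```

Validating the inputs up front means a mistyped path produces a usage message instead of a DirectoryNotFoundException from Directory.GetFiles.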
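Now that the setup code is in place, let's look at the RunSearch function. It takes the folder and search string we set previously and searches every file in that folder and its subfolders. If the file is:
- a PDF (.pdf), it is searched directly.
- a Word document (.doc or .docx), it is first converted to PDF with Convert.OfficeToPDF.
- an image (.jpg, .jpeg, .png, .bmp, .gif, .tif, or .tiff), it is first converted to a searchable PDF with OCRModule.ImageToPDF.
Note: The supported file types are listed in the switch statement below. If a file type isn't supported, the app tells you so.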
// Shared search object; SearchFile reuses it for every document
static TextSearch TextSearchObject;

static void RunSearch(string directory, string pattern)
{
    TextSearchObject = new TextSearch();
    // Visit every file in the folder and all of its subfolders
    foreach (var file in Directory.GetFiles(directory, "*.*",
        SearchOption.AllDirectories))
    {
        try
        {
            Console.WriteLine($"Now processing {Path.GetFileName(file)}");
            switch (Path.GetExtension(file).ToLowerInvariant())
            {
                case ".pdf":
                    // PDFs are searched directly; 'using' releases the document afterwards
                    using (PDFDoc doc = new PDFDoc(file))
                    {
                        SearchFile(doc, pattern, Path.GetFileName(file));
                    }
                    break;
                case ".docx":
                case ".doc":
                    // Office documents are converted to PDF first.
                    // Fully qualified to avoid a clash with System.Convert.
                    using (PDFDoc officeDoc = new PDFDoc())
                    {
                        pdftron.PDF.Convert.OfficeToPDF(officeDoc, file, null);
                        SearchFile(officeDoc, pattern, Path.GetFileName(file));
                    }
                    break;
                case ".jpg":
                case ".jpeg":
                case ".png":
                case ".bmp":
                case ".gif":
                case ".tif":
                case ".tiff":
                    // Images are run through OCR so their text becomes searchable
                    using (PDFDoc imageDoc = new PDFDoc())
                    {
                        OCRModule.ImageToPDF(imageDoc, file, null);
                        SearchFile(imageDoc, pattern, Path.GetFileName(file));
                    }
                    break;
                default:
                    Console.WriteLine($"File type not supported for {Path.GetFileName(file)}");
                    break;
            }
            Console.WriteLine();
        }
        catch (Exception e)
        {
            Console.WriteLine("Error processing file; check the debug output for more information\n");
            Debug.WriteLine(e.ToString());
        }
    }
}
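The switch in RunSearch decides how each file is handled based on its extension. As an alternative, the same mapping can live in a lookup table, and the folder walk can skip subfolders the process is not allowed to read. Directory.GetFiles with SearchOption.AllDirectories throws UnauthorizedAccessException in that case, and because that call sits outside the try/catch, it would end the whole run. The following is a minimal sketch under those assumptions; FileKind, FileClassifier, Classify, and EnumerateCandidates are hypothetical names, not Apryse APIs, and EnumerationOptions requires .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical categories mirroring the cases in RunSearch's switch.
enum FileKind { Pdf, Office, Image, Unsupported }

static class FileClassifier
{
    // Case-insensitive keys, so ".PDF" and ".pdf" map to the same kind
    // (the same effect as ToLowerInvariant() in RunSearch).
    static readonly Dictionary<string, FileKind> Kinds =
        new Dictionary<string, FileKind>(StringComparer.OrdinalIgnoreCase)
        {
            [".pdf"] = FileKind.Pdf,
            [".doc"] = FileKind.Office,
            [".docx"] = FileKind.Office,
            [".jpg"] = FileKind.Image,
            [".jpeg"] = FileKind.Image,
            [".png"] = FileKind.Image,
            [".bmp"] = FileKind.Image,
            [".gif"] = FileKind.Image,
            [".tif"] = FileKind.Image,
            [".tiff"] = FileKind.Image,
        };

    // Files with an unknown or missing extension fall through to Unsupported.
    public static FileKind Classify(string path) =>
        Kinds.TryGetValue(Path.GetExtension(path), out var kind) ? kind : FileKind.Unsupported;

    // Streams files from the folder and its subfolders, skipping
    // subfolders that cannot be read instead of throwing.
    public static IEnumerable<string> EnumerateCandidates(string dir) =>
        Directory.EnumerateFiles(dir, "*", new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true,
        });

    static void Main(string[] args)
    {
        // Classify every file under the given folder (or the current one)
        string dir = args.Length > 0 ? args[0] : ".";
        foreach (var file in EnumerateCandidates(dir))
            Console.WriteLine($"{Path.GetFileName(file)}: {Classify(file)}");
    }
}
```

With a table, supporting another type (for example .xlsx, if your Office conversion setup handles it) is a one-line change; the switch in RunSearch works just as well when the list is short.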
The last function we need is SearchFile, which searches the document passed to it for the specified term. It uses the TextSearch object created in the RunSearch function.
static void SearchFile(PDFDoc doc, string pattern, string fileName)
{
    try
    {
        int page = 0;
        string result_str = "", ambient_str = "";
        // Match whole words only (e.g., "cat" does not match "category")
        Int32 mode = (Int32)(TextSearch.SearchMode.e_whole_word);
        Highlights hlts = new Highlights();
        // -1, -1: search from the first page through the last page
        TextSearchObject.Begin(doc, pattern, mode, -1, -1);
        while (true)
        {
            TextSearch.ResultCode code = TextSearchObject.Run(ref page,
                ref result_str, ref ambient_str, hlts);
            if (code == TextSearch.ResultCode.e_found)
            {
                Console.WriteLine($"Found {result_str} on page {page} in file {fileName}");
            }
            else
            {
                // e_done (or any other result code) ends the search
                break;
            }
        }
    }
    catch (Exception e) { Console.WriteLine(e.Message); }
}
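SearchFile sets TextSearch.SearchMode.e_whole_word, so a term only counts when it appears as a complete word. To illustrate which hits that keeps or rejects, the following sketch compares whole-word and plain substring counts on an ordinary string. It is an illustration only: it uses a .NET regular expression with \b word boundaries as a stand-in and is not how PDFNet's TextSearch works internally. WholeWordDemo, CountWholeWord, and CountSubstring are hypothetical names, and the sketch assumes case-insensitive matching.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: approximates whole-word matching with regex word
// boundaries (\b). This is NOT PDFNet's TextSearch implementation.
static class WholeWordDemo
{
    // Counts occurrences of 'word' that stand alone as a whole word.
    public static int CountWholeWord(string text, string word)
    {
        if (string.IsNullOrEmpty(word)) return 0;
        // Regex.Escape makes characters such as '.' or '+' match literally.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    // Counts every non-overlapping occurrence, including inside longer words.
    public static int CountSubstring(string text, string word)
    {
        if (string.IsNullOrEmpty(word)) return 0;
        int count = 0;
        for (int i = text.IndexOf(word, StringComparison.OrdinalIgnoreCase); i >= 0;
             i = text.IndexOf(word, i + word.Length, StringComparison.OrdinalIgnoreCase))
            count++;
        return count;
    }

    static void Main()
    {
        const string text = "The invoice total is listed on every invoices page.";
        Console.WriteLine(CountWholeWord(text, "invoice")); // prints 1
        Console.WriteLine(CountSubstring(text, "invoice")); // prints 2
    }
}
```

"invoices" is rejected by the whole-word count because the letter "s" follows the term. Note that \b only behaves this way when the term starts and ends with a letter, digit, or underscore; a term such as "C++" needs a different boundary rule.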
And there you have it. We’ve just created an app to search for text within a directory full of images or documents. As you can see, the process is pretty straightforward.
If you have any questions, feel free to contact sales or reach out to us on Discord!