How to Convert PDF to Text in C# and Java

By Apryse | 2025 Apr 03

C# - Get Text From PDF

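The following C# console app extracts the text from every PDF in a directory and prints it to the console. If a document contains no extractable text, for example a scanned PDF, the app runs OCR on it and then extracts the text again.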

using System;
using System.IO;
using System.Text;
using pdftron;
using pdftron.PDF;

namespace ConsoleApp1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            PDFNet.Initialize("your_license_key");
            // Location of the OCR module
            PDFNet.AddResourceSearchPath(@"path_to_your_ocr_module");

            // Directory containing the PDF files
            string directoryPath = "samples";

            // Get all PDF files in the directory
            string[] pdfFiles = Directory.GetFiles(directoryPath, "*.pdf");

            foreach (string pdfFile in pdfFiles)
            {
                Console.WriteLine($"Processing file: {pdfFile}");

                using (PDFDoc doc = new PDFDoc(pdfFile))
                {
                    doc.InitSecurityHandler();

                    // Extract text from the document
                    string allText = ExtractTextFromDocument(doc);

                    // If no text is found, perform OCR and extract again
                    if (string.IsNullOrWhiteSpace(allText))
                    {
                        Console.WriteLine("No text found, performing OCR...");
                        PerformOCR(doc);
                        allText = ExtractTextFromDocument(doc);
                    }

                    Console.WriteLine(allText);
                }
            }
        }

        // Concatenates the text of every page, one page per line break
        static string ExtractTextFromDocument(PDFDoc doc)
        {
            TextExtractor txt = new TextExtractor();
            StringBuilder allText = new StringBuilder();
            int pageCount = doc.GetPageCount();

            // Page numbers are 1-based
            for (int i = 1; i <= pageCount; i++)
            {
                txt.Begin(doc.GetPage(i));
                allText.Append(txt.GetAsText()).Append("\n");
            }

            return allText.ToString();
        }

        // Runs English-language OCR on the document in place
        static void PerformOCR(PDFDoc doc)
        {
            OCROptions ocrOptions = new OCROptions();
            ocrOptions.AddLang("eng");
            OCRModule.ProcessPDF(doc, ocrOptions);
        }
    }
}

Java - Get Text From PDF

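The Apryse engine can store extracted text in any of more than 150 supported file formats. The Java example below mirrors the C# app: it extracts the text from each PDF in a directory, falls back to OCR when no text is found, and prints the result to the console.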

import com.pdftron.common.PDFNetException;
import com.pdftron.pdf.*;
import java.io.File;

public class App {
  public static void main(String[] args) {
    try {
      PDFNet.initialize("your_license_key");
      // Location of the OCR module
      PDFNet.addResourceSearchPath("path_to_your_ocr_module");

      // Directory containing the PDF files (forward slashes also work on Windows)
      String directoryPath = "src/samples";

      // Get all PDF files in the directory
      File dir = new File(directoryPath);
      File[] pdfFiles = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));

      // listFiles returns null if the directory does not exist or cannot be read
      if (pdfFiles != null) {
        for (File pdfFile : pdfFiles) {
          System.out.println("Processing file: " + pdfFile.getAbsolutePath());

          // try-with-resources closes the document even if extraction or OCR throws
          try (PDFDoc doc = new PDFDoc(pdfFile.getAbsolutePath())) {
            doc.initSecurityHandler();

            // Extract text from the document
            String allText = extractTextFromDocument(doc);

            // If no text is found, perform OCR and extract again
            if (allText.trim().isEmpty()) {
              System.out.println("No text found, performing OCR...");
              performOCR(doc);
              allText = extractTextFromDocument(doc);
            }

            System.out.println(allText);
          }
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  // Concatenates the text of every page, one page per line break
  static String extractTextFromDocument(PDFDoc doc) throws PDFNetException {
    TextExtractor txt = new TextExtractor();
    StringBuilder allText = new StringBuilder();

    // Page numbers are 1-based
    for (int i = 1; i <= doc.getPageCount(); i++) {
      txt.begin(doc.getPage(i));
      allText.append(txt.getAsText()).append("\n");
    }

    return allText.toString();
  }

  // Runs English-language OCR on the document in place
  static void performOCR(PDFDoc doc) throws PDFNetException {
    OCROptions ocrOptions = new OCROptions();
    ocrOptions.addLang("eng");
    OCRModule.processPDF(doc, ocrOptions);
  }
}
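
The control flow in main (extract, check for blank text, run OCR, extract again) does not depend on the SDK, so it can be factored out and exercised without a license key. The following is a minimal sketch under that assumption. OcrFallback, extractWithFallback, and the two callbacks are hypothetical names introduced here as stand-ins for extractTextFromDocument and performOCR; they are not part of the Apryse API.

```java
import java.util.function.Supplier;

public class OcrFallback {

  // Runs the extractor once. If the result is blank, runs the OCR step
  // and extracts a second time. Both callbacks are hypothetical stand-ins
  // for the SDK-backed methods in the example above.
  public static String extractWithFallback(Supplier<String> extractor, Runnable ocrStep) {
    String text = extractor.get();
    if (text == null || text.trim().isEmpty()) {
      ocrStep.run();
      text = extractor.get();
    }
    return text == null ? "" : text;
  }

  public static void main(String[] args) {
    // Simulated scanned document: the text layer is empty until "OCR" has run.
    StringBuilder textLayer = new StringBuilder();
    String result = extractWithFallback(
        textLayer::toString,
        () -> textLayer.append("recognized text"));
    System.out.println(result);
  }
}
```

In the real app, the extractor would wrap extractTextFromDocument(doc) and the OCR step would wrap performOCR(doc). Because both throw PDFNetException, the lambdas would need to catch or rethrow it.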

The Apryse documentation has a step-by-step guide to converting files with the document converter in Java and C#.
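
Both examples print the extracted text to the console. To end up with an actual text file per PDF, the string can be written next to the source document. The sketch below uses only the Java standard library. TextFileWriter, textPathFor, and saveText are hypothetical helper names, not part of the Apryse SDK, and UTF-8 output is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class TextFileWriter {

  // Maps "report.pdf" to "report.txt" in the same directory.
  // The extension match is case-insensitive; any other name gets ".txt" appended.
  public static Path textPathFor(Path pdfPath) {
    String name = pdfPath.getFileName().toString();
    String base = name.toLowerCase(Locale.ROOT).endsWith(".pdf")
        ? name.substring(0, name.length() - 4)
        : name;
    return pdfPath.resolveSibling(base + ".txt");
  }

  // Writes the extracted text as UTF-8, replacing any existing file.
  public static Path saveText(Path pdfPath, String text) throws IOException {
    Path target = textPathFor(pdfPath);
    Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    return target;
  }

  public static void main(String[] args) throws IOException {
    // Demo in a temporary directory; the PDF itself does not need to exist.
    Path dir = Files.createTempDirectory("pdf-text");
    Path saved = saveText(dir.resolve("sample.PDF"), "Hello from page 1\n");
    System.out.println(saved.getFileName());
  }
}
```

In the Java app above, a call such as TextFileWriter.saveText(pdfFile.toPath(), allText) could replace or accompany System.out.println(allText).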

Get Your Trial

Get your trial of the Apryse SDK for free. It's fully functional and even comes with unlimited chat and email support.

Next Steps

Stay tuned for more conversion examples that show how the Apryse OCR engine fits into workflows that convert PDF files into other document formats or images, and back again. Need help in the meantime? Contact our sales team or reach out to us on Discord.
