How to Build an Android Document Scanner App with OCR

By Branden Fung | 2020 Oct 23

3 min

Client Setup for Android Document Scanner with Apryse SDK

1. Create a new Android project using Android Studio.
2. Add Google's ML Kit Text Recognition Android libraries as described in the ML Kit guide.
3. Download the following AAR file and add the AAR as a new module dependency in your project.
4. Integrate the Apryse library via Gradle, as described here.

5. Next, as mentioned previously, the Android app will use our fork of a third-party Android document scanner library, found here. We'll use this library to capture, crop, and filter images using the built-in camera.

You can launch the scanner and handle the returned image by adding the following to your MainActivity. (Note: The processOCR method will be implemented later in the guide.)

    // Imports used below (place at the top of MainActivity.kt)
    import android.provider.MediaStore
    import com.google.mlkit.vision.common.InputImage

    // Add callback to handle returned image from scanner
    val scannerLauncher = registerForActivityResult(ScannerContract()) { uri ->
        if (uri != null) {
            // Obtain the bitmap, then delete the scanner's temporary image
            val bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)
            contentResolver.delete(uri, null, null)

            // Save bitmap to local cache as a JPEG file
            val localJpeg = Utils.saveBitmapAsJpeg(bitmap)

            // Build the ML Kit input from the bitmap
            // (assumes the scanner returns an upright image, so rotation is 0)
            val image = InputImage.fromBitmap(bitmap, 0)
            val imgWidth = bitmap.width.toDouble()
            val imgHeight = bitmap.height.toDouble()

            // Process image using ML Kit
            processOCR(imgWidth, imgHeight, image, localJpeg)
        }
    }

    ...

    // Launch the scanner activity
    scannerLauncher.launch(ScanConstants.OPEN_CAMERA)

6. Now let's add the code that handles the OCR portion, which creates searchable and selectable text from static images. There are two steps: process the image using ML Kit, and then create a PDF using the scanned image and processed text.

In your MainActivity, add the following methods:

    // Imports used below (place at the top of MainActivity.kt)
    import android.net.Uri
    import android.util.Log
    import com.google.mlkit.vision.common.InputImage
    import com.google.mlkit.vision.text.TextRecognition
    import com.pdftron.pdf.Convert
    import com.pdftron.pdf.OCRModule
    import com.pdftron.pdf.PDFDoc
    import com.pdftron.pdf.config.ViewerConfig
    import com.pdftron.pdf.controls.DocumentActivity
    import com.pdftron.sdf.SDFDoc
    import org.json.JSONArray
    import org.json.JSONObject
    import java.io.File

    private fun processOCR(
        imgWidth: Double,
        imgHeight: Double,
        image: InputImage,
        localJpeg: File
    ) {
        // Newer ML Kit releases require getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
        TextRecognition.getClient().process(image)
            .addOnSuccessListener { visionText ->

                // Create the PDF containing the recognized text
                val outputPath = createPDF(imgWidth, imgHeight, localJpeg, visionText)

                // Open the document in the viewer
                val config =
                    ViewerConfig.Builder().openUrlCachePath(cacheDir.absolutePath).build()
                DocumentActivity.openDocument(
                    this@MainActivity,
                    Uri.fromFile(outputPath),
                    config
                )
            }
            .addOnFailureListener { e ->
                // Recognition failed; log it so the failure is not silent
                Log.e("MainActivity", "Text recognition failed", e)
            }
    }

    private fun createPDF(
        imgWidth: Double,
        imgHeight: Double,
        localJpeg: File,
        visionText: com.google.mlkit.vision.text.Text
    ): File {

        val doc = PDFDoc()
        val outputFile = File(
            this.filesDir, com.pdftron.pdf.utils.Utils.getFileNameNotInUse(
                "scanned_doc_output.pdf"
            )
        )

        // First convert the image to a PDF Doc
        Convert.toPdf(doc, localJpeg.absolutePath)

        val page = doc.getPage(1) // currently this sample only supports 1 page
        // Scale factor from image pixels to PDF page units
        val ratio = page.pageWidth / imgWidth

        // We will need to generate a JSON containing the text data, which will be used
        // to insert the text information into the PDF document
        val jsonWords = JSONArray()
        for (block in visionText.textBlocks) {
            for (line in block.lines) {
                for (element in line.elements) {
                    val elementText = element.text
                    // Skip elements for which ML Kit reports no bounding box
                    val elementFrame = element.boundingBox ?: continue

                    // androidRectToPdfRect is a helper that is not shown in this post;
                    // it maps the image-space rectangle to PDF page coordinates
                    val pdfRect =
                        androidRectToPdfRect(elementFrame, ratio, imgHeight)
                    pdfRect.normalize()

                    val word = JSONObject()
                    word.put("font-size", (pdfRect.y2 - pdfRect.y1).toInt())
                    word.put("length", (pdfRect.x2 - pdfRect.x1).toInt())
                    word.put("text", elementText)
                    word.put("orientation", "U")
                    word.put("x", pdfRect.x1.toInt())
                    word.put("y", pdfRect.y1.toInt())
                    jsonWords.put(word)
                }
            }
        }

        val jsonObj = JSONObject()
        val jsonPages = JSONArray()

        val jsonPage = JSONObject()
        jsonPage.put("Word", jsonWords)
        jsonPage.put("num", 1) // Only supports one page
        jsonPage.put("dpi", 96)
        jsonPage.put("origin", "BottomLeft")

        jsonPages.put(jsonPage)
        jsonObj.put("Page", jsonPages)

        // Embed the recognized text into the PDF, then save it
        OCRModule.applyOCRJsonToPDF(doc, jsonObj.toString())
        doc.save(outputFile.absolutePath, SDFDoc.SaveMode.LINEARIZED, null)
        return outputFile
    }

Now you can capture a physical document, run it through ML Kit for text recognition, and open the resulting searchable, selectable PDF document in the Apryse viewer.
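The createPDF method calls androidRectToPdfRect, which is not defined in this post. Below is a minimal sketch of what such a conversion might look like, not the implementation from the sample project. It assumes the image fills the whole page, so the same ratio applies to both axes, and it uses two hypothetical stand-in types, PixelRect and PdfBox, in place of android.graphics.Rect and the Apryse Rect so that it runs on a plain JVM.

```kotlin
// Hypothetical stand-ins for android.graphics.Rect and the Apryse Rect class,
// so this sketch runs without Android or Apryse on the classpath.
data class PixelRect(val left: Int, val top: Int, val right: Int, val bottom: Int)
data class PdfBox(val x1: Double, val y1: Double, val x2: Double, val y2: Double)

/**
 * Maps a rectangle from image pixels (origin top-left, y grows downward)
 * to PDF page units (origin bottom-left, y grows upward).
 *
 * ratio is page width / image width, as computed in createPDF.
 * Assumption: the image covers the full page, so one ratio fits both axes.
 */
fun pixelRectToPdfBox(rect: PixelRect, ratio: Double, imgHeight: Double): PdfBox =
    PdfBox(
        x1 = rect.left * ratio,
        y1 = (imgHeight - rect.bottom) * ratio, // bottom edge becomes the lower y
        x2 = rect.right * ratio,
        y2 = (imgHeight - rect.top) * ratio     // top edge becomes the upper y
    )

fun main() {
    // A 1000 x 2000 px scan placed on a 500-unit-wide page gives ratio = 0.5
    val ratio = 500.0 / 1000.0
    val word = PixelRect(left = 100, top = 200, right = 300, bottom = 260)
    val box = pixelRectToPdfBox(word, ratio, imgHeight = 2000.0)

    println(box) // PdfBox(x1=50.0, y1=870.0, x2=150.0, y2=900.0)

    // The same values createPDF would write into the word's JSON entry
    println("font-size=${(box.y2 - box.y1).toInt()} length=${(box.x2 - box.x1).toInt()}")
    // font-size=30 length=100
}
```

The key step is the y-axis flip: ML Kit reports bounding boxes measured from the top of the image, while the JSON page entry declares a BottomLeft origin, so each y value is subtracted from the image height before scaling.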

More Features, Next Steps

By following the steps above, you’ve created a professional Android scanner app for your invoices, bills, letters, and other paper statements. And by using the Apryse document viewer, you can then mark up those scanned documents by adding annotations, signatures, stamps, and much more! You can also drop in other Apryse SDK capabilities, such as redaction and page manipulation, to edit your scanned documents.

Download our free trial and explore our guides & documentation for our Android PDF library to see the possibilities for yourself.

And if you have any questions, please feel free to get in touch!

You can find the source code for this blog post on GitHub.
