Chapter 5: OcrImplementation
April 25, 2025
In the previous chapter, we explored the OcrPatternMatcher class, which helps you extract specific information from OCR results. Now, let's dive into the heart of our OCR library: the platform-specific implementations that make everything work across different devices.
Introduction to OcrImplementation
Imagine you've built an amazing app that can scan receipts to track expenses. You want your app to work perfectly whether your users have an iPhone, an Android phone, or a Windows device. But here's the challenge: each platform has its own way of doing OCR:
- iOS uses Apple's Vision framework
- Android uses Google's ML Kit
- Windows uses Windows.Media.OCR
This is where OcrImplementation comes in. It's like having a team of specialized translators who all speak different languages but can all translate into the same target language. Each implementation understands its native platform's OCR capabilities but presents them through the common IOcrService interface we learned about in Chapter 1.
Why Do We Need OcrImplementation?
Let's consider a practical example: you're building a business card scanner app that needs to work on both iOS and Android. Without OcrImplementation, you'd need to:
- Learn how to use Apple's Vision framework for iOS
- Learn how to use Google's ML Kit for Android
- Write completely different code for each platform
- Maintain two separate codebases
With OcrImplementation, you:
- Write your app code once using the common IOcrService interface
- Let the platform-specific implementations handle the details
- Get an app that works the same way across all platforms
Understanding OcrImplementation Through an Analogy
Think of OcrImplementation as a power adapter for international travel:
- Different countries have different electrical outlets (like different platforms have different OCR APIs)
- A power adapter lets you plug your device into any outlet (like OcrImplementation lets your app use any platform's OCR)
- You don't need to know the electrical standards of each country (like you don't need to know the details of each platform's OCR API)
- Your device works the same way regardless of which country you're in (like your app works the same way regardless of which platform it's running on)
How OcrImplementation Works
The OcrImplementation class is actually not a single class, but a set of platform-specific classes that all implement the same IOcrService interface. Each implementation is in a separate file with a platform-specific extension:
- OcrImplementation.android.cs for Android
- OcrImplementation.macios.cs for iOS/macOS
- OcrImplementation.windows.cs for Windows
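With this file-naming scheme, each file is typically compiled only for its own target platform, so the correct implementation is selected automatically when your app is built and run on that platform.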
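One common way to get this kind of per-platform selection is conditional compilation. The sketch below illustrates the general technique only and is not taken from the library's build setup: ITextReader, the reader classes and TextReaderFactory are hypothetical names, while ANDROID, IOS, MACCATALYST and WINDOWS are the symbols .NET defines for the corresponding target frameworks.

```csharp
using System;

// On a build without any platform symbol, the factory returns the stub.
ITextReader reader = TextReaderFactory.Create();
Console.WriteLine(reader.EngineName);

// Hypothetical stand-in for a shared interface such as IOcrService.
public interface ITextReader
{
    string EngineName { get; }
}

// Only one of these classes is compiled into any given build.
#if ANDROID
public sealed class PlatformTextReader : ITextReader
{
    public string EngineName => "ML Kit";
}
#elif IOS || MACCATALYST
public sealed class PlatformTextReader : ITextReader
{
    public string EngineName => "Vision";
}
#elif WINDOWS
public sealed class PlatformTextReader : ITextReader
{
    public string EngineName => "Windows.Media.Ocr";
}
#else
public sealed class PlatformTextReader : ITextReader
{
    public string EngineName => "stub (no native engine)";
}
#endif

public static class TextReaderFactory
{
    // Callers always ask for the interface and never name a platform.
    public static ITextReader Create() => new PlatformTextReader();
}
```

Because every variant shares one class name, the calling code stays identical on all platforms, which is the same effect the per-platform OcrImplementation files aim for.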
Key Components of OcrImplementation
Let's look at the common elements across all implementations:
1. Platform-Specific OCR Engine
Each implementation uses the native OCR engine for its platform:
- Android: Google's ML Kit Text Recognition
- iOS: Apple's Vision framework with VNRecognizeTextRequest
- Windows: Windows.Media.OCR
2. Common Interface Methods
Despite using different OCR engines, all implementations provide the same methods defined in the IOcrService interface:
// Initialize the OCR service
Task InitAsync(CancellationToken ct = default);
// Recognize text in an image
Task<OcrResult> RecognizeTextAsync(byte[] imageData, bool tryHard = false, CancellationToken ct = default);
Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default);
// Start asynchronous recognition
Task StartRecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default);
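To make the shape of this contract concrete, here is a minimal sketch of a fake implementation that could be used in unit tests. It assumes trimmed-down, hypothetical versions of IOcrService, OcrOptions and OcrResult (StartRecognizeTextAsync is left out, and the real types have more members), and FakeOcrService is an invented name. It also shows one plausible way for the tryHard overload to delegate to the options overload.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var service = new FakeOcrService();
await service.InitAsync();
OcrResult result = await service.RecognizeTextAsync(new byte[] { 1, 2, 3 }, tryHard: true);
Console.WriteLine($"{result.Success}: {result.AllText}");   // prints "True: accurate pass"

// Hypothetical, trimmed-down stand-ins for the library's types.
public sealed record OcrOptions(string? Language = null, bool TryHard = false);

public sealed class OcrResult
{
    public bool Success { get; set; }
    public string AllText { get; set; } = string.Empty;
}

public interface IOcrService
{
    Task InitAsync(CancellationToken ct = default);
    Task<OcrResult> RecognizeTextAsync(byte[] imageData, bool tryHard = false, CancellationToken ct = default);
    Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default);
}

// A fake engine that returns a fixed string instead of running OCR.
public sealed class FakeOcrService : IOcrService
{
    public Task InitAsync(CancellationToken ct = default) => Task.CompletedTask;

    // The convenience overload only builds options and delegates.
    public Task<OcrResult> RecognizeTextAsync(byte[] imageData, bool tryHard = false, CancellationToken ct = default)
        => RecognizeTextAsync(imageData, new OcrOptions(TryHard: tryHard), ct);

    public Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        var result = new OcrResult
        {
            Success = imageData.Length > 0,
            AllText = options.TryHard ? "accurate pass" : "fast pass",
        };
        return Task.FromResult(result);
    }
}
```

Code written against the interface cannot tell this fake apart from a real platform implementation, which is what makes the shared contract useful for testing.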
3. Result Processing
Each implementation converts the platform-specific OCR results into the common OcrResult format we learned about in Chapter 2.
Using OcrImplementation (Indirectly)
As a user of the OCR library, you typically won't interact with OcrImplementation directly. Instead, you'll use it through the OcrPlugin or the UseOcr Extension Method, which we'll learn about in the next chapters.
However, it's helpful to understand what happens behind the scenes when you use the OCR service:
// This is what you write in your app
IOcrService ocrService = /* get the service */;
OcrResult result = await ocrService.RecognizeTextAsync(imageBytes);
// Behind the scenes, the call is dispatched to the platform-specific implementation.
// On Android, for example, that is the OcrImplementation class
// defined in OcrImplementation.android.cs.
A Peek Inside: Android Implementation
Let's take a look at a simplified version of the Android implementation to understand how it works:
// From OcrImplementation.android.cs
internal class OcrImplementation : IOcrService
{
    // Platform-specific OCR engine
    private static ITextRecognizer? s_textRecognizer;

    public async Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default)
    {
        // Convert byte array to Android bitmap
        using var srcBitmap = await BitmapFactory.DecodeByteArrayAsync(imageData, 0, imageData.Length);

        // Create ML Kit input image (second argument: rotation in degrees)
        using var srcImage = InputImage.FromBitmap(srcBitmap, 0);

        // Get or create text recognizer based on options
        ITextRecognizer textScanner;
        if (options.TryHard)
        {
            // TryHard path: recognizer built with custom options
            // (a custom executor in this snippet)
            textScanner = TextRecognition.GetClient(new TextRecognizerOptions.Builder()
                .SetExecutor(/* executor */)
                .Build());
        }
        else
        {
            // Default path: recognizer with ML Kit's default options
            textScanner = TextRecognition.GetClient(TextRecognizerOptions.DefaultOptions);
        }

        // Process the image
        var result = await textScanner.Process(srcImage).AsAsync<Text>();

        // Convert ML Kit result to OcrResult
        return ProcessOcrResult(result, options);
    }

    // Other methods...
}
This code:
- Converts the input image bytes to an Android bitmap
- Creates an ML Kit input image from the bitmap
- Chooses the appropriate text recognizer based on the options
- Processes the image with ML Kit
- Converts the ML Kit result to our common OcrResult format
A Peek Inside: iOS Implementation
Now let's look at a simplified version of the iOS implementation:
// From OcrImplementation.macios.cs
class OcrImplementation : IOcrService
{
    public async Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default)
    {
        // Completed by the Vision completion handler below
        var tcs = new TaskCompletionSource<OcrResult>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Convert byte array to UIImage
        using var srcImage = ImageFromByteArray(imageData);
        var imageSize = srcImage.Size;

        // Create Vision request
        using var recognizeTextRequest = new VNRecognizeTextRequest((request, error) =>
        {
            // Handle completion: convert the observations to an OcrResult
            // and hand it to tcs (details elided)...
        });

        // Configure request based on options
        recognizeTextRequest.RecognitionLevel = options.TryHard
            ? VNRequestTextRecognitionLevel.Accurate
            : VNRequestTextRecognitionLevel.Fast;

        // Set language if specified
        if (!string.IsNullOrEmpty(options.Language))
        {
            recognizeTextRequest.RecognitionLanguages = new[] { options.Language };
        }

        // Process the image
        using var ocrHandler = new VNImageRequestHandler(srcImage.CGImage, new NSDictionary());
        ocrHandler.Perform(new VNRequest[] { recognizeTextRequest }, out var error);

        // The result is produced in the request completion handler
        // and returned via the TaskCompletionSource
        return await tcs.Task;
    }

    // Other methods...
}
This code:
- Converts the input image bytes to a UIImage
- Creates a Vision text recognition request
- Configures the request based on the options
- Processes the image with Vision
- Converts the Vision result to our common OcrResult format
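The iOS snippet returns its result through a TaskCompletionSource because the recognition outcome arrives in a completion handler instead of a return value. The sketch below shows that bridging pattern in isolation. CallbackEngine and CallbackBridge are hypothetical names, and the engine simply reports the image length instead of doing any recognition.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

string text = await CallbackBridge.RecognizeAsync(new byte[] { 1, 2, 3 });
Console.WriteLine(text);   // prints "recognized 3 bytes"

// Hypothetical callback-style engine standing in for a native request object.
public sealed class CallbackEngine
{
    public void Recognize(byte[] image, Action<string?, Exception?> onCompleted)
    {
        // Complete on a thread-pool thread, as a native handler might.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (image.Length == 0)
                onCompleted(null, new ArgumentException("Empty image."));
            else
                onCompleted($"recognized {image.Length} bytes", null);
        });
    }
}

public static class CallbackBridge
{
    public static Task<string> RecognizeAsync(byte[] image, CancellationToken ct = default)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Cancel the task if the caller gives up before the callback fires.
        CancellationTokenRegistration registration = ct.Register(() => tcs.TrySetCanceled(ct));

        new CallbackEngine().Recognize(image, (text, error) =>
        {
            registration.Dispose();
            if (error is not null)
                tcs.TrySetException(error);
            else
                tcs.TrySetResult(text ?? string.Empty);
        });

        return tcs.Task;
    }
}
```

The Try* methods are used so that a late callback after cancellation is ignored instead of throwing.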
A Peek Inside: Windows Implementation
Finally, let's look at a simplified version of the Windows implementation:
// From OcrImplementation.windows.cs
class OcrImplementation : IOcrService
{
    public async Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default)
    {
        // Create OCR engine (null if no user profile language supports OCR)
        var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (ocrEngine is null)
        {
            return new OcrResult { Success = false };
        }

        // Convert byte array to stream
        using var stream = new InMemoryRandomAccessStream();
        await stream.WriteAsync(imageData.AsBuffer());
        stream.Seek(0);

        // Decode image
        var decoder = await BitmapDecoder.CreateAsync(stream);
        var softwareBitmap = await decoder.GetSoftwareBitmapAsync();

        // Process the image
        var ocrResult = await ocrEngine.RecognizeAsync(softwareBitmap);

        // Convert Windows OCR result to OcrResult
        return ProcessOcrResult(ocrResult, options);
    }

    // Other methods...
}
This code:
- Creates a Windows OCR engine
- Converts the input image bytes to a stream and then to a bitmap
- Processes the image with Windows OCR
- Converts the Windows OCR result to our common OcrResult format
How OcrImplementation Works Behind the Scenes
When you call a method on the OCR service, here's what happens:
sequenceDiagram
participant App as Your App
participant Plugin as OcrPlugin
participant Impl as OcrImplementation
participant Native as Native OCR Engine
participant Result as OcrResult
App->>Plugin: RecognizeTextAsync(imageBytes)
Plugin->>Impl: RecognizeTextAsync(imageBytes)
Impl->>Native: Process image using platform API
Native-->>Impl: Return platform-specific result
Impl->>Result: Convert to common OcrResult format
Result-->>Impl: Return OcrResult
Impl-->>Plugin: Return OcrResult
Plugin-->>App: Return OcrResult
- Your app calls a method on the OCR plugin
- The plugin forwards the call to the platform-specific implementation
- The implementation uses the native OCR engine to process the image
- The native engine returns a platform-specific result
- The implementation converts the result to the common OcrResult format
- The result is returned to your app
Processing OCR Results
One of the most important parts of each implementation is the ProcessOcrResult method, which converts the platform-specific OCR results into our common OcrResult format:
// Simplified example from Android implementation
private static OcrResult ProcessOcrResult(Text textResult, OcrOptions options)
{
    var ocrResult = new OcrResult();

    // Set the full text
    ocrResult.AllText = textResult.GetText();

    // Extract lines and elements
    foreach (var block in textResult.TextBlocks)
    {
        foreach (var line in block.Lines)
        {
            ocrResult.Lines.Add(line.Text);

            foreach (var element in line.Elements)
            {
                var ocrElement = new OcrElement
                {
                    Text = element.Text,
                    Confidence = element.Confidence,
                    X = element.BoundingBox.Left,
                    Y = element.BoundingBox.Top,
                    Width = element.BoundingBox.Width(),
                    Height = element.BoundingBox.Height()
                };
                ocrResult.Elements.Add(ocrElement);
            }
        }
    }

    // Apply pattern matching if requested
    foreach (var config in options.PatternConfigs)
    {
        var match = OcrPatternMatcher.ExtractPattern(ocrResult.AllText, config);
        if (!string.IsNullOrEmpty(match))
        {
            ocrResult.MatchedValues.Add(match);
        }
    }

    ocrResult.Success = true;
    return ocrResult;
}
This code:
- Creates a new OcrResult
- Sets the full text from the OCR result
- Extracts lines and elements (words) with their positions
- Applies pattern matching if requested
- Sets the success flag and returns the result
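The element positions collected above are what make layout-aware queries possible. As a rough illustration, the sketch below finds the word printed to the right of a label on the same row. Element is a hypothetical record that mirrors only the OcrElement fields shown in this chapter, and LayoutQueries is an invented helper, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var elements = new List<Element>
{
    new Element("Coffee", 10, 10, 50, 12),
    new Element("$3.50", 120, 11, 40, 12),
    new Element("Total", 10, 40, 40, 12),
    new Element("$5.75", 120, 42, 40, 12),
};

Element? total = LayoutQueries.RightNeighbor(elements, "Total");
Console.WriteLine(total?.Text);   // prints "$5.75"

// Hypothetical record mirroring the position fields of OcrElement.
public sealed record Element(string Text, int X, int Y, int Width, int Height)
{
    public double CenterY => Y + Height / 2.0;
}

public static class LayoutQueries
{
    // Returns the nearest element to the right of the labelled word whose
    // vertical center lies within one label height of the label's center.
    public static Element? RightNeighbor(IReadOnlyList<Element> elements, string label)
    {
        Element? anchor = elements.FirstOrDefault(
            e => string.Equals(e.Text, label, StringComparison.OrdinalIgnoreCase));
        if (anchor is null)
            return null;

        return elements
            .Where(e => e.X > anchor.X + anchor.Width)
            .Where(e => Math.Abs(e.CenterY - anchor.CenterY) <= anchor.Height)
            .OrderBy(e => e.X)
            .FirstOrDefault();
    }
}
```

The row tolerance of one label height is an arbitrary choice for this sketch; skewed or curved receipts would need a more forgiving rule.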
Platform-Specific Features and Limitations
Each platform has its own unique features and limitations:
Android (ML Kit)
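- Text recognition runs on-device; ML Kit itself does not offer a cloud-based text recognizer (cloud recognition belongs to separate Google services)
- The recognition model is either bundled with the app or downloaded on demand through Google Play services
- The Latin-script recognizer is the default; other scripts (Chinese, Devanagari, Japanese, Korean) require their own recognizer libraries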
iOS (Vision)
- Excellent accuracy for Latin-based scripts
- Fast on-device recognition
- Language support varies by iOS version
Windows (Windows.Media.OCR)
- Good integration with Windows apps
- Language support depends on installed language packs
- Limited configuration options compared to other platforms
Real-World Example: Receipt Scanner
Let's put everything together in a real-world example of a receipt scanner that works across platforms:
// Get the OCR service (we'll learn how in the next chapter)
IOcrService ocrService = /* get the service */;

// Initialize the service
await ocrService.InitAsync();

// Create options for receipt scanning
var options = new OcrOptions(
    language: "en",
    tryHard: true,
    patternConfig: new OcrPatternConfig(
        regexPattern: @"\$\d+\.\d{2}", // Pattern for prices
        validationFunction: text => decimal.TryParse(text.TrimStart('$'), out _)
    )
);

// Load a receipt image (e.g., from camera or file)
byte[] imageBytes = /* get image bytes */;

// Recognize text in the receipt
OcrResult result = await ocrService.RecognizeTextAsync(imageBytes, options);

// Check if OCR was successful
if (result.Success)
{
    Console.WriteLine("Receipt text: " + result.AllText);

    // Display matched prices
    if (result.MatchedValues.Count > 0)
    {
        Console.WriteLine("Found prices:");
        foreach (string price in result.MatchedValues)
        {
            Console.WriteLine($"  {price}");
        }
    }
}
This code:
- Gets the OCR service (we'll learn how in the next chapter)
- Initializes the service
- Creates options for receipt scanning with a pattern for prices
- Loads a receipt image
- Recognizes text in the receipt
- Displays the full text and any matched prices
The beauty of this code is that it works exactly the same way on Android, iOS, and Windows, even though the underlying OCR engines are completely different.
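To see what the price pattern from the receipt example accepts and rejects, it can be tried on plain text without any OCR engine. The sketch below is a stand-alone approximation: PriceExtractor is a hypothetical helper, it collects every match (the library's OcrPatternMatcher may behave differently), and it validates with the invariant culture so that "." is always read as the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

string receipt = "Coffee $3.50\nBagel $2.25\nTotal $5.75\nRef $12.3x";
foreach (string price in PriceExtractor.Extract(receipt))
{
    Console.WriteLine(price);   // prints $3.50, $2.25, $5.75 on separate lines
}

// Hypothetical helper combining the regex and the validation step.
public static class PriceExtractor
{
    // Same pattern as in the receipt example: "$", digits, ".", two digits.
    private static readonly Regex PricePattern = new Regex(@"\$\d+\.\d{2}", RegexOptions.Compiled);

    public static List<string> Extract(string text)
    {
        var prices = new List<string>();
        foreach (Match match in PricePattern.Matches(text))
        {
            // Keep only matches that parse as a decimal amount.
            bool valid = decimal.TryParse(
                match.Value.TrimStart('$'),
                NumberStyles.Number,
                CultureInfo.InvariantCulture,
                out _);
            if (valid)
            {
                prices.Add(match.Value);
            }
        }
        return prices;
    }
}
```

The last receipt line is skipped because "$12.3x" has only one digit after the decimal point, so the pattern never matches it.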
Conclusion
In this chapter, we've explored the OcrImplementation classes, which are the platform-specific implementations of the OCR service. We've seen how they use the native OCR capabilities of each platform while presenting a unified interface to your app.
The OcrImplementation classes are the bridge between your app and the platform's native OCR capabilities. They handle all the platform-specific details so you can write your app code once and have it work across all platforms.
In the next chapter, we'll explore the OcrPlugin class, which provides a convenient way to access the OCR service in your app.
Key Takeaways
- OcrImplementation provides platform-specific implementations of the IOcrService interface
- Each implementation uses the native OCR capabilities of its platform
- All implementations present the same interface to your app
- The correct implementation is automatically selected based on the platform
- You typically don't interact with OcrImplementation directly, but through the OcrPlugin or UseOcr extension method
- The implementations handle all the platform-specific details so you can write your app code once
Generated by AI Codebase Knowledge Builder