Chapter 1: IOcrService Interface
April 25, 2025 ยท View on GitHub
Introduction
Have you ever needed to extract text from an image in your app? Maybe you want to scan a receipt, recognize text from a business card, or read a document without typing it all out manually. This is where Optical Character Recognition (OCR) comes in!
In this chapter, we'll explore the IOcrService interface, which is the foundation of our OCR library. Think of it as the blueprint that defines what our OCR service can do, regardless of which platform (Android, iOS, Windows) your app is running on.
What Problem Does IOcrService Solve?
Imagine you're building a receipt scanning app. On different platforms (iOS, Android, Windows), the underlying OCR technology works differently:
- iOS might use Apple's Vision framework
- Android might use Google's ML Kit
- Windows might use a different OCR engine
Without a common interface, you'd need to write different code for each platform. That's where IOcrService comes in - it provides a consistent way to perform OCR operations across all platforms.
Understanding IOcrService Through an Analogy
Think of IOcrService as a universal remote control for different TVs:
- Different TV brands (Samsung, LG, Sony) are like different platforms (iOS, Android, Windows)
- But the remote's buttons (power, volume, channel) work the same way for all TVs
- Similarly,
IOcrServicemethods work the same way across all platforms
Key Components of IOcrService
Let's break down the main parts of the IOcrService interface:
1. Core Methods
Task<OcrResult> RecognizeTextAsync(byte[] imageData, bool tryHard = false, CancellationToken ct = default);
Task<OcrResult> RecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default);
Task StartRecognizeTextAsync(byte[] imageData, OcrOptions options, CancellationToken ct = default);
These methods allow you to:
- Pass an image (as byte array)
- Get recognized text back (as an
OcrResult) - Configure how the recognition works (using
OcrOptions)
2. Event for Asynchronous Recognition
event EventHandler<OcrCompletedEventArgs> RecognitionCompleted;
This event is triggered when OCR recognition completes, allowing your app to respond accordingly.
3. Language Support
IReadOnlyCollection<string> SupportedLanguages { get; }
This property tells you which languages the OCR service can recognize.
4. Initialization
Task InitAsync(CancellationToken ct = default);
This method initializes the OCR service on the current platform.
Using IOcrService: A Simple Example
Let's see how you might use IOcrService to extract text from an image:
// 1. Get access to the OCR service
IOcrService ocrService = /* we'll learn how to get this in later chapters */;
// 2. Initialize the service
await ocrService.InitAsync();
// 3. Load an image (for example, from a file)
byte[] imageBytes = File.ReadAllBytes("receipt.jpg");
// 4. Recognize text in the image
OcrResult result = await ocrService.RecognizeTextAsync(imageBytes);
// 5. Use the recognized text
if (result.Success)
{
Console.WriteLine("Recognized text: " + result.AllText);
// Print each line separately
foreach (string line in result.Lines)
{
Console.WriteLine($"Line: {line}");
}
}
This code will:
- Initialize the OCR service
- Load an image from a file
- Extract text from the image
- Print the recognized text if successful
Advanced Usage with Options
For more control over the OCR process, you can use OcrOptions:
// Create options for OCR
var options = new OcrOptions(
language: "en", // English language
tryHard: true, // More accurate but slower
patternConfig: new OcrPatternConfig(
regexPattern: @"\d{3}-\d{2}-\d{4}", // Pattern for SSN
validationFunction: text => text.Length == 11
)
);
// Recognize text with options
OcrResult result = await ocrService.RecognizeTextAsync(imageBytes, options);
// Check if any patterns were matched
if (result.MatchedValues.Count > 0)
{
Console.WriteLine($"Found SSN: {result.MatchedValues[0]}");
}
This example shows how to:
- Create OCR options with a specific language and accuracy setting
- Define a pattern to look for (in this case, a Social Security Number format)
- Extract and validate that specific pattern from the image
How IOcrService Works Behind the Scenes
When you call methods on IOcrService, here's what happens:
sequenceDiagram
participant App as Your App
participant Interface as IOcrService
participant Platform as Platform-Specific Implementation
participant OCR as Native OCR Engine
App->>Interface: RecognizeTextAsync(imageBytes)
Interface->>Platform: Forward request
Platform->>OCR: Process image
OCR->>Platform: Return recognized text
Platform->>Interface: Format as OcrResult
Interface->>App: Return OcrResult
- Your app calls a method on the
IOcrServiceinterface - The request is forwarded to the platform-specific implementation
- The implementation uses the native OCR engine for that platform
- The recognized text is formatted into a common
OcrResultstructure - The result is returned to your app
This abstraction layer shields you from having to deal with platform-specific code.
The Contract: What IOcrService Guarantees
The IOcrService interface is a contract that guarantees:
- Consistency: The same methods are available on all platforms
- Predictability: Methods return the same type of results regardless of platform
- Simplicity: You don't need to write platform-specific code
Conclusion
In this chapter, we've learned about the IOcrService interface, which provides a consistent way to perform OCR operations across different platforms. We've seen how to use its methods to extract text from images and how to configure the OCR process with options.
The IOcrService interface is just the beginning of our OCR journey. In the next chapter, we'll explore the OcrResult class, which represents the output of the OCR process and provides structured access to the recognized text.
Key Takeaways
IOcrServiceis an interface that defines OCR operations across platforms- It allows you to write platform-independent code for text recognition
- Core methods include
RecognizeTextAsyncandStartRecognizeTextAsync - You can configure OCR with options like language and accuracy settings
- The interface handles platform-specific details behind the scenes
Generated by AI Codebase Knowledge Builder