Ocr

June 23, 2026

Overview

OCR API

Available Operations

process

OCR

Example Usage

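import os

from mistralai.client import Mistral

# Read the API key from the environment and use the client as a context manager.
with Mistral(
    api_key=os.getenv("MISTRAL_API_KEY", ""),
) as mistral:

    # "CX-9" and the URL below are the values given in this example;
    # substitute the model name and document URL you intend to use.
    res = mistral.ocr.process(
        model="CX-9",
        document={
            "type": "document_url",
            "document_url": "https://upset-labourer.net/",
        },
        bbox_annotation_format={"type": "text"},
        document_annotation_format={"type": "text"},
        include_blocks=False,
    )

    # Handle response
    print(res)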
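The call above passes {"type": "text"} for both annotation formats, while the parameter descriptions below say that only json_schema is valid for those fields. As a minimal sketch, the helper below builds a json_schema response format as a plain dict, reusing the "Book" schema from the parameter examples. The name book_annotation_format is hypothetical and not part of the SDK, and whether the API accepts this exact payload for a given model is an assumption.

```python
import json


def book_annotation_format() -> dict:
    """Build a json_schema response format (hypothetical helper, not an SDK function)."""
    schema = {
        "title": "Book",
        "type": "object",
        "properties": {
            "name": {"title": "Name", "type": "string"},
            "authors": {
                "title": "Authors",
                "type": "array",
                "items": {"type": "string"},
            },
        },
        "required": ["name", "authors"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {"schema": schema, "name": "book", "strict": True},
    }


if __name__ == "__main__":
    # Print the payload that could be passed as document_annotation_format=...
    print(json.dumps(book_annotation_format(), indent=2))
```

The returned dict could then be supplied as document_annotation_format=book_annotation_format() (or as bbox_annotation_format) in the mistral.ocr.process call.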

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | Nullable[str] | Yes | N/A | |
| document | models.DocumentUnion | Yes | Document to run OCR on | |
| pages | OptionalNullable[models.Pages] | No | Specific pages to process. Accepts a list of integers or a string of comma-separated numbers and ranges (e.g. '0,1,2' or '0-5' or '0,2-4'). Page numbers start from 0. | |
| include_image_base64 | OptionalNullable[bool] | No | Include image URLs in response | |
| image_limit | OptionalNullable[int] | No | Max images to extract | |
| image_min_size | OptionalNullable[int] | No | Minimum height and width of image to extract | |
| bbox_annotation_format | OptionalNullable[models.ResponseFormat] | No | Structured output class for extracting useful information from each extracted bounding box / image from document. Only json_schema is valid for this field. | Example 1: {"type": "text"}; Example 2: {"type": "json_object"}; Example 3: {"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}} |
| document_annotation_format | OptionalNullable[models.ResponseFormat] | No | Structured output class for extracting useful information from the entire document. Only json_schema is valid for this field. | Example 1: {"type": "text"}; Example 2: {"type": "json_object"}; Example 3: {"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}} |
| document_annotation_prompt | OptionalNullable[str] | No | Optional prompt to guide the model in extracting structured output from the entire document. A document_annotation_format must be provided. | |
| table_format | OptionalNullable[models.TableFormat] | No | N/A | |
| extract_header | Optional[bool] | No | N/A | |
| extract_footer | Optional[bool] | No | N/A | |
| include_blocks | Optional[bool] | No | Return paragraph-level bounding boxes for all content blocks in the response | |
| confidence_scores_granularity | OptionalNullable[models.ConfidenceScoresGranularity] | No | Granularity for confidence scores: 'word' (per-word scores) or 'page' (aggregate only). Defaults to None (no confidence scores) to keep response payload small. | |
| retries | Optional[utils.RetryConfig] | No | Configuration to override the default retry behavior of the client. | |
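The pages parameter accepts either a list of integers or a string of comma-separated numbers and ranges. To make the string form concrete, here is a small sketch that expands such a string into the zero-based page indices it names. expand_pages is a hypothetical helper, not part of the SDK, and it assumes ranges include both endpoints, which the description above does not state explicitly.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a page spec such as '0,2-4' into a sorted list of page indices.

    Hypothetical helper for illustration; assumes ranges include both ends.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


if __name__ == "__main__":
    # The three spec strings quoted in the pages description.
    for spec in ("0,1,2", "0-5", "0,2-4"):
        print(spec, "->", expand_pages(spec))
```

Because pages also accepts a list of integers, the result could be passed directly, for example pages=expand_pages("0,2-4").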

Response

models.OCRResponse

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| errors.HTTPValidationError | 422 | application/json |
| errors.SDKError | 4XX, 5XX | */* |