Ocr

June 23, 2026

Overview

OCR API

Available Operations

process - OCR

process

OCR

Example Usage

from mistralai.client import Mistral
import os


with Mistral(
    api_key=os.getenv("MISTRAL_API_KEY", ""),
) as mistral:

    res = mistral.ocr.process(model="CX-9", document={
        "type": "document_url",
        "document_url": "https://upset-labourer.net/",
    }, bbox_annotation_format={
        "type": "text",
    }, document_annotation_format={
        "type": "text",
    }, include_blocks=False)

    # Handle response
    print(res)
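The call above passes `{"type": "text"}` for both annotation formats, while the parameter table notes that only `json_schema` is valid for those fields. As a minimal sketch, the dictionary below reproduces the `json_schema` shape from the table's Example 3. The helper name `json_schema_format` is hypothetical and not part of the SDK; it only builds a plain dictionary.

```python
import json


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a JSON Schema in the response-format shape shown in Example 3.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    return {
        "type": "json_schema",
        "json_schema": {"schema": schema, "name": name, "strict": strict},
    }


# The "Book" schema copied from Example 3 in the parameter table.
book_schema = {
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"},
    },
    "required": ["name", "authors"],
    "title": "Book",
    "type": "object",
    "additionalProperties": False,
}

document_annotation_format = json_schema_format("book", book_schema)
print(json.dumps(document_annotation_format, indent=2))
```

The resulting dictionary could then be passed as `document_annotation_format` or `bbox_annotation_format` in place of the `{"type": "text"}` values.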

Parameters

Parameter	Type	Required	Description	Example
`model`	Nullable[str]	:heavy_check_mark:	N/A
`document`	models.DocumentUnion	:heavy_check_mark:	Document to run OCR on
`pages`	OptionalNullable[models.Pages]	:heavy_minus_sign:	Specific pages to process. Accepts a list of integers or a string of comma-separated numbers and ranges (e.g. '0,1,2' or '0-5' or '0,2-4'). Page numbers start from 0.
`include_image_base64`	OptionalNullable[bool]	:heavy_minus_sign:	Include base64-encoded image data in the response
`image_limit`	OptionalNullable[int]	:heavy_minus_sign:	Max images to extract
`image_min_size`	OptionalNullable[int]	:heavy_minus_sign:	Minimum height and width of image to extract
`bbox_annotation_format`	OptionalNullable[models.ResponseFormat]	:heavy_minus_sign:	Structured output class for extracting useful information from each extracted bounding box / image from document. Only json_schema is valid for this field	Example 1: { "type": "text" } Example 2: { "type": "json_object" } Example 3: { "type": "json_schema", "json_schema": { "schema": { "properties": { "name": { "title": "Name", "type": "string" }, "authors": { "items": { "type": "string" }, "title": "Authors", "type": "array" } }, "required": [ "name", "authors" ], "title": "Book", "type": "object", "additionalProperties": false }, "name": "book", "strict": true } }
`document_annotation_format`	OptionalNullable[models.ResponseFormat]	:heavy_minus_sign:	Structured output class for extracting useful information from the entire document. Only json_schema is valid for this field	Example 1: { "type": "text" } Example 2: { "type": "json_object" } Example 3: { "type": "json_schema", "json_schema": { "schema": { "properties": { "name": { "title": "Name", "type": "string" }, "authors": { "items": { "type": "string" }, "title": "Authors", "type": "array" } }, "required": [ "name", "authors" ], "title": "Book", "type": "object", "additionalProperties": false }, "name": "book", "strict": true } }
`document_annotation_prompt`	OptionalNullable[str]	:heavy_minus_sign:	Optional prompt to guide the model in extracting structured output from the entire document. A document_annotation_format must be provided.
`table_format`	OptionalNullable[models.TableFormat]	:heavy_minus_sign:	N/A
`extract_header`	Optional[bool]	:heavy_minus_sign:	N/A
`extract_footer`	Optional[bool]	:heavy_minus_sign:	N/A
`include_blocks`	Optional[bool]	:heavy_minus_sign:	Return paragraph-level bounding boxes for all content blocks in the response
`confidence_scores_granularity`	OptionalNullable[models.ConfidenceScoresGranularity]	:heavy_minus_sign:	Granularity for confidence scores: 'word' (per-word scores) or 'page' (aggregate only). Defaults to None (no confidence scores) to keep response payload small.
`retries`	Optional[utils.RetryConfig]	:heavy_minus_sign:	Configuration to override the default retry behavior of the client.
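The `pages` parameter accepts either a list of integers or a string such as '0,2-4'. The sketch below shows how such a string might expand into zero-based page numbers, which can help when checking a selection before sending it. The function `expand_pages` is a hypothetical local helper, not part of the SDK, and it assumes ranges are inclusive at both ends, which the description above does not state explicitly.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec like '0,2-4' into a list of zero-based page numbers.

    Assumption: a range 'a-b' includes both a and b.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_pages("0,2-4"))
```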

Response

models.OCRResponse
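This page does not list the fields of `models.OCRResponse`. The sketch below assumes the response exposes a `pages` list whose items carry an `index` and a `markdown` string; treat those attribute names as assumptions and check them against the model definition. It uses a stand-in object so it runs without calling the API.

```python
from types import SimpleNamespace


def join_page_markdown(response) -> str:
    """Concatenate per-page text in page order.

    Assumes `response.pages` items have `index` and `markdown` attributes.
    """
    pages = sorted(response.pages, key=lambda page: page.index)
    return "\n\n".join(page.markdown for page in pages)


# Stand-in for an OCR response, used only to exercise the helper.
fake_response = SimpleNamespace(
    pages=[
        SimpleNamespace(index=1, markdown="Second page"),
        SimpleNamespace(index=0, markdown="# Title"),
    ]
)
print(join_page_markdown(fake_response))
```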

Errors

Error Type	Status Code	Content Type
errors.HTTPValidationError	422	application/json
errors.SDKError	4XX, 5XX	*/*
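The table above can be read as a simple mapping from HTTP status code to the documented error type. The sketch below encodes that mapping as a plain function; `documented_error_type` is a hypothetical helper for illustration, and it does not import or raise the SDK's error classes.

```python
def documented_error_type(status_code: int) -> str:
    """Return the error type the table above lists for a status code."""
    if status_code == 422:
        # Validation failures have a dedicated type with a JSON body.
        return "errors.HTTPValidationError"
    if 400 <= status_code <= 599:
        # Every other 4XX or 5XX response falls under the generic type.
        return "errors.SDKError"
    return "no documented error"


for code in (200, 422, 500):
    print(code, documented_error_type(code))
```

When catching these in application code, handling the more specific validation error before the generic one keeps the 422 case distinguishable.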