Instructor Tutorial: Structured LLM Outputs
June 8, 2026
Get reliable, typed responses from LLMs with Pydantic validation.
Why This Track Matters
Instructor is increasingly relevant for developers working with modern AI/ML infrastructure: it returns typed, Pydantic-validated responses from LLMs. This track walks through its architecture, key patterns, and production considerations.
This track focuses on:
- Extract Structured Data reliably from any LLM
- Define Schemas with Pydantic for type safety
- Handle Validation Errors with automatic retries
- Work with Complex Data including nested objects
🎯 What is Instructor?
Instructor is a library that makes it easy to get structured, validated outputs from LLMs. Instead of parsing free-form text, you define a Pydantic model, and Instructor validates the LLM's response against that schema, re-asking the model when validation fails.
Why Instructor?
| Feature | Description |
|---|---|
| Type Safety | Pydantic models ensure correct data types |
| Validation | Built-in validation with retry logic |
| Multi-Provider | OpenAI, Anthropic, Google, Ollama, and more |
| Streaming | Stream partial objects as they're generated |
| Simple API | Just patch your existing client |
| Extensible | Custom validators and complex nested structures |
Mental Model
```mermaid
flowchart LR
    A[Prompt + Schema] --> B[Instructor]
    B --> C[LLM]
    C --> D[Raw Response]
    D --> E[Pydantic Validation]
    E --> F{Valid?}
    F -->|Yes| G[Typed Object]
    F -->|No| H[Retry with Feedback]
    H --> C

    classDef input fill:#e1f5fe,stroke:#01579b
    classDef process fill:#f3e5f5,stroke:#4a148c
    classDef output fill:#e8f5e8,stroke:#1b5e20

    class A input
    class B,C,D,E,F,H process
    class G output
```
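The loop in the diagram can be sketched in plain Python to make the control flow concrete. This is not Instructor's implementation: `fake_llm_from` is a hypothetical stand-in for a provider call, `validate_user` is a hypothetical stand-in for Pydantic validation, and the feedback wording is made up. It only shows the shape of the loop: call the model, parse and validate, and on failure append the bad answer plus the error to the conversation before asking again.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def validate_user(data: Dict) -> Dict:
    """Hypothetical stand-in for Pydantic validation of a {name, age} schema."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name: expected a string")
    if not isinstance(data.get("age"), int):
        raise ValueError("age: expected an integer")
    return data


def extract_with_retries(
    llm: Callable[[List[Message]], str],
    prompt: str,
    validate: Callable[[Dict], Dict],
    max_attempts: int = 3,
) -> Dict:
    messages: List[Message] = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_attempts):
        raw = llm(messages)  # LLM -> Raw Response
        try:
            # Pydantic Validation -> Valid? -> Typed Object
            return validate(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            # Retry with Feedback: show the model its answer and the error
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation error: {exc}. Please correct it."})
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")


def fake_llm_from(replies: List[str]) -> Callable[[List[Message]], str]:
    """Hypothetical fake provider that returns canned replies in order."""
    it = iter(replies)
    return lambda messages: next(it)


llm = fake_llm_from(['{"name": "John", "age": "thirty"}', '{"name": "John", "age": 30}'])
print(extract_with_retries(llm, "John is 30.", validate_user))
# {'name': 'John', 'age': 30}
```

The first canned reply fails validation (`"thirty"` is not an integer), so the sketch appends feedback and the second reply is accepted.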
Current Snapshot (auto-updated)
- repository: instructor-ai/instructor
- stars: about 13.1k
- GitHub release reference: v1.15.1 (checked 2026-06-08; release metadata on GitHub)
Chapter Guide
- Chapter 1: Getting Started - Installation, setup, and first structured extraction
- Chapter 2: Pydantic Models - Designing effective schemas
- Chapter 3: Validation & Retries - Ensuring data quality
- Chapter 4: Complex Structures - Nested objects and lists
- Chapter 5: Streaming - Partial object streaming
- Chapter 6: Multiple Providers - OpenAI, Anthropic, Ollama
- Chapter 7: Advanced Patterns - Validators, hooks, and optimization
- Chapter 8: Production Use - Best practices and scaling
What You Will Learn
- Extract Structured Data reliably from any LLM
- Define Schemas with Pydantic for type safety
- Handle Validation Errors with automatic retries
- Work with Complex Data including nested objects
- Stream Partial Results for better UX
- Use Multiple Providers with the same code
- Build Production Systems with proper error handling
Prerequisites
- Python 3.9+
- Basic Pydantic knowledge
- API key for your LLM provider
Quick Start
```bash
# Install Instructor
pip install instructor

# With specific providers (quote the brackets so shells such as zsh don't expand them)
pip install "instructor[anthropic]"
pip install "instructor[google]"
```
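To confirm the install (and see which provider SDKs are present), a small standard-library check can help. It only reads installed package metadata and makes no API calls; `installed_version` is a hypothetical helper, and the package list is an assumption about what you installed for this tutorial.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: installed version of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Instructor itself plus the provider SDKs used later in this tutorial
for pkg in ("instructor", "pydantic", "openai", "anthropic"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```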
Your First Extraction
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch (wrap) the OpenAI client; OpenAI() reads OPENAI_API_KEY from the environment
client = instructor.from_openai(OpenAI())

# Define your output schema
class User(BaseModel):
    name: str
    age: int
    email: str

# Extract structured data
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old. His email is john@example.com"}
    ],
)

print(user)
# name='John Doe' age=30 email='john@example.com'
print(user.name)  # Fully typed!
# John Doe
```
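What makes this work is that the schema travels with the request. Pydantic can describe a model as JSON Schema (`User.model_json_schema()`), and in its default tools mode Instructor sends a schema like that to the provider as a function/tool definition. The standard-library sketch below approximates that translation for the `User` fields using a dataclass instead of Pydantic; `JSON_TYPES`, `to_json_schema`, and `to_tool_definition` are hypothetical helpers, and the exact payload Instructor builds differs.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, get_type_hints

# Hypothetical mapping from Python types to JSON Schema types (only simple scalars)
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class User:  # stand-in for the Pydantic User model above
    name: str
    age: int
    email: str


def to_json_schema(cls) -> Dict[str, Any]:
    """Hypothetical: describe a flat dataclass as a JSON Schema object."""
    hints = get_type_hints(cls)
    props = {f.name: {"type": JSON_TYPES[hints[f.name]]} for f in fields(cls)}
    # Simplification: every field is treated as required (none have defaults here)
    return {"title": cls.__name__, "type": "object", "properties": props, "required": list(props)}


def to_tool_definition(cls) -> Dict[str, Any]:
    """Hypothetical: wrap the schema in an OpenAI-style function tool definition."""
    return {"type": "function", "function": {"name": cls.__name__, "parameters": to_json_schema(cls)}}


print(json.dumps(to_tool_definition(User), indent=2))
```

The model is then asked to "call" that function, and its arguments come back as JSON that can be validated against the same schema.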
Complex Extraction
```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str = Field(description="Brief task title")
    description: str = Field(description="Detailed description")
    priority: Priority
    # Give optional fields a default: without one, Pydantic v2 still requires the key
    due_date: Optional[str] = Field(default=None, description="ISO format date if mentioned")
    assignee: Optional[str] = None

class TaskList(BaseModel):
    tasks: List[Task]
    project_name: str

# Extract multiple tasks from natural language (reuses `client` from the first example)
result = client.chat.completions.create(
    model="gpt-4o",
    response_model=TaskList,
    messages=[
        {"role": "user", "content": """
For the Website Redesign project:
- High priority: Update homepage hero section by Friday, assign to Sarah
- Medium priority: Fix mobile navigation issues
- Low priority: Add dark mode support, due next month
"""}
    ],
)

for task in result.tasks:
    print(f"[{task.priority.value}] {task.title}")
# [high] Update homepage hero section
# [medium] Fix mobile navigation issues
# [low] Add dark mode support
```
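Pydantic does several jobs for the `TaskList` schema: it recurses into the list of `Task` objects, coerces `"high"` into `Priority.HIGH`, fills optional fields with their defaults, and reports where a nested value failed. The standard-library sketch below spells those steps out with dataclasses to show what the schema asks for; `parse_task` and `parse_task_list` are hypothetical helpers, not part of Instructor or Pydantic, and the sample data is made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Task:
    title: str
    description: str
    priority: Priority
    due_date: Optional[str] = None  # optional fields default to None
    assignee: Optional[str] = None


@dataclass
class TaskList:
    project_name: str
    tasks: List[Task] = field(default_factory=list)


def parse_task(raw: Dict[str, Any], path: str) -> Task:
    """Hypothetical: build one Task, coercing the priority string into the enum."""
    try:
        priority = Priority(raw["priority"])  # "high" -> Priority.HIGH; unknown values raise ValueError
    except (KeyError, ValueError) as exc:
        raise ValueError(f"{path}.priority: {exc!r}") from exc
    return Task(
        title=raw["title"],
        description=raw["description"],
        priority=priority,
        due_date=raw.get("due_date"),
        assignee=raw.get("assignee"),
    )


def parse_task_list(data: Dict[str, Any]) -> TaskList:
    """Hypothetical: recurse into the nested list, tagging errors with their index."""
    tasks = [parse_task(raw, f"tasks[{i}]") for i, raw in enumerate(data.get("tasks", []))]
    return TaskList(project_name=data["project_name"], tasks=tasks)


sample = {  # made-up data shaped like a model's answer
    "project_name": "Website Redesign",
    "tasks": [
        {"title": "Update homepage hero section", "description": "Refresh the hero", "priority": "high", "assignee": "Sarah"},
        {"title": "Fix mobile navigation issues", "description": "Nav breaks on small screens", "priority": "medium"},
    ],
}
result = parse_task_list(sample)
for task in result.tasks:
    print(f"[{task.priority.value}] {task.title}")
# [high] Update homepage hero section
# [medium] Fix mobile navigation issues
```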
Validation with Retries
```python
from pydantic import BaseModel, field_validator

class ValidatedUser(BaseModel):
    name: str
    email: str
    age: int

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0 or v > 150:
            raise ValueError("Age must be between 0 and 150")
        return v

# On a validation failure, Instructor re-asks the model with the error message
# (reuses `client` from the first example)
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=ValidatedUser,
    max_retries=3,  # Retry budget for validation failures
    messages=[
        {"role": "user", "content": "Extract: Bob, bob.email, 25"}
    ],
)
# "bob.email" fails the email check, so the model is asked again with the error.
# It may return a corrected (possibly invented) address; if every attempt fails,
# Instructor raises an exception instead of returning invalid data.
```
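Pydantic runs the validators for every field and reports all failures together in one `ValidationError`, and that error text is what Instructor sends back to the model on a retry. The standard-library sketch below imitates those two steps for a `ValidatedUser`-shaped dict: collecting every failure instead of stopping at the first, then turning the list into a follow-up message. `collect_errors`, `reask_message`, and the message wording are hypothetical.

```python
from typing import Any, Callable, Dict, List


def check_email(v: Any) -> None:
    if not isinstance(v, str) or "@" not in v:
        raise ValueError("Invalid email format")


def check_age(v: Any) -> None:
    if not isinstance(v, int) or not 0 <= v <= 150:
        raise ValueError("Age must be between 0 and 150")


REQUIRED = ("name", "email", "age")
# Hypothetical per-field rules mirroring ValidatedUser's validators
RULES: Dict[str, Callable[[Any], None]] = {"email": check_email, "age": check_age}


def collect_errors(data: Dict[str, Any]) -> List[str]:
    """Hypothetical: run every rule and keep going after a failure."""
    errors = [f"{name}: Field required" for name in REQUIRED if name not in data]
    for name, rule in RULES.items():
        if name in data:
            try:
                rule(data[name])
            except ValueError as exc:
                errors.append(f"{name}: {exc}")
    return errors


def reask_message(errors: List[str]) -> Dict[str, str]:
    """Hypothetical feedback turn appended to the conversation before retrying."""
    return {"role": "user", "content": "Please correct these validation errors:\n" + "\n".join(errors)}


errors = collect_errors({"name": "Bob", "email": "bob.email", "age": 200})
print(errors)
# ['email: Invalid email format', 'age: Age must be between 0 and 150']
print(reask_message(errors)["content"])
```

Reporting every failure at once means a single retry can fix several fields, instead of spending one attempt per error.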
Streaming Partial Objects
```python
from typing import List

from pydantic import BaseModel

class Article(BaseModel):
    title: str
    summary: str
    key_points: List[str]

# Stream partial results (reuses `client` from the first example).
# Each partial is an Article whose fields stay None until they have arrived.
for partial in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Article,
    messages=[
        {"role": "user", "content": "Write an article about AI agents"}
    ],
):
    print(f"Title: {partial.title}")
    print(f"Points so far: {len(partial.key_points or [])}")
# Partial objects are available as they stream in
```
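While streaming, the model's JSON arrives in fragments, so each partial object has to be built from an incomplete prefix. Instructor handles this internally; the standard-library sketch below shows one simple way it can be done, assuming the output is a single JSON object: close any open string, object, or array, try `json.loads`, and keep the last good snapshot when the prefix still does not parse (for example, when it was cut mid-key). `close_partial_json` and `stream_partials` are hypothetical helpers, and the chunks are made up.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def close_partial_json(prefix: str) -> str:
    """Hypothetical: append the closing quote/brackets a truncated JSON prefix lacks."""
    closers = []  # expected closing brackets, innermost last
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    if in_string:
        if escaped:
            prefix = prefix[:-1]  # drop a dangling backslash before closing the string
        prefix += '"'
    return prefix + "".join(reversed(closers))


def stream_partials(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Hypothetical: yield a best-effort snapshot of the object after each chunk."""
    buffer = ""
    snapshot: Dict[str, Any] = {}
    for chunk in chunks:
        buffer += chunk
        try:
            snapshot = json.loads(close_partial_json(buffer))
        except json.JSONDecodeError:
            pass  # e.g. cut mid-key or after a comma: keep the last good snapshot
        yield snapshot


chunks = ['{"title": "AI Ag', 'ents", "summary": "Agents plan', ' and act", "key_points": ["plan', '", "act"]}']
for partial in stream_partials(chunks):
    print(f"Title: {partial.get('title')}")
    print(f"Points so far: {len(partial.get('key_points') or [])}")
```

One caveat of this approach: a number cut mid-stream (say `12` of `123`) parses early as a smaller value, so partial numeric fields should be treated as provisional.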
Multiple Providers
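```python
import instructor
from anthropic import Anthropic
from openai import OpenAI

# OpenAI
openai_client = instructor.from_openai(OpenAI())
# Anthropic
anthropic_client = instructor.from_anthropic(Anthropic())

# Same extraction code works with both. Pass the model name explicitly:
# the wrapped client is an instructor.Instructor, not an OpenAI instance,
# so an isinstance(client, OpenAI) check would never match.
def extract_user(client, model: str, text: str) -> User:  # User from the first example
    return client.chat.completions.create(
        model=model,
        response_model=User,
        max_tokens=1024,  # required by Anthropic's Messages API; also accepted by OpenAI
        messages=[{"role": "user", "content": text}],
    )

text = "Jane Smith is 28. Reach her at jane@example.com"
print(extract_user(openai_client, "gpt-4o", text))
print(extract_user(anthropic_client, "claude-3-5-sonnet-20241022", text))
```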
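One way to keep the extraction code identical across providers is to keep each provider's model name (and provider-specific settings such as `max_tokens`) in configuration and pass them through as keyword arguments, rather than inspecting the client type. The sketch below only builds the keyword arguments, so it runs without API keys; `ProviderConfig`, `PROVIDERS`, and `build_create_kwargs` are hypothetical names, and the dataclass `User` is a stand-in for the Pydantic model.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical per-provider settings."""
    model: str
    max_tokens: int = 1024


@dataclass
class User:  # stand-in for the Pydantic User model
    name: str
    age: int
    email: str


# Model names taken from the example above; adjust to what your account can access
PROVIDERS: Dict[str, ProviderConfig] = {
    "openai": ProviderConfig(model="gpt-4o"),
    "anthropic": ProviderConfig(model="claude-3-5-sonnet-20241022"),
}


def build_create_kwargs(provider: str, text: str, response_model: Any) -> Dict[str, Any]:
    """Hypothetical: keyword arguments for client.chat.completions.create(...)."""
    cfg = PROVIDERS[provider]  # KeyError for an unknown provider name
    return {
        "model": cfg.model,
        "max_tokens": cfg.max_tokens,
        "response_model": response_model,
        "messages": [{"role": "user", "content": text}],
    }


kwargs = build_create_kwargs("anthropic", "Jane is 28, jane@example.com", User)
print(kwargs["model"])
```

With real clients this becomes `anthropic_client.chat.completions.create(**build_create_kwargs("anthropic", text, User))`, and switching providers is a configuration change rather than a code change.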
Common Use Cases
| Use Case | Description |
|---|---|
| Data Extraction | Extract entities from documents |
| Classification | Categorize text into predefined classes |
| Summarization | Structured summaries with key points |
| Form Filling | Extract form fields from unstructured text |
| Code Generation | Generate code with specific structure |
| API Responses | Ensure LLM outputs match API schemas |
Learning Path
🟢 Beginner Track
- Chapters 1-3: Setup, basic models, and validation
- Extract simple structured data
🟡 Intermediate Track
- Chapters 4-6: Complex structures, streaming, and providers
- Build production extraction pipelines
🔴 Advanced Track
- Chapters 7-8: Advanced patterns and production
- Master structured LLM applications
Ready to get structured outputs from LLMs? Let's begin with Chapter 1: Getting Started!
Generated for Awesome Code Docs
Full Chapter Map
- Chapter 1: Getting Started with Instructor
- Chapter 2: Crafting Effective Pydantic Models
- Chapter 3: Validation, Errors, and Retries
- Chapter 4: Complex Structures and Nested Data
- Chapter 5: Streaming Structured Outputs
- Chapter 6: Using Multiple Providers
- Chapter 7: Advanced Patterns and Guardrails
- Chapter 8: Production Use and Operations
Generated by AI Codebase Knowledge Builder