Extended Thinking Configuration

March 18, 2026

Enable extended thinking (reasoning) modes for AI models that support deeper reasoning. Extended thinking allows a model to "think through" a complex problem before producing its response.

Overview

NeuroLink supports extended thinking/reasoning configuration for models that provide this capability. Extended thinking enables models to perform more thorough reasoning, particularly useful for complex tasks like mathematical proofs, coding problems, and multi-step analysis.

Supported Models

Gemini 3 Models (Google Vertex AI / AI Studio)

  • gemini-3.1-pro - Full thinking support with high token budgets (up to 100,000)
  • gemini-3-flash-preview - Fast thinking with support for "minimal" level (up to 50,000)

Gemini 2.5 Models (Google Vertex AI / AI Studio)

  • gemini-2.5-pro - Supports thinking configuration (up to 32,000 tokens)
  • gemini-2.5-flash - Supports thinking configuration (up to 32,000 tokens)

Claude Models (Anthropic)

All Claude 4.0+ models support extended thinking via budget tokens:

  • claude-sonnet-4-20250514 (Claude Sonnet 4)
  • claude-opus-4-20250514 (Claude Opus 4)
  • claude-opus-4-1-20250805 (Claude Opus 4.1)
  • claude-sonnet-4-5-20250929 (Claude Sonnet 4.5)
  • claude-opus-4-5-20251101 (Claude Opus 4.5)
  • claude-haiku-4-5-20251001 (Claude Haiku 4.5)
  • claude-sonnet-4-6 (Claude Sonnet 4.6)
  • claude-opus-4-6 (Claude Opus 4.6)

Quick Start

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "google-ai",
  model: "gemini-2.5-flash",
  thinkingConfig: { thinkingLevel: "high" },
});

console.log(result.content);

Gemini 3 Thinking Configuration

For Gemini 3 models, use thinkingLevel to control reasoning depth:

const response = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "vertex",
  model: "gemini-3-flash-preview",
  thinkingConfig: {
    thinkingLevel: "high", // 'minimal' | 'low' | 'medium' | 'high'
  },
});

Thinking Levels

| Level   | Description                            | Best For                        |
| ------- | -------------------------------------- | ------------------------------- |
| minimal | Near-zero thinking (Flash models only) | Simple queries requiring speed  |
| low     | Fast reasoning for simple tasks        | Quick analysis, summaries       |
| medium  | Balanced reasoning/latency trade-off   | General-purpose tasks           |
| high    | Maximum reasoning depth                | Complex reasoning, math, coding |

Maximum Token Budgets by Model

| Model               | Max Thinking Budget |
| ------------------- | ------------------- |
| gemini-3-pro-*      | 100,000 tokens      |
| gemini-3-flash-*    | 50,000 tokens       |
| gemini-2.5-*        | 32,000 tokens       |
| claude-opus-4-6     | 100,000 tokens      |
| claude-sonnet-4-6   | 100,000 tokens      |
| claude-opus-4-5-*   | 100,000 tokens      |
| claude-sonnet-4-5-* | 100,000 tokens      |
| claude-haiku-4-5-*  | 100,000 tokens      |
| claude-opus-4-1-*   | 100,000 tokens      |
| claude-opus-4-*     | 100,000 tokens      |
| claude-sonnet-4-*   | 100,000 tokens      |

Anthropic Claude Thinking Configuration

For Claude models, use budgetTokens to set the thinking token budget:

const response = await neurolink.generate({
  input: { text: "Solve this complex math problem step by step..." },
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  thinkingConfig: {
    enabled: true,
    budgetTokens: 10000, // Range: 5000-100000
  },
});

Budget Token Guidelines

  • Minimum: 5,000 tokens
  • Maximum: 100,000 tokens
  • Recommended for simple tasks: 5,000-10,000 tokens
  • Recommended for complex reasoning: 20,000-50,000 tokens
  • Maximum depth: 50,000-100,000 tokens
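
The guideline ranges above can be turned into a small helper that picks a starting budget from a rough task-complexity rating. The tiers and the helper name are illustrative, not part of the NeuroLink API:

```typescript
// Sketch: map a coarse complexity rating to a budgetTokens value,
// following the recommended ranges above. Illustrative only.
type Complexity = "simple" | "complex" | "maximum";

function pickThinkingBudget(complexity: Complexity): number {
  const budgets: Record<Complexity, number> = {
    simple: 10000, // 5,000-10,000 recommended for simple tasks
    complex: 30000, // 20,000-50,000 recommended for complex reasoning
    maximum: 100000, // hard upper limit for maximum depth
  };
  return budgets[complexity];
}

// Pass the result as budgetTokens in thinkingConfig
console.log(pickThinkingBudget("complex")); // 30000
```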

Configuration Options

The thinkingConfig object supports the following options:

thinkingConfig: {
  enabled?: boolean;           // Enable/disable thinking
  type?: "enabled" | "disabled"; // Alternative enable/disable
  budgetTokens?: number;       // Token budget (Anthropic models)
  thinkingLevel?: "minimal" | "low" | "medium" | "high"; // Thinking level (Gemini models)
}
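
Since the Gemini and Anthropic fields are mutually exclusive, a small runtime check can catch a mismatched config before a request is sent. This validator is a sketch based on the provider rules described in this guide; it is not part of the NeuroLink API:

```typescript
// Sketch: flag thinkingConfig fields that don't match the provider
// (Gemini providers take thinkingLevel; Anthropic takes budgetTokens).
interface ThinkingConfig {
  enabled?: boolean;
  type?: "enabled" | "disabled";
  budgetTokens?: number;
  thinkingLevel?: "minimal" | "low" | "medium" | "high";
}

function checkThinkingConfig(provider: string, cfg: ThinkingConfig): string[] {
  const problems: string[] = [];
  const isGemini = provider === "vertex" || provider === "google-ai";
  if (provider === "anthropic") {
    if (cfg.thinkingLevel !== undefined) {
      problems.push("thinkingLevel is a Gemini option; use budgetTokens");
    }
    if (
      cfg.budgetTokens !== undefined &&
      (cfg.budgetTokens < 5000 || cfg.budgetTokens > 100000)
    ) {
      problems.push("budgetTokens must be between 5000 and 100000");
    }
  } else if (isGemini && cfg.budgetTokens !== undefined) {
    problems.push("budgetTokens is an Anthropic option; use thinkingLevel");
  }
  return problems;
}

console.log(checkThinkingConfig("anthropic", { enabled: true, budgetTokens: 10000 })); // []
```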

CLI Usage

Extended thinking is also available via the CLI:

# Enable thinking with default settings
neurolink generate "Solve this problem" --thinking

# Set thinking budget for Anthropic
neurolink generate "Complex problem" --provider anthropic --thinking --thinkingBudget 20000

# Set thinking level for Gemini 3
neurolink generate "Complex problem" --provider vertex --model gemini-3-pro-preview --thinkingLevel high

CLI Options

| Option           | Description                                           | Default |
| ---------------- | ----------------------------------------------------- | ------- |
| --thinking       | Enable extended thinking                              | false   |
| --thinkingBudget | Token budget (Anthropic: 5000-100000)                 | 10000   |
| --thinkingLevel  | Thinking level (Gemini 3: minimal, low, medium, high) | medium  |

Best Practices

When to Use High Thinking

  • Complex mathematical proofs and calculations
  • Multi-step coding problems and debugging
  • Detailed analysis requiring multiple considerations
  • Tasks where accuracy is more important than speed

When to Use Low/Minimal Thinking

  • Simple queries where speed matters
  • Straightforward information retrieval
  • Quick summaries and formatting tasks
  • High-volume, latency-sensitive applications

General Guidelines

  1. Start with medium: Use medium as your default and adjust based on results
  2. Match model to task: Use Pro models for complex tasks, Flash for speed
  3. Monitor token usage: Higher thinking levels consume more tokens
  4. Test performance: Compare response quality vs. latency for your use case
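
The best-practice lists above can be sketched as a default-level chooser. The task categories and the function are illustrative examples, not part of the NeuroLink API:

```typescript
// Sketch: pick a starting thinkingLevel from a coarse task category,
// following the "When to Use" guidance above. Illustrative only.
type Task = "math-proof" | "debugging" | "summary" | "lookup" | "general";

function defaultThinkingLevel(task: Task): "minimal" | "low" | "medium" | "high" {
  switch (task) {
    case "math-proof":
    case "debugging":
      return "high"; // accuracy matters more than speed
    case "summary":
      return "low"; // quick summaries and formatting
    case "lookup":
      return "minimal"; // latency-sensitive, Flash models only
    default:
      return "medium"; // start with medium and adjust
  }
}

console.log(defaultThinkingLevel("general")); // medium
```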

Example: Complex Reasoning Task

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Complex coding problem with high reasoning
const result = await neurolink.generate({
  input: {
    text: `
      Design an optimal algorithm to find the longest palindromic subsequence
      in a string. Explain your approach, prove its correctness, and analyze
      the time and space complexity.
    `,
  },
  provider: "vertex",
  model: "gemini-3-pro-preview",
  thinkingConfig: {
    thinkingLevel: "high",
  },
  maxTokens: 4000,
});

console.log(result.content);

Model Detection Utilities

NeuroLink provides utilities to check thinking support:

import {
  supportsThinkingConfig,
  getMaxThinkingBudgetTokens,
} from "@juspay/neurolink";

// Check if a model supports thinking
const supports = supportsThinkingConfig("gemini-3-pro-preview"); // true

// Get maximum budget for a model
const maxBudget = getMaxThinkingBudgetTokens("gemini-3-flash-preview"); // 50000
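
One way to combine these utilities is to clamp a requested budget into the valid range before sending a request. In practice the maximum would come from getMaxThinkingBudgetTokens(); the helper below is an illustrative sketch, not part of the NeuroLink API:

```typescript
// Sketch: clamp a requested thinking budget between the documented
// minimum (5,000) and the model's maximum. Illustrative only.
function clampThinkingBudget(
  requested: number,
  maxForModel: number,
  min: number = 5000,
): number {
  return Math.min(Math.max(requested, min), maxForModel);
}

console.log(clampThinkingBudget(120000, 50000)); // 50000 (capped at model max)
console.log(clampThinkingBudget(1000, 50000)); // 5000 (raised to minimum)
```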

Important Notes

  • Provider compatibility: Thinking configuration is provider-specific. Gemini uses thinkingLevel, Claude uses budgetTokens
  • Token consumption: Extended thinking uses additional tokens beyond the response
  • Latency impact: Higher thinking levels increase response time
  • Not all models support thinking: Check supportsThinkingConfig() before enabling
  • Streaming support: Thinking configuration works with both generate() and stream()

See Also