Extended Thinking Configuration

March 18, 2026

Enable extended thinking (reasoning) modes for AI models that support deeper reasoning. Extended thinking allows a model to "think through" a complex problem before producing its response.

Overview

NeuroLink supports extended thinking/reasoning configuration for models that provide this capability. Extended thinking enables models to perform more thorough reasoning, particularly useful for complex tasks like mathematical proofs, coding problems, and multi-step analysis.

Supported Models

Gemini 3 Models (Google Vertex AI / AI Studio)

  • gemini-3.1-pro - Full thinking support with high token budgets (up to 100,000)
  • gemini-3-flash-preview - Fast thinking with support for "minimal" level (up to 50,000)

Gemini 2.5 Models (Google Vertex AI / AI Studio)

  • gemini-2.5-pro - Supports thinking configuration (up to 32,000 tokens)
  • gemini-2.5-flash - Supports thinking configuration (up to 32,000 tokens)

Claude Models (Anthropic)

All Claude 4.0+ models support extended thinking via budget tokens:

  • claude-sonnet-4-20250514 (Claude Sonnet 4)
  • claude-opus-4-20250514 (Claude Opus 4)
  • claude-opus-4-1-20250805 (Claude Opus 4.1)
  • claude-sonnet-4-5-20250929 (Claude Sonnet 4.5)
  • claude-opus-4-5-20251101 (Claude Opus 4.5)
  • claude-haiku-4-5-20251001 (Claude Haiku 4.5)
  • claude-sonnet-4-6 (Claude Sonnet 4.6)
  • claude-opus-4-6 (Claude Opus 4.6)

Quick Start

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "google-ai",
  model: "gemini-2.5-flash",
  thinkingConfig: { thinkingLevel: "high" },
});

console.log(result.content);

Gemini 3 Thinking Configuration

For Gemini 3 models, use thinkingLevel to control reasoning depth:

const response = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "vertex",
  model: "gemini-3-flash-preview",
  thinkingConfig: {
    thinkingLevel: "high", // 'minimal' | 'low' | 'medium' | 'high'
  },
});

Thinking Levels

| Level   | Description                            | Best For                        |
| ------- | -------------------------------------- | ------------------------------- |
| minimal | Near-zero thinking (Flash models only) | Simple queries requiring speed  |
| low     | Fast reasoning for simple tasks        | Quick analysis, summaries       |
| medium  | Balanced reasoning/latency trade-off   | General-purpose tasks           |
| high    | Maximum reasoning depth                | Complex reasoning, math, coding |

Maximum Token Budgets by Model

| Model               | Max Thinking Budget |
| ------------------- | ------------------- |
| gemini-3-pro-*      | 100,000 tokens      |
| gemini-3-flash-*    | 50,000 tokens       |
| gemini-2.5-*        | 32,000 tokens       |
| claude-opus-4-6     | 100,000 tokens      |
| claude-sonnet-4-6   | 100,000 tokens      |
| claude-opus-4-5-*   | 100,000 tokens      |
| claude-sonnet-4-5-* | 100,000 tokens      |
| claude-haiku-4-5-*  | 100,000 tokens      |
| claude-opus-4-1-*   | 100,000 tokens      |
| claude-opus-4-*     | 100,000 tokens      |
| claude-sonnet-4-*   | 100,000 tokens      |

Anthropic Claude Thinking Configuration

For Claude models, use budgetTokens to set the thinking token budget:

const response = await neurolink.generate({
  input: { text: "Solve this complex math problem step by step..." },
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  thinkingConfig: {
    enabled: true,
    budgetTokens: 10000, // Range: 5000-100000
  },
});

Budget Token Guidelines

  • Minimum: 5,000 tokens
  • Maximum: 100,000 tokens
  • Recommended for simple tasks: 5,000-10,000 tokens
  • Recommended for complex reasoning: 20,000-50,000 tokens
  • Maximum depth: 50,000-100,000 tokens
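
The guideline ranges above can be turned into a small helper that picks a starting budget from a rough task-complexity rating. The tiers and the helper name are illustrative, not part of the NeuroLink API:

```typescript
// Sketch: map a coarse complexity rating to a budgetTokens value,
// following the recommended ranges above. Illustrative only.
type Complexity = "simple" | "complex" | "maximum";

function pickThinkingBudget(complexity: Complexity): number {
  const budgets: Record<Complexity, number> = {
    simple: 10000, // 5,000-10,000 recommended for simple tasks
    complex: 30000, // 20,000-50,000 recommended for complex reasoning
    maximum: 100000, // hard upper limit for maximum depth
  };
  return budgets[complexity];
}

// Pass the result as budgetTokens in thinkingConfig
console.log(pickThinkingBudget("complex")); // 30000
```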

Configuration Options

The thinkingConfig object supports the following options:

thinkingConfig: {
  enabled?: boolean;           // Enable/disable thinking
  type?: "enabled" | "disabled"; // Alternative enable/disable
  budgetTokens?: number;       // Token budget (Anthropic models)
  thinkingLevel?: "minimal" | "low" | "medium" | "high"; // Thinking level (Gemini models)
}
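
Since the Gemini and Anthropic fields are mutually exclusive, a small runtime check can catch a mismatched config before a request is sent. This validator is a sketch based on the provider rules described in this guide; it is not part of the NeuroLink API:

```typescript
// Sketch: flag thinkingConfig fields that don't match the provider
// (Gemini providers take thinkingLevel; Anthropic takes budgetTokens).
interface ThinkingConfig {
  enabled?: boolean;
  type?: "enabled" | "disabled";
  budgetTokens?: number;
  thinkingLevel?: "minimal" | "low" | "medium" | "high";
}

function checkThinkingConfig(provider: string, cfg: ThinkingConfig): string[] {
  const problems: string[] = [];
  const isGemini = provider === "vertex" || provider === "google-ai";
  if (provider === "anthropic") {
    if (cfg.thinkingLevel !== undefined) {
      problems.push("thinkingLevel is a Gemini option; use budgetTokens");
    }
    if (
      cfg.budgetTokens !== undefined &&
      (cfg.budgetTokens < 5000 || cfg.budgetTokens > 100000)
    ) {
      problems.push("budgetTokens must be between 5000 and 100000");
    }
  } else if (isGemini && cfg.budgetTokens !== undefined) {
    problems.push("budgetTokens is an Anthropic option; use thinkingLevel");
  }
  return problems;
}

console.log(checkThinkingConfig("anthropic", { enabled: true, budgetTokens: 10000 })); // []
```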

CLI Usage

Extended thinking is also available via the CLI:

# Enable thinking with default settings
neurolink generate "Solve this problem" --thinking

# Set thinking budget for Anthropic
neurolink generate "Complex problem" --provider anthropic --thinking --thinkingBudget 20000

# Set thinking level for Gemini 3
neurolink generate "Complex problem" --provider vertex --model gemini-3-pro-preview --thinkingLevel high

CLI Options

| Option           | Description                                           | Default |
| ---------------- | ----------------------------------------------------- | ------- |
| --thinking       | Enable extended thinking                              | false   |
| --thinkingBudget | Token budget (Anthropic: 5000-100000)                 | 10000   |
| --thinkingLevel  | Thinking level (Gemini 3: minimal, low, medium, high) | medium  |

Best Practices

When to Use High Thinking

  • Complex mathematical proofs and calculations
  • Multi-step coding problems and debugging
  • Detailed analysis requiring multiple considerations
  • Tasks where accuracy is more important than speed

When to Use Low/Minimal Thinking

  • Simple queries where speed matters
  • Straightforward information retrieval
  • Quick summaries and formatting tasks
  • High-volume, latency-sensitive applications

General Guidelines

  1. Start with medium: Use medium as your default and adjust based on results
  2. Match model to task: Use Pro models for complex tasks, Flash for speed
  3. Monitor token usage: Higher thinking levels consume more tokens
  4. Test performance: Compare response quality vs. latency for your use case
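
The best-practice lists above can be sketched as a default-level chooser. The task categories and the function are illustrative examples, not part of the NeuroLink API:

```typescript
// Sketch: pick a starting thinkingLevel from a coarse task category,
// following the "When to Use" guidance above. Illustrative only.
type Task = "math-proof" | "debugging" | "summary" | "lookup" | "general";

function defaultThinkingLevel(task: Task): "minimal" | "low" | "medium" | "high" {
  switch (task) {
    case "math-proof":
    case "debugging":
      return "high"; // accuracy matters more than speed
    case "summary":
      return "low"; // quick summaries and formatting
    case "lookup":
      return "minimal"; // latency-sensitive, Flash models only
    default:
      return "medium"; // start with medium and adjust
  }
}

console.log(defaultThinkingLevel("general")); // medium
```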

Example: Complex Reasoning Task

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Complex coding problem with high reasoning
const result = await neurolink.generate({
  input: {
    text: `
      Design an optimal algorithm to find the longest palindromic subsequence
      in a string. Explain your approach, prove its correctness, and analyze
      the time and space complexity.
    `,
  },
  provider: "vertex",
  model: "gemini-3-pro-preview",
  thinkingConfig: {
    thinkingLevel: "high",
  },
  maxTokens: 4000,
});

console.log(result.content);

Model Detection Utilities

NeuroLink provides utilities to check thinking support:

import {
  supportsThinkingConfig,
  getMaxThinkingBudgetTokens,
} from "@juspay/neurolink";

// Check if a model supports thinking
const supports = supportsThinkingConfig("gemini-3-pro-preview"); // true

// Get maximum budget for a model
const maxBudget = getMaxThinkingBudgetTokens("gemini-3-flash-preview"); // 50000
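
One way to combine these utilities is to clamp a requested budget into the valid range before sending a request. In practice the maximum would come from getMaxThinkingBudgetTokens(); the helper below is an illustrative sketch, not part of the NeuroLink API:

```typescript
// Sketch: clamp a requested thinking budget between the documented
// minimum (5,000) and the model's maximum. Illustrative only.
function clampThinkingBudget(
  requested: number,
  maxForModel: number,
  min: number = 5000,
): number {
  return Math.min(Math.max(requested, min), maxForModel);
}

console.log(clampThinkingBudget(120000, 50000)); // 50000 (capped at model max)
console.log(clampThinkingBudget(1000, 50000)); // 5000 (raised to minimum)
```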

Important Notes

  • Provider compatibility: Thinking configuration is provider-specific. Gemini uses thinkingLevel, Claude uses budgetTokens
  • Token consumption: Extended thinking uses additional tokens beyond the response
  • Latency impact: Higher thinking levels increase response time
  • Not all models support thinking: Check supportsThinkingConfig() before enabling
  • Streaming support: Thinking configuration works with both generate() and stream()

See Also