Foundry Local C# SDK

May 21, 2026 · View on GitHub

The Foundry Local C# SDK provides a .NET interface for running AI models locally via the Foundry Local Core. Discover, download, load, and run inference entirely on your own machine — no cloud required.

Features

  • Model catalog — browse and search all available models; filter by cached or loaded state
  • Lifecycle management — download, load, unload, and remove models programmatically
  • Chat completions — synchronous and IAsyncEnumerable streaming via OpenAI-compatible types
  • Embeddings — generate text embeddings via OpenAI-compatible API
  • Audio transcription — transcribe audio files with streaming support
  • Download progress — wire up an Action<float> callback for real-time download percentage
  • Model variants — select specific hardware/quantization variants per model alias
  • Optional web service — start an OpenAI-compatible REST endpoint (/v1/chat_completions, /v1/models)
  • WinML acceleration — opt-in Windows hardware acceleration with automatic EP download
  • Full async/await — every operation supports CancellationToken and async patterns
  • IDisposable — deterministic cleanup of native resources

Installation

dotnet add package Microsoft.AI.Foundry.Local

Building from source

cd sdk/cs
dotnet build src/Microsoft.AI.Foundry.Local.csproj

Or open Microsoft.AI.Foundry.Local.SDK.sln in Visual Studio / VS Code.

WinML: Automatic Hardware Acceleration (Windows)

On Windows, Foundry Local can leverage WinML for GPU/NPU hardware acceleration via ONNX Runtime execution providers (EPs). EPs are large binaries downloaded on first use and cached for subsequent runs.

Install the WinML package variant instead:

dotnet add package Microsoft.AI.Foundry.Local.WinML

Or build from source with:

dotnet build src/Microsoft.AI.Foundry.Local.csproj /p:UseWinML=true

Triggering EP download

EP management is explicit via two methods:

  • DiscoverEps() — returns an array of EpInfo describing each available EP and whether it is already registered.
  • DownloadAndRegisterEpsAsync(names?, progressCallback?, ct?) — downloads and registers the specified EPs (or all available EPs if no names are given). Returns an EpDownloadResult. Overloads are provided so you can pass just a callback without specifying names.
// Initialize the manager first (see Quick Start)
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

var mgr = FoundryLocalManager.Instance;

// Discover what EPs are available
var eps = mgr.DiscoverEps();
foreach (var ep in eps)
{
    Console.WriteLine($"{ep.Name} — registered: {ep.IsRegistered}");
}

// Download and register all EPs
var result = await mgr.DownloadAndRegisterEpsAsync();
Console.WriteLine($"Success: {result.Success}, Status: {result.Status}");

// Or download only specific EPs
var result2 = await mgr.DownloadAndRegisterEpsAsync(new[] { eps[0].Name });

Per-EP download progress

Pass an optional Action<string, double> callback to receive (epName, percent) updates as each EP downloads (percent is 0–100):

string currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "")
        {
            Console.WriteLine();
        }
        currentEp = epName;
    }
    Console.Write($"\r  {epName}  {percent,6:F1}%");
});
Console.WriteLine();

Cancelling model and EP downloads

Pass a CancellationToken to either download API. Cancellation is observed on the next progress update.

// mgr and model already initialized
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));

await mgr.DownloadAndRegisterEpsAsync(ct: cts.Token);
await model.DownloadAsync(ct: cts.Token);

Catalog access no longer blocks on EP downloads. Call DownloadAndRegisterEpsAsync explicitly when you need hardware-accelerated execution providers.

Quick Start

using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;

// 1. Initialize the singleton manager
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

// 2. Get the model catalog and look up a model
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// 3. Download (if needed) and load the model
await model.DownloadAsync();
await model.LoadAsync();

// 4. Get a chat client and run inference
var chatClient = await model.GetChatClientAsync();
var response = await chatClient.CompleteChatAsync(new[]
{
    new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
});

Console.WriteLine(response.Choices![0].Message.Content);

// 5. Clean up
FoundryLocalManager.Instance.Dispose();

Usage

Initialization

FoundryLocalManager is an async singleton. Call CreateAsync once at startup:

await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    loggerFactory.CreateLogger("FoundryLocal"));

Access it anywhere afterward via FoundryLocalManager.Instance. Check FoundryLocalManager.IsInitialized to verify creation.

Catalog

The catalog lists all models known to the Foundry Local Core:

var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();

// List all available models
var models = await catalog.ListModelsAsync();
foreach (var m in models)
    Console.WriteLine($"{m.Alias} — {m.Info.DisplayName}");

// Get a specific model by alias
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// Get a specific variant by its unique model ID
var variant = await catalog.GetModelVariantAsync("phi-3.5-mini-generic-gpu-4")
    ?? throw new Exception("Variant 'phi-3.5-mini-generic-gpu-4' not found in catalog.");

// List models already downloaded to the local cache
var cached = await catalog.GetCachedModelsAsync();

// List models currently loaded in memory
var loaded = await catalog.GetLoadedModelsAsync();

Model Lifecycle

Each model may have multiple variants (different quantizations, hardware targets). The SDK auto-selects the best variant, or you can pick one. All models implement the IModel interface.

// Check and select variants
Console.WriteLine($"Selected: {model.Id}");
foreach (var v in model.Variants)
    Console.WriteLine($"  {v.Id} (cached: {await v.IsCachedAsync()})");

// Switch to a different variant
model.SelectVariant(model.Variants[1]);

Download, load, and unload:

// Download with progress reporting
await model.DownloadAsync(progress =>
    Console.WriteLine($"Download: {progress:F1}%"));

// Load into memory
await model.LoadAsync();

// Unload when done
await model.UnloadAsync();

// Remove from local cache entirely
await model.RemoveFromCacheAsync();

Chat Completions

var chatClient = await model.GetChatClientAsync();

var response = await chatClient.CompleteChatAsync(new[]
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "Explain async/await in C#." }
});

Console.WriteLine(response.Choices![0].Message.Content);

Streaming

Use IAsyncEnumerable for token-by-token output:

using var cts = new CancellationTokenSource();

await foreach (var chunk in chatClient.CompleteChatStreamingAsync(
    new[] { new ChatMessage { Role = "user", Content = "Write a haiku about .NET" } }, cts.Token))
{
    Console.Write(chunk.Choices?[0]?.Message?.Content);
}

Chat Settings

Tune generation parameters per client:

chatClient.Settings.Temperature = 0.7f;
chatClient.Settings.MaxTokens = 256;
chatClient.Settings.TopP = 0.9f;
chatClient.Settings.FrequencyPenalty = 0.5f;

Embeddings

var embeddingClient = await model.GetEmbeddingClientAsync();

// Single input
var response = await embeddingClient.GenerateEmbeddingAsync("The quick brown fox jumps over the lazy dog");
var embedding = response.Data[0].Embedding; // List<double>
Console.WriteLine($"Dimensions: {embedding.Count}");

// Batch input
var batchResponse = await embeddingClient.GenerateEmbeddingsAsync([
    "The quick brown fox",
    "The capital of France is Paris"
]);
// batchResponse.Data[0].Embedding, batchResponse.Data[1].Embedding

Audio Transcription

var audioClient = await model.GetAudioClientAsync();

// One-shot transcription
var result = await audioClient.TranscribeAudioAsync("recording.mp3");
Console.WriteLine(result.Text);

// Streaming transcription
await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync("recording.mp3", CancellationToken.None))
{
    Console.Write(chunk.Text);
}

Audio Settings

audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;

Live Audio Transcription (Real-Time Streaming)

For real-time microphone-to-text transcription, use CreateLiveTranscriptionSession(). Audio is pushed as raw PCM chunks and transcription results stream back as an IAsyncEnumerable.

The streaming result type (LiveAudioTranscriptionResponse) extends ConversationItem from the Betalgo OpenAI SDK's Realtime models, so it's compatible with the OpenAI Realtime API pattern. Access transcribed text via result.Content[0].Text or result.Content[0].Transcript.

var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();

// Configure audio format (must be set before StartAsync)
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio from a microphone callback (thread-safe)
waveIn.DataAvailable += (sender, e) =>
{
    _ = session.AppendAsync(new ReadOnlyMemory<byte>(e.Buffer, 0, e.BytesRecorded));
};

// Read transcription results as they arrive
await foreach (var result in session.GetStream())
{
    // result follows the OpenAI Realtime ConversationItem pattern:
    // - result.Content[0].Text       — incremental transcribed text (per chunk, not accumulated)
    // - result.Content[0].Transcript — alias for Text (OpenAI Realtime compatibility)
    // - result.IsFinal               — true for final results, false for interim hypotheses
    // - result.StartTime / EndTime   — segment timing in seconds
    Console.Write(result.Content?[0]?.Text);
}

await session.StopAsync();

Output Type

FieldTypeDescription
ContentList<TranscriptionContentPart>Content parts. Access text via Content[0].Text or Content[0].Transcript.
IsFinalboolWhether this is a final or interim result. Nemotron always returns true.
StartTimedouble?Start time offset in the audio stream (seconds).
EndTimedouble?End time offset in the audio stream (seconds).
Idstring?Unique identifier for this result (if available).

Session Lifecycle

MethodDescription
StartAsync()Initialize the streaming session. Settings are frozen after this call.
AppendAsync(pcmData)Push a chunk of raw PCM audio. Thread-safe (bounded internal queue).
GetStream()Async enumerable of transcription results.
StopAsync()Signal end-of-audio, flush remaining audio, and clean up.
DisposeAsync()Calls StopAsync if needed. Use await using for automatic cleanup.

Web Service

Start an OpenAI-compatible REST endpoint for use by external tools or processes:

// Configure the web service URL in your Configuration
await FoundryLocalManager.CreateAsync(
    new Configuration
    {
        AppName = "my-app",
        Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" }
    },
    NullLogger.Instance);

await FoundryLocalManager.Instance.StartWebServiceAsync();
Console.WriteLine($"Listening on: {string.Join(", ", FoundryLocalManager.Instance.Urls!)}");

// ... use the service ...

await FoundryLocalManager.Instance.StopWebServiceAsync();

Configuration

PropertyTypeDefaultDescription
AppNamestring(required)Your application name
AppDataDirstring?~/.{AppName}Application data directory
ModelCacheDirstring?{AppDataDir}/cache/modelsWhere models are stored locally
LogsDirstring?{AppDataDir}/logsLog output directory
LogLevelLogLevelWarningVerbose, Debug, Information, Warning, Error, Fatal
WebWebService?nullWeb service configuration (see below)
AdditionalSettingsIDictionary<string, string>?nullExtra key-value settings passed to Core

Configuration.WebService

PropertyTypeDefaultDescription
Urlsstring?127.0.0.1:0Bind address; semi-colon separated for multiple
ExternalUrlUri?nullURI for accessing the web service in a separate process

Disposal

FoundryLocalManager implements IDisposable. Dispose stops the web service (if running) and releases native resources:

FoundryLocalManager.Instance.Dispose();

API Reference

Auto-generated API docs live in docs/api/. See GENERATE-DOCS.md to regenerate.

Key types:

TypeDescription
FoundryLocalManagerSingleton entry point — create, catalog, web service
ConfigurationInitialization settings
ICatalogModel catalog interface
IModelModel interface — identity, metadata, lifecycle, variant selection
ModelModel with variant selection (implements IModel)
OpenAIChatClientChat completions (sync + streaming)
OpenAIAudioClientAudio transcription (sync + streaming)
LiveAudioTranscriptionSessionReal-time audio streaming session
LiveAudioTranscriptionResponseStreaming transcription result (ConversationItem-shaped)
ModelInfoFull model metadata record

Tests

dotnet test

See test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md for prerequisites and local model setup.