Development Guide
September 10, 2025
This guide covers everything you need to know for developing with Fluid Server, from initial setup to building and testing.
Prerequisites
System Requirements
- OS: Windows 10/11
- Python: 3.10+ with the uv package manager
- Memory: 8GB+ RAM (16GB recommended for 8B models)
- Storage: 10GB+ free space for models
Hardware-Specific Requirements
Intel NPU Support
- Runtime: OpenVINO 2025.2.0+ runtime
- Hardware: Intel Arc graphics or Intel NPU
Qualcomm NPU Support
- Runtime: ONNX Runtime QNN (bundled with dependencies)
- Hardware: Snapdragon X Elite device with HTP (Hexagon Tensor Processor)
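To confirm that an accelerator is actually visible before starting the server, you can query the runtimes directly from Python. This is a minimal sketch, assuming the openvino and onnxruntime packages from the project's dependencies are importable; on Intel machines look for NPU or GPU in the device list, on Snapdragon look for QNNExecutionProvider:

import onnxruntime as ort
from openvino import Core

# Intel devices: the NPU/GPU should appear in OpenVINO's device list
print("OpenVINO devices:", Core().available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

# Qualcomm devices: check the ONNX Runtime provider list
print("ONNX Runtime providers:", ort.get_available_providers())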
Installing Prerequisites
Install uv Package Manager
# Install uv (if not already installed)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Install OpenVINO (for Intel Devices)
# Download and install OpenVINO runtime from Intel's website
# https://docs.openvino.ai/2025/get-started/install-openvino.html
Initial Setup
1. Clone the Repository
git clone https://github.com/FluidInference/fluid-server.git
cd fluid-server
2. Install Dependencies
# Install all dependencies including development tools
uv sync
# Install development dependencies separately (if needed)
uv add --dev ty
3. Verify Installation
# Check that dependencies are installed correctly
uv run python -c "import fluid_server; print('Setup successful')"
Development Workflow
Running the Development Server
Basic Development Mode
# Run with auto-reload for development
uv run python -m fluid_server --reload
Development with Custom Options
# Run with custom model path and debug logging
uv run python -m fluid_server --model-path ./models --log-level DEBUG --reload
Using Convenience Scripts
# Use the convenience script
.\scripts\start_server.ps1
Development Server Options
- --reload - Auto-reload on code changes (development only)
- --model-path - Custom path to model directory
- --log-level - Set logging level (DEBUG, INFO, WARNING, ERROR)
- --host - Server host (default: 127.0.0.1)
- --port - Server port (default: 8080)
Code Quality and Testing
Type Checking
# Run type checking with ty
.\scripts\typecheck.ps1
# Or run directly
uv run ty check
Code Formatting and Linting
# Format code with ruff
uv run ruff format .
# Check for linting issues
uv run ruff check .
# Fix auto-fixable linting issues
uv run ruff check --fix .
Testing the Server
Development Testing
# Test health endpoint
curl http://localhost:8080/health
# Test models endpoint
curl http://localhost:8080/v1/models
# Test basic chat completion
curl.exe -X POST http://localhost:8080/v1/chat/completions `
-H "Content-Type: application/json" `
-d '{\"model\": \"current\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'
Automated Testing Scripts
# Test with actual models (requires model downloads)
.\scripts\test_with_models.ps1
# Kill server if needed
.\scripts\kill_server.ps1
Building and Distribution
Building the Executable
Standard Build
# Build standalone .exe with PyInstaller
.\scripts\build.ps1
The build script creates dist/fluid-server.exe (approximately 276 MB with OpenVINO + QNN bundled).
Build Configuration
The build process:
- Installs dependencies with uv sync
- Runs type checking with ty
- Creates the executable with PyInstaller
- Includes all necessary runtime libraries
Testing the Built Executable
Quick Test
# Test the built executable
.\scripts\test_exe.ps1
Manual Testing
# Run the executable directly
.\dist\fluid-server.exe
# Run with custom options
.\dist\fluid-server.exe --host 127.0.0.1 --port 8080 --log-level DEBUG
Project Structure
Source Code Organization
src/fluid_server/
├── __main__.py # CLI entry point
├── app.py # FastAPI application factory
├── config.py # Server configuration
├── api/ # API endpoints
│ ├── v1/
│ │ ├── chat.py # Chat completions
│ │ ├── audio.py # Audio transcription
│ │ ├── models.py # Model management
│ │ └── embeddings.py # Text embeddings
│ └── health.py # Health checks
├── managers/ # Core business logic
│ ├── runtime_manager.py # Model loading/unloading
│ └── embedding_manager.py # Embedding generation
├── runtimes/ # Model runtime implementations
│ ├── base.py # Abstract base runtime
│ ├── openvino_llm.py # OpenVINO LLM runtime
│ ├── openvino_whisper.py # OpenVINO Whisper runtime
│ ├── llamacpp.py # Llama.cpp runtime
│ └── qnn_whisper.py # QNN Whisper runtime
├── storage/ # Data persistence
│ └── lancedb_client.py # LanceDB vector storage
└── utils/ # Utilities
├── model_utils.py # Model discovery/downloading
└── platform_utils.py # Platform detection
Model Directory Structure
models/
├── llm/ # Language models
│ ├── qwen3-8b-int8-ov/ # OpenVINO LLM models
│ └── phi-4-mini/ # Additional LLM models
├── whisper/ # Audio transcription models
│ ├── whisper-large-v3-turbo-ov-npu/ # OpenVINO Whisper
│ ├── whisper-large-v3-turbo-qnn/ # QNN Whisper
│ └── whisper-tiny/ # Smaller models
├── embeddings/ # Text embedding models
│ └── sentence-transformers_all-MiniLM-L6-v2/
└── cache/ # Compiled model cache
Development Configuration
Environment Variables
# Set development environment variables
$env:PYTHONPATH = "src"
$env:FLUID_LOG_LEVEL = "DEBUG"
$env:FLUID_MODEL_PATH = "./models"
IDE Configuration
VS Code Settings
Create .vscode/settings.json. Since the project formats and lints with ruff, this assumes the Ruff VS Code extension (charliermarsh.ruff) is installed:
{
    "python.defaultInterpreterPath": ".venv/Scripts/python.exe",
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true
    }
}
Debugging and Troubleshooting
Common Development Issues
Module Import Errors
# Ensure PYTHONPATH includes src directory
$env:PYTHONPATH = "src"
uv run python -m fluid_server
Model Loading Issues
# Run with debug logging to see model loading details
uv run python -m fluid_server --log-level DEBUG
Port Already in Use
# Kill any existing server processes
.\scripts\kill_server.ps1
# Or use a different port
uv run python -m fluid_server --port 8081
Debug Logging
import logging

# Configure a root handler first so debug messages are actually emitted
logging.basicConfig(level=logging.INFO)

# Enable debug logging for specific components
logging.getLogger("fluid_server.managers.runtime_manager").setLevel(logging.DEBUG)
logging.getLogger("fluid_server.runtimes").setLevel(logging.DEBUG)
Performance Profiling
# Run with performance profiling
uv run python -m cProfile -o profile_output.pstats -m fluid_server
# Analyze profile results
uv run python -c "import pstats; pstats.Stats('profile_output.pstats').sort_stats('cumulative').print_stats(20)"
Contributing Guidelines
Code Style
- Follow PEP 8 style guidelines
- Use type hints for all function parameters and return values
- Format code with ruff format
- Ensure all linting checks pass with ruff check
Commit Guidelines
- Use conventional commit messages
- Include relevant tests for new features
- Ensure all existing tests pass
- Update documentation for API changes
Pull Request Process
- Fork the repository
- Create a feature branch from main
- Make your changes with proper testing
- Ensure all checks pass (linting, type checking, tests)
- Submit a pull request with a clear description
Advanced Development Topics
Adding New Model Runtimes
- Implement the BaseRuntime abstract class (a sketch follows this list)
- Add the runtime to the RuntimeManager
- Update configuration options
- Add appropriate tests
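A hypothetical runtime might look like the sketch below. The method names are illustrative only, not the actual BaseRuntime interface; the real contract is defined in runtimes/base.py and should be followed instead:

from fluid_server.runtimes.base import BaseRuntime

class MyAcceleratorRuntime(BaseRuntime):
    """Hypothetical runtime for a new accelerator backend."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.model = None

    def load(self) -> None:
        # Compile/load the model onto the target device here
        ...

    def generate(self, prompt: str) -> str:
        # Run inference and return the completion text
        ...

    def unload(self) -> None:
        # Release device memory so the RuntimeManager can swap models
        self.model = None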
Extending API Endpoints
- Create new endpoint modules in api/v1/ (a sketch follows this list)
- Register routes in the main application
- Add request/response models using Pydantic
- Include comprehensive error handling
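As an illustration, a new endpoint module might follow the pattern below. The route, module purpose, and request fields are hypothetical; existing modules such as api/v1/chat.py are the authoritative reference for the project's conventions:

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class RerankRequest(BaseModel):
    model: str
    query: str
    documents: list[str]

class RerankResponse(BaseModel):
    scores: list[float]

@router.post("/v1/rerank", response_model=RerankResponse)
async def rerank(request: RerankRequest) -> RerankResponse:
    # A real implementation would dispatch to a runtime via the RuntimeManager
    scores = [0.0 for _ in request.documents]
    return RerankResponse(scores=scores)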
Performance Optimization
- Use async/await for I/O operations
- Implement connection pooling for external services
- Cache frequently accessed data (see the sketch after this list)
- Monitor memory usage with large models
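The async and caching points can be combined in a small pattern like the one below. This is a generic sketch, not code from the repository; the function name, cache shape, and TTL are hypothetical:

import asyncio
import time

_cache: dict[str, tuple[float, dict]] = {}
_TTL_SECONDS = 60.0

async def get_model_metadata(model_id: str) -> dict:
    """Return cached metadata, refreshing it after the TTL expires."""
    now = time.monotonic()
    hit = _cache.get(model_id)
    if hit is not None and now - hit[0] < _TTL_SECONDS:
        return hit[1]
    # Simulated I/O-bound work (e.g. reading a model config from disk)
    await asyncio.sleep(0.1)
    metadata = {"id": model_id, "loaded": True}
    _cache[model_id] = (now, metadata)
    return metadata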
This development guide provides the foundation for contributing to and extending Fluid Server. For specific implementation details, refer to the existing codebase and follow the established patterns.