Development Guide
September 10, 2025
This guide covers everything you need to know for developing with Fluid Server, from initial setup to building and testing.
Prerequisites
System Requirements
- OS: Windows 10/11
- Python: 3.10+ with the uv package manager
- Memory: 8GB+ RAM (16GB recommended for 8B models)
- Storage: 10GB+ free space for models
Hardware-Specific Requirements
Intel NPU Support
- Runtime: OpenVINO 2025.2.0+ runtime
- Hardware: Intel Arc graphics or Intel NPU
Qualcomm NPU Support
- Runtime: ONNX Runtime QNN (bundled with dependencies)
- Hardware: Snapdragon X Elite device with HTP (Hexagon Tensor Processor)
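To confirm that an accelerator is actually visible before starting the server, you can query the runtimes directly from Python. This is a minimal sketch, assuming the openvino and onnxruntime packages from the project's dependencies are importable; on Intel machines look for NPU or GPU in the device list, on Snapdragon look for QNNExecutionProvider:

import onnxruntime as ort
from openvino import Core

# Intel devices: the NPU/GPU should appear in OpenVINO's device list
print("OpenVINO devices:", Core().available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

# Qualcomm devices: check the ONNX Runtime provider list
print("ONNX Runtime providers:", ort.get_available_providers())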
Installing Prerequisites
Install uv Package Manager
# Install uv (if not already installed)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Install OpenVINO (for Intel Devices)
# Download and install OpenVINO runtime from Intel's website
# https://docs.openvino.ai/2025/get-started/install-openvino.html
Initial Setup
1. Clone the Repository
git clone https://github.com/FluidInference/fluid-server.git
cd fluid-server
2. Install Dependencies
# Install all dependencies including development tools
uv sync
# Install development dependencies separately (if needed)
uv add --dev ty
3. Verify Installation
# Check that dependencies are installed correctly
uv run python -c "import fluid_server; print('Setup successful')"
Development Workflow
Running the Development Server
Basic Development Mode
# Run with auto-reload for development
uv run python -m fluid_server --reload
Development with Custom Options
# Run with custom model path and debug logging
uv run python -m fluid_server --model-path ./models --log-level DEBUG --reload
Using Convenience Scripts
# Use the convenience script
.\scripts\start_server.ps1
Development Server Options
- --reload - Auto-reload on code changes (development only)
- --model-path - Custom path to model directory
- --log-level - Set logging level (DEBUG, INFO, WARNING, ERROR)
- --host - Server host (default: 127.0.0.1)
- --port - Server port (default: 8080)
Code Quality and Testing
Type Checking
# Run type checking with ty
.\scripts\typecheck.ps1
# Or run directly
uv run ty check
Code Formatting and Linting
# Format code with ruff
uv run ruff format .
# Check for linting issues
uv run ruff check .
# Fix auto-fixable linting issues
uv run ruff check --fix .
Testing the Server
Development Testing
# Test health endpoint
curl http://localhost:8080/health
# Test models endpoint
curl http://localhost:8080/v1/models
# Test basic chat completion
curl.exe -X POST http://localhost:8080/v1/chat/completions `
-H "Content-Type: application/json" `
-d '{\"model\": \"current\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'
Automated Testing Scripts
# Test with actual models (requires model downloads)
.\scripts\test_with_models.ps1
# Kill server if needed
.\scripts\kill_server.ps1
Building and Distribution
Building the Executable
Standard Build
# Build standalone .exe with PyInstaller
.\scripts\build.ps1
The build script creates dist/fluid-server.exe (approximately 276 MB with OpenVINO + QNN bundled).
Build Configuration
The build process:
- Installs dependencies with uv sync
- Runs type checking with ty
- Creates the executable with PyInstaller
- Includes all necessary runtime libraries
Testing the Built Executable
Quick Test
# Test the built executable
.\scripts\test_exe.ps1
Manual Testing
# Run the executable directly
.\dist\fluid-server.exe
# Run with custom options
.\dist\fluid-server.exe --host 127.0.0.1 --port 8080 --log-level DEBUG
Project Structure
Source Code Organization
src/fluid_server/
├── __main__.py # CLI entry point
├── app.py # FastAPI application factory
├── config.py # Server configuration
├── api/ # API endpoints
│ ├── v1/
│ │ ├── chat.py # Chat completions
│ │ ├── audio.py # Audio transcription
│ │ ├── models.py # Model management
│ │ └── embeddings.py # Text embeddings
│ └── health.py # Health checks
├── managers/ # Core business logic
│ ├── runtime_manager.py # Model loading/unloading
│ └── embedding_manager.py # Embedding generation
├── runtimes/ # Model runtime implementations
│ ├── base.py # Abstract base runtime
│ ├── openvino_llm.py # OpenVINO LLM runtime
│ ├── openvino_whisper.py # OpenVINO Whisper runtime
│ ├── llamacpp.py # Llama.cpp runtime
│ └── qnn_whisper.py # QNN Whisper runtime
├── storage/ # Data persistence
│ └── lancedb_client.py # LanceDB vector storage
└── utils/ # Utilities
├── model_utils.py # Model discovery/downloading
└── platform_utils.py # Platform detection
Model Directory Structure
models/
├── llm/ # Language models
│ ├── qwen3-8b-int8-ov/ # OpenVINO LLM models
│ └── phi-4-mini/ # Additional LLM models
├── whisper/ # Audio transcription models
│ ├── whisper-large-v3-turbo-ov-npu/ # OpenVINO Whisper
│ ├── whisper-large-v3-turbo-qnn/ # QNN Whisper
│ └── whisper-tiny/ # Smaller models
├── embeddings/ # Text embedding models
│ └── sentence-transformers_all-MiniLM-L6-v2/
└── cache/ # Compiled model cache
Development Configuration
Environment Variables
# Set development environment variables
$env:PYTHONPATH = "src"
$env:FLUID_LOG_LEVEL = "DEBUG"
$env:FLUID_MODEL_PATH = "./models"
IDE Configuration
VS Code Settings
Create .vscode/settings.json. Since the project formats and lints with ruff, this assumes the Ruff VS Code extension (charliermarsh.ruff) is installed:
{
    "python.defaultInterpreterPath": ".venv/Scripts/python.exe",
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true
    }
}
Debugging and Troubleshooting
Common Development Issues
Module Import Errors
# Ensure PYTHONPATH includes src directory
$env:PYTHONPATH = "src"
uv run python -m fluid_server
Model Loading Issues
# Run with debug logging to see model loading details
uv run python -m fluid_server --log-level DEBUG
Port Already in Use
# Kill any existing server processes
.\scripts\kill_server.ps1
# Or use a different port
uv run python -m fluid_server --port 8081
Debug Logging
import logging

# Configure a root handler first so debug messages are actually emitted
logging.basicConfig(level=logging.INFO)

# Enable debug logging for specific components
logging.getLogger("fluid_server.managers.runtime_manager").setLevel(logging.DEBUG)
logging.getLogger("fluid_server.runtimes").setLevel(logging.DEBUG)
Performance Profiling
# Run with performance profiling
uv run python -m cProfile -o profile_output.pstats -m fluid_server
# Analyze profile results
uv run python -c "import pstats; pstats.Stats('profile_output.pstats').sort_stats('cumulative').print_stats(20)"
Contributing Guidelines
Code Style
- Follow PEP 8 style guidelines
- Use type hints for all function parameters and return values
- Format code with ruff format
- Ensure all linting checks pass with ruff check
Commit Guidelines
- Use conventional commit messages
- Include relevant tests for new features
- Ensure all existing tests pass
- Update documentation for API changes
Pull Request Process
- Fork the repository
- Create a feature branch from main
- Make your changes with proper testing
- Ensure all checks pass (linting, type checking, tests)
- Submit a pull request with a clear description
Advanced Development Topics
Adding New Model Runtimes
- Implement the BaseRuntime abstract class (a sketch follows this list)
- Add the runtime to the RuntimeManager
- Update configuration options
- Add appropriate tests
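A hypothetical runtime might look like the sketch below. The method names are illustrative only, not the actual BaseRuntime interface; the real contract is defined in runtimes/base.py and should be followed instead:

from fluid_server.runtimes.base import BaseRuntime

class MyAcceleratorRuntime(BaseRuntime):
    """Hypothetical runtime for a new accelerator backend."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.model = None

    def load(self) -> None:
        # Compile/load the model onto the target device here
        ...

    def generate(self, prompt: str) -> str:
        # Run inference and return the completion text
        ...

    def unload(self) -> None:
        # Release device memory so the RuntimeManager can swap models
        self.model = None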
Extending API Endpoints
- Create new endpoint modules in api/v1/ (a sketch follows this list)
- Register routes in the main application
- Add request/response models using Pydantic
- Include comprehensive error handling
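As an illustration, a new endpoint module might follow the pattern below. The route, module purpose, and request fields are hypothetical; existing modules such as api/v1/chat.py are the authoritative reference for the project's conventions:

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class RerankRequest(BaseModel):
    model: str
    query: str
    documents: list[str]

class RerankResponse(BaseModel):
    scores: list[float]

@router.post("/v1/rerank", response_model=RerankResponse)
async def rerank(request: RerankRequest) -> RerankResponse:
    # A real implementation would dispatch to a runtime via the RuntimeManager
    scores = [0.0 for _ in request.documents]
    return RerankResponse(scores=scores)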
Performance Optimization
- Use async/await for I/O operations
- Implement connection pooling for external services
- Cache frequently accessed data (see the sketch after this list)
- Monitor memory usage with large models
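The async and caching points can be combined in a small pattern like the one below. This is a generic sketch, not code from the repository; the function name, cache shape, and TTL are hypothetical:

import asyncio
import time

_cache: dict[str, tuple[float, dict]] = {}
_TTL_SECONDS = 60.0

async def get_model_metadata(model_id: str) -> dict:
    """Return cached metadata, refreshing it after the TTL expires."""
    now = time.monotonic()
    hit = _cache.get(model_id)
    if hit is not None and now - hit[0] < _TTL_SECONDS:
        return hit[1]
    # Simulated I/O-bound work (e.g. reading a model config from disk)
    await asyncio.sleep(0.1)
    metadata = {"id": model_id, "loaded": True}
    _cache[model_id] = (now, metadata)
    return metadata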
This development guide provides the foundation for contributing to and extending Fluid Server. For specific implementation details, refer to the existing codebase and follow the established patterns.