LanceDB Integration Guide for Fluid Server

September 10, 2025

Overview

Fluid Server uses LanceDB as its primary vector database for storing and retrieving high-dimensional embeddings. LanceDB is an embedded vector database designed for AI applications, with native multimodal support and an in-process architecture that avoids network round trips.

Why LanceDB Over Chroma?

1. Native .NET Client Support

LanceDB offers comprehensive client libraries including full .NET support, making it ideal for Windows desktop applications that need to integrate with C# and .NET frameworks. Chroma only provides a client-side solution for .NET environments.

2. Native Multimodal Embeddings

LanceDB supports multimodal embeddings (text, image, audio) natively without requiring additional configuration or separate collections. This allows unified storage and cross-modal search capabilities.

3. Superior Performance

Embedded Architecture: LanceDB runs as an embedded solution with lower latency and no network overhead
Columnar Storage: Uses Apache Arrow and Lance format for efficient storage and retrieval
Optimized Indexing: Advanced indexing algorithms specifically designed for high-dimensional vectors

Fluid Server Architecture
├── API Layer (FastAPI)
│   ├── /v1/embeddings            # OpenAI-compatible embeddings
│   ├── /v1/embeddings/multimodal # Multimodal embedding support
│   └── /v1/vector_store/*        # Vector storage operations
├── Embedding Manager
│   ├── Text Embeddings (OpenVINO)
│   ├── Image Embeddings (CLIP-based)
│   └── Audio Embeddings (Whisper-based)
└── LanceDB Storage Layer
    ├── Collections (Tables)
    ├── Vector Search Engine
    └── Document Storage

Model Directory Structure

models/
├── embeddings/
│   ├── sentence-transformers_all-MiniLM-L6-v2/  # Text models
│   ├── openai_clip-vit-base-patch32/             # Multimodal models
│   └── openai_whisper-base/                      # Audio models
└── cache/                                        # Compiled model cache
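
The folder names above look like the model IDs with "/" replaced by "_". Here is a minimal sketch of that mapping, assuming the convention holds for every model. The helper name model_dir is hypothetical and not part of Fluid Server.

```python
from pathlib import Path


def model_dir(models_root: Path, model_id: str) -> Path:
    """Map a model ID to its assumed folder under models/embeddings/."""
    # Assumption: "org/name" is stored as "org_name", as in the tree above.
    return models_root / "embeddings" / model_id.replace("/", "_")


path = model_dir(Path("models"), "sentence-transformers/all-MiniLM-L6-v2")
print(path.as_posix())  # models/embeddings/sentence-transformers_all-MiniLM-L6-v2
```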

Installation and Configuration

Dependencies

LanceDB is automatically installed with Fluid Server:

# pyproject.toml
dependencies = [
    "lancedb>=0.14.0",
    "sentence-transformers>=2.2.0",
    "pillow>=10.0.0",
]

Configuration

Enable embeddings in your server configuration:

# Server startup
from pathlib import Path

# ServerConfig is provided by Fluid Server; import it from your installation.
config = ServerConfig(
    enable_embeddings=True,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    multimodal_model="openai/clip-vit-base-patch32",
    embedding_device="CPU",  # or "GPU"
    embeddings_db_path=Path("./data/embeddings"),
    embeddings_db_name="vectors"
)

1. Text Embeddings

Generate Embeddings

curl -X POST "http://localhost:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "Machine learning with Python"],
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
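
The same request can be built from Python with only the standard library. This is a sketch, not Fluid Server client code: the helper names are hypothetical, and the response parsing assumes the body follows the OpenAI embeddings format, which the "OpenAI-compatible" label suggests but this guide does not show.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port taken from the curl example


def build_embeddings_request(texts, model):
    """Build (but do not send) a POST request for /v1/embeddings."""
    payload = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_body):
    """Return embeddings ordered by their index field.

    Assumption: the body follows the OpenAI format,
    {"data": [{"index": 0, "embedding": [...]}, ...]}.
    """
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request(
    ["Hello world", "Machine learning with Python"],
    "sentence-transformers/all-MiniLM-L6-v2",
)
# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     vectors = extract_vectors(json.load(resp))
```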

Store Documents with Automatic Embedding

curl -X POST "http://localhost:8080/v1/vector_store/insert" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "documents",
    "documents": [
      {
        "content": "LanceDB provides efficient vector storage",
        "metadata": {"source": "documentation", "category": "database"}
      },
      {
        "content": "Fluid Server enables AI model deployment on Windows",
        "metadata": {"source": "readme", "category": "deployment"}
      }
    ],
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
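
When documents come from application code rather than a hand-written JSON string, a small builder keeps the body consistent. The field names below mirror the curl example. The function itself and its rule of rejecting empty content are assumptions added for illustration.

```python
import json


def build_insert_payload(collection, documents, model):
    """Assemble the JSON body for /v1/vector_store/insert.

    `documents` is a list of (content, metadata) pairs. Rejecting empty
    content is an added assumption, not documented server behavior.
    """
    items = []
    for content, metadata in documents:
        if not content.strip():
            raise ValueError("document content must not be empty")
        items.append({"content": content, "metadata": dict(metadata)})
    return {"collection": collection, "documents": items, "model": model}


payload = build_insert_payload(
    "documents",
    [("LanceDB provides efficient vector storage", {"source": "documentation"})],
    "sentence-transformers/all-MiniLM-L6-v2",
)
body = json.dumps(payload)  # send as the request body, like curl's -d argument
```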

2. Multimodal Embeddings

Image Embeddings

curl -X POST "http://localhost:8080/v1/embeddings/multimodal" \
  -F "input_type=image" \
  -F "model=openai/clip-vit-base-patch32" \
  -F "file=@image.jpg"
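
The -F flags make curl send a multipart/form-data body. The sketch below shows what that encoding looks like when built by hand with the standard library. The function name is hypothetical, and the field names are copied from the curl example.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, mime_type):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        (
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"'
        ).encode(),
        f"Content-Type: {mime_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"input_type": "image", "model": "openai/clip-vit-base-patch32"},
    file_field="file",
    filename="image.jpg",
    file_bytes=b"\xff\xd8\xff",  # placeholder bytes; read the real file instead
    mime_type="image/jpeg",
)
```

Pass body as the request data and content_type as the Content-Type header when posting to /v1/embeddings/multimodal.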

3. Vector Search

Text-based Search

curl -X POST "http://localhost:8080/v1/vector_store/search" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "documents",
    "query": "vector database performance",
    "query_type": "text",
    "limit": 5,
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
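
Conceptually, a text search embeds the query with the named model and returns the stored vectors closest to it. The sketch below illustrates that ranking with brute-force cosine similarity on toy 3-dimensional vectors. It is an illustration only: LanceDB uses its own indexes, and the distance metric Fluid Server configures is not stated in this guide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, stored, limit):
    """Rank (doc_id, vector) pairs by similarity to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in stored
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


stored = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.6, 0.8, 0.0]),
    ("doc-c", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.0, 0.0], stored, limit=2)
print([doc_id for doc_id, _ in results])  # ['doc-a', 'doc-b']
```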

Image-based Search

curl -X POST "http://localhost:8080/v1/vector_store/search/multimodal" \
  -F "collection=documents" \
  -F "query_type=image" \
  -F "limit=10" \
  -F "model=openai/clip-vit-base-patch32" \
  -F "file_query=@query_image.jpg"

Collection Management

Create Collections

curl -X POST "http://localhost:8080/v1/vector_store/collections" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_collection",
    "dimension": 384,
    "content_type": "text",
    "overwrite": false
  }'
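
The dimension value has to match the output size of the embedding model used with the collection: all-MiniLM-L6-v2 produces 384-dimensional vectors, and CLIP ViT-B/32 produces 512-dimensional ones. A hypothetical client-side check, not part of Fluid Server, can catch a mismatch before an insert:

```python
# Output sizes of the two models named in this guide.
MODEL_DIMENSIONS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "openai/clip-vit-base-patch32": 512,
}


def check_dimension(vectors, expected_dimension):
    """Raise ValueError if any vector's length differs from the collection's."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dimension:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"expected {expected_dimension}"
            )


expected = MODEL_DIMENSIONS["sentence-transformers/all-MiniLM-L6-v2"]
check_dimension([[0.0] * 384, [0.1] * 384], expected)  # passes silently
```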

List Collections

curl -X GET "http://localhost:8080/v1/vector_store/collections"

Get Collection Statistics

curl -X GET "http://localhost:8080/v1/vector_store/my_collection/stats"

1. Metadata Filtering

# Filter by metadata
results = await lancedb_client.search_vectors(
    collection_name="documents",
    query_vector=query_vector,
    limit=10,
    filter_condition="metadata->>'category' = 'technical'"
)
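
If filter values come from user input, it is safer to build the condition string with escaping than to concatenate it by hand. The helper below is a hypothetical sketch that assumes the server accepts the same metadata->>'key' = 'value' syntax shown above. It doubles single quotes, the usual SQL escaping rule.

```python
def metadata_equals(key, value):
    """Build an equality filter in the style of the example above."""
    # Restrict keys to letters, digits, and underscores so they cannot
    # break out of the quoted key position.
    if not key.replace("_", "").isalnum():
        raise ValueError(f"unsupported metadata key: {key!r}")
    escaped = str(value).replace("'", "''")
    return f"metadata->>'{key}' = '{escaped}'"


print(metadata_equals("category", "technical"))
# metadata->>'category' = 'technical'
print(metadata_equals("source", "user's guide"))
# metadata->>'source' = 'user''s guide'
```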

2. Batch Operations

# Batch insert
documents = [VectorDocument(...) for _ in range(1000)]
await lancedb_client.insert_documents("large_collection", documents)

# Batch embedding generation
texts = ["text " + str(i) for i in range(100)]
embeddings = await embedding_manager.get_text_embeddings(texts)
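
For very large inputs, splitting the work into fixed-size batches bounds the memory used per call. Whether Fluid Server already batches internally is not stated here, so treat this as an optional client-side sketch. The helper name and the batch size of 32 are arbitrary choices.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = ["text " + str(i) for i in range(100)]
batches = list(batched(texts, 32))
print([len(batch) for batch in batches])  # [32, 32, 32, 4]
```

Each batch can then be passed to get_text_embeddings or insert_documents in turn.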

3. Memory Management

The server automatically manages embedding model memory:

# Models are automatically loaded/unloaded based on usage
config = ServerConfig(
    idle_timeout_minutes=30,  # Unload models after 30 minutes of inactivity
    max_memory_gb=8.0         # Maximum memory usage
)
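
The idea behind an idle timeout can be sketched as a cache that records when each model was last used and evicts those idle for longer than the limit. This is an illustration of the technique, not Fluid Server's implementation. The class name and its methods are hypothetical.

```python
import time


class IdleModelCache:
    """Track loaded models and drop those unused for longer than a timeout."""

    def __init__(self, idle_timeout_minutes, clock=time.monotonic):
        self._timeout_seconds = idle_timeout_minutes * 60
        self._clock = clock
        self._last_used = {}  # model name -> timestamp of last access

    def touch(self, model_name):
        """Record that a model was loaded or used just now."""
        self._last_used[model_name] = self._clock()

    def evict_idle(self):
        """Remove and return the names of models idle past the timeout."""
        now = self._clock()
        idle = [
            name
            for name, last in self._last_used.items()
            if now - last > self._timeout_seconds
        ]
        for name in idle:
            del self._last_used[name]
        return idle

    def loaded(self):
        """Return the names of models still tracked, sorted."""
        return sorted(self._last_used)


# A fake clock (in seconds) makes the behavior easy to follow.
fake_now = [0.0]
cache = IdleModelCache(idle_timeout_minutes=30, clock=lambda: fake_now[0])
cache.touch("text-model")
fake_now[0] = 20 * 60
cache.touch("clip-model")
fake_now[0] = 45 * 60
evicted = cache.evict_idle()
print(evicted)  # ['text-model']
```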

Debug Commands

# Check available models
curl -X GET "http://localhost:8080/v1/embeddings/models"

# Verify collection status
curl -X GET "http://localhost:8080/v1/vector_store/collections"