Chapter 5: Vector Stores - Choosing and Configuring Storage Backends
March 2, 2026
In this part of the AnythingLLM Tutorial: Self-Hosted RAG and Agents Platform, you will first build a mental model of how vector stores fit into AnythingLLM, then move into concrete configuration details and practical production tradeoffs.
Select and configure vector databases for optimal semantic search performance.
Overview
Vector stores are the foundation of semantic search in AnythingLLM. This chapter covers the different vector database options, their configuration, and optimization strategies for different use cases.
Supported Vector Stores
LanceDB (Built-in)
# Default choice - works out of the box
# Excellent for getting started and small deployments
# Stores vectors locally in Docker volume
# Pros:
# - No additional setup required
# - Fast for small to medium datasets
# - Built-in to AnythingLLM
# - Good default similarity search
# Cons:
# - Not distributed (single instance)
# - Limited to local storage
# - May be slow for very large datasets
# Configuration (automatic):
# - Storage: /app/server/storage/lancedb
# - No additional settings needed
Chroma
# Self-hosted vector database
# Good for development and small teams
# Runs as a separate server (the Docker image below is a single-node deployment)
# Installation:
docker run -d \
-p 8000:8000 \
--name chroma \
-v chroma-data:/chroma/chroma \
chromadb/chroma
# Configuration in AnythingLLM:
# Settings > Vector Database > Chroma
# - Host: http://host.docker.internal:8000
# - SSL: false (unless configured)
# Advanced settings:
# - Collection: anythingllm (default)
# - Chunk Size: 1000
# - Overlap: 200
# Docker Compose setup
version: '3.8'
services:
  chroma:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - CHROMA_SERVER_HOST=0.0.0.0
      - CHROMA_SERVER_HTTP_PORT=8000
volumes:
  chroma_data:
Pinecone
# Cloud-native vector database
# Excellent for production and large-scale deployments
# Managed service with high availability
# Sign up at pinecone.io
# Create project and index
# Configuration in AnythingLLM:
# Settings > Vector Database > Pinecone
# - API Key: your-pinecone-api-key
# - Index Name: anythingllm-index
# - Environment: us-east1-gcp (or your region)
# - Project ID: your-project-id
# Index settings (create in Pinecone console):
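# - Dimension: must match your embedding model, e.g. 1536 for OpenAI text-embedding-3-small or 768 for all-mpnet-base-v2
# - Metric: cosine
# - Pod Type (pod-based indexes only): p1 (performance-optimized) or s1 (storage-optimized)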
Weaviate
# Vector database with a GraphQL API and cross-references between objects
# Supports complex queries and relationships
# Good for knowledge graphs and structured data
# Installation:
docker run -d \
-p 8080:8080 \
--name weaviate \
-e QUERY_DEFAULTS_LIMIT=25 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
-e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
semitechnologies/weaviate:latest
# Configuration in AnythingLLM:
# Settings > Vector Database > Weaviate
# - Host: http://host.docker.internal:8080
# - API Key: (leave blank for anonymous)
# - Class Name: AnythingLLM
# Schema configuration:
# Weaviate will auto-create schema based on your data
Qdrant
# High-performance vector database
# Excellent for large-scale semantic search
# Supports distributed deployment
# Installation:
docker run -d \
-p 6333:6333 \
-p 6334:6334 \
-v qdrant_data:/qdrant/storage \
qdrant/qdrant
# Configuration in AnythingLLM:
# Settings > Vector Database > Qdrant
# - Host: http://host.docker.internal:6333
# - API Key: (optional)
# - Collection: anythingllm
# Collection settings:
# - Vector size: 1536 (matches embedding model)
# - Distance: Cosine
# - Replication factor: 1 (single node)
Vector Store Selection Guide
By Use Case
# Getting Started / Small Projects
Best: LanceDB (built-in)
Good: Chroma
Why: Zero configuration, works immediately
# Development / Small Teams
Best: Chroma
Good: LanceDB, Weaviate
Why: Easy setup, good performance, development-friendly
# Production / Enterprise
Best: Pinecone, Qdrant
Good: Weaviate
Why: Scalable, reliable, managed services
# Large Datasets (>1M vectors)
Best: Pinecone, Qdrant
Good: Weaviate (distributed)
Why: Optimized for scale, distributed architecture
# Complex Queries / Knowledge Graphs
Best: Weaviate
Good: Qdrant (with filtering)
Why: Graph capabilities, rich query language
# Cost Sensitive
Best: LanceDB, Chroma
Good: Self-hosted options
Why: No cloud costs, local storage
By Performance Characteristics
# Fastest Queries
- Qdrant: Optimized for speed
- Pinecone: Cloud performance
- Chroma: Good for development
# Highest Accuracy
- Pinecone: Advanced indexing
- Weaviate: Rich similarity measures
- Qdrant: Multiple distance metrics
# Best Scaling
- Pinecone: Serverless scaling
- Qdrant: Distributed clustering
- Weaviate: Horizontal scaling
# Lowest Latency
- Qdrant: In-memory operations
- Pinecone: Managed cloud regions (pick one close to your application)
- Chroma: Local deployment
By Cost
# Free / Self-hosted
- LanceDB: Completely free
- Chroma: Free, self-hosted
- Weaviate: Free community edition
- Qdrant: Free community edition
# Paid / Cloud (figures are indicative only; check each vendor's current pricing page)
- Pinecone: $0.10/GB/month + query costs
- Qdrant Cloud: $0.05/GB/month
- Weaviate Cloud: Custom pricing
Configuration and Optimization
Embedding Model Compatibility
# Match vector dimensions to embedding model
# OpenAI text-embedding-3-small: 1536 dimensions
# OpenAI text-embedding-ada-002: 1536 dimensions
# OpenAI text-embedding-3-large: 3072 dimensions
# Local models (sentence-transformers):
# all-MiniLM-L6-v2: 384 dimensions
# all-mpnet-base-v2: 768 dimensions
# Anthropic: no first-party embedding model; its docs point to Voyage AI (e.g. voyage-3: 1024 dimensions)
# Configuration example (illustrative):
vector_store:
  type: "pinecone"
  dimensions: 1536 # Must match embedding model
  metric: "cosine"
  index_name: "anythingllm-docs"
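The "must match" rule above can be checked before any index is created. The following is a minimal sketch, not part of AnythingLLM: `KNOWN_DIMENSIONS` and `check_dimensions` are hypothetical names, and the table only contains the models listed in this chapter.

```python
# Hypothetical helper: dimensions copied from the list in this chapter.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}


def check_dimensions(model: str, index_dimensions: int) -> None:
    """Raise ValueError if an index of this size cannot hold vectors from `model`."""
    try:
        expected = KNOWN_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"unknown embedding model: {model!r}") from None
    if expected != index_dimensions:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index is configured for {index_dimensions}"
        )


# Passes silently: model and index agree.
check_dimensions("text-embedding-3-small", 1536)
```

Running a check like this at startup surfaces a mismatch as a clear error instead of a failed upsert later. Changing the embedding model generally means creating a new index and re-embedding the documents.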
Index Optimization
# Optimize indexes for your use case
# (ef_construction and m are HNSW index parameters)
# For accuracy (slower builds, more memory):
index_optimization:
  ef_construction: 200 # Higher = more accurate but slower builds
  m: 16 # Higher = more accurate but more memory
# For speed and memory efficiency (slightly less accurate):
index_optimization:
  ef_construction: 100
  m: 8
# Balanced:
index_optimization:
  ef_construction: 128
  m: 12
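To see how `m` affects memory, a rough estimate helps. The sketch below uses the rule of thumb given in the Faiss documentation for float32 HNSW indexes, about `d * 4 + m * 2 * 4` bytes per vector. The function name is hypothetical, and real stores (LanceDB, Qdrant, Pinecone) add their own overhead, so treat the result as an order-of-magnitude figure only.

```python
def estimate_hnsw_bytes(num_vectors: int, dimensions: int, m: int) -> int:
    """Approximate index size: raw float32 vectors plus HNSW graph links."""
    vector_bytes = dimensions * 4  # 4 bytes per float32 component
    link_bytes = m * 2 * 4         # about 2*m neighbor ids of 4 bytes each
    return num_vectors * (vector_bytes + link_bytes)


# 125,000 vectors of 1536 dimensions with the two m values used above
for m in (8, 16):
    size_mb = estimate_hnsw_bytes(125_000, 1536, m) / 1_000_000
    print(f"m={m}: about {size_mb:.0f} MB")
```

At 1536 dimensions the raw vectors dominate, so changing `m` moves the total only slightly. With small embeddings such as 384 dimensions, the graph links are a larger share of the total.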
Chunking Strategy
# Different chunking for different content types
# Code files:
chunking:
  strategy: semantic
  size: 500
  overlap: 50
  separators: ["\nclass ", "\ndef ", "\n def "]
# Documentation:
chunking:
  strategy: sentence
  size: 1000
  overlap: 200
  separators: [". ", "! ", "? ", "\n\n"]
# Long documents (research papers):
chunking:
  strategy: paragraph
  size: 1500
  overlap: 300
  separators: ["\n\n", "\n"]
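To make `size` and `overlap` concrete, here is a minimal sketch of fixed-size chunking by character count. It ignores the `separators` setting and is not AnythingLLM's own splitter. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Each chunk starts `size - overlap` characters after the previous one,
    so consecutive chunks share `overlap` characters.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.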
Performance Tuning
Query Optimization
# Optimize similarity search
# Search parameters:
similarity_search:
  top_k: 5 # Number of results to return
  score_threshold: 0.7 # Minimum similarity score
  ef_search: 64 # Search parameter (higher = more accurate but slower)
# For speed (reduce ef_search):
similarity_search:
  top_k: 5
  ef_search: 32
# For accuracy (increase ef_search):
similarity_search:
  top_k: 5
  ef_search: 128
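The interaction of `top_k` and `score_threshold` can be shown with an exact (brute-force) cosine search. This is an illustrative sketch with hypothetical function names. It does not model `ef_search`, which only applies to approximate HNSW search inside the vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=5, score_threshold=0.7):
    """Return up to top_k (doc_id, score) pairs scoring at least score_threshold."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
print(search([1.0, 0.0], docs))  # "a" and "b" pass the 0.7 threshold, "c" does not
```

The threshold is applied before truncation, so a query with few good matches returns fewer than `top_k` results rather than padding the list with weak ones.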
Caching Strategies
# Vector cache for frequently accessed documents
vector_cache:
  enabled: true
  size_mb: 512
  ttl_hours: 24
# Query result cache
query_cache:
  enabled: true
  size_mb: 256
  ttl_minutes: 60
# Embedding cache (avoid re-embedding unchanged content)
embedding_cache:
  enabled: true
  persist_to_disk: true
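The embedding cache idea above, skipping work for unchanged content, can be sketched with a content hash as the cache key. `EmbeddingCache` and the `embed_fn` callback are hypothetical names. This in-memory version omits the disk persistence and size limits a real cache would need.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by the SHA-256 hash of the text they were computed from."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)  # only called on a miss
        return self._store[key]


# Stand-in embedder for demonstration; a real one would call the embedding model.
cache = EmbeddingCache(lambda text: [float(len(text))])
cache.get("hello")
cache.get("hello")
print(cache.hits, cache.misses)  # 1 1
```

Keying on the content hash rather than the file name means a renamed but unchanged document is still a hit, while any edit to the text forces a fresh embedding.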
Batch Operations
# Optimize bulk operations
# Batch embedding:
embedding_batch:
  size: 100
  concurrency: 4
# Batch indexing:
indexing_batch:
  size: 1000
  concurrency: 2
# Bulk queries:
bulk_query:
  max_queries: 10
  concurrency: 3
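The `size` and `concurrency` settings above can be pictured as follows. This sketch assumes a hypothetical `embed_batch` callable that turns a list of texts into a list of vectors. It uses only the standard library and is not AnythingLLM's internal pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items: list, size: int):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(texts, embed_batch, size=100, concurrency=4):
    """Embed texts in batches, running up to `concurrency` batches at once."""
    batches = list(batched(texts, size))
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order even if batches finish out of order
        results = list(pool.map(embed_batch, batches))
    return [vector for batch in results for vector in batch]


# Stand-in embedder for demonstration only.
vectors = embed_in_batches(["a", "bb", "ccc"], lambda b: [[float(len(t))] for t in b], size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

Larger batches reduce per-request overhead, while concurrency hides network latency. Both are bounded in practice by the embedding provider's rate limits.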
Monitoring and Maintenance
Health Checks
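# Check vector store health (API routes vary by AnythingLLM version; confirm paths in your instance's Swagger docs at /api/docs)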
curl http://localhost:3001/api/v1/system/health \
-H "Authorization: Bearer YOUR_API_KEY"
# Response includes vector store status:
{
"vector_store": {
"status": "healthy",
"connection": "connected",
"index_count": 5,
"vector_count": 125000
}
}
Performance Metrics
# Monitor vector store performance
curl http://localhost:3001/api/v1/analytics/vector-store \
-H "Authorization: Bearer YOUR_API_KEY"
# Response:
{
"query_performance": {
"avg_query_time_ms": 45,
"queries_per_second": 22,
"cache_hit_rate": 0.85
},
"storage_metrics": {
"total_vectors": 125000,
"index_size_mb": 450,
"memory_usage_mb": 1200
},
"error_rates": {
"connection_errors": 0.001,
"query_errors": 0.005
}
}
Maintenance Tasks
# Regular maintenance
# 1. Index optimization
curl -X POST http://localhost:3001/api/v1/admin/vector-store/optimize \
-H "Authorization: Bearer YOUR_API_KEY"
# 2. Cache cleanup
curl -X POST http://localhost:3001/api/v1/admin/cache/cleanup \
-H "Authorization: Bearer YOUR_API_KEY"
# 3. Backup vectors
curl -X GET http://localhost:3001/api/v1/admin/vector-store/backup \
-H "Authorization: Bearer YOUR_API_KEY" \
--output vector-backup.tar.gz
# 4. Rebuild indexes (if corrupted)
curl -X POST http://localhost:3001/api/v1/admin/vector-store/rebuild \
-H "Authorization: Bearer YOUR_API_KEY"
Backup and Recovery
Vector Store Backup
# Backup strategies by vector store type
# LanceDB (file-based):
docker exec anythingllm tar czf /tmp/lancedb-backup.tar.gz /app/server/storage/lancedb
docker cp anythingllm:/tmp/lancedb-backup.tar.gz ./backups/
# Chroma:
docker exec chroma tar czf /tmp/chroma-backup.tar.gz /chroma/chroma
docker cp chroma:/tmp/chroma-backup.tar.gz ./backups/
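# Pinecone (cloud):
# The service replicates data for availability, but replication does not protect against accidental deletes
# Use Pinecone's own backup or collection features, or export data through the API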
# Qdrant:
curl -X POST http://localhost:6333/collections/anythingllm/snapshots \
-H "Content-Type: application/json" \
-d '{}' > snapshot-info.json
Recovery Procedures
# Restore from backup
# LanceDB:
docker cp ./backups/lancedb-backup.tar.gz anythingllm:/tmp/
docker exec anythingllm tar xzf /tmp/lancedb-backup.tar.gz -C /
# Chroma:
docker cp ./backups/chroma-backup.tar.gz chroma:/tmp/
docker exec chroma tar xzf /tmp/chroma-backup.tar.gz -C /
# Trigger reindexing:
curl -X POST http://localhost:3001/api/v1/admin/vector-store/reindex \
-H "Authorization: Bearer YOUR_API_KEY"
Migration Between Vector Stores
Export/Import Process
# Step 1: Export from current store
curl -X GET http://localhost:3001/api/v1/admin/vector-store/export \
-H "Authorization: Bearer YOUR_API_KEY" \
--output vectors-export.json
# Step 2: Configure new vector store in UI
# Settings > Vector Database > [New Store]
# Step 3: Import to new store
curl -X POST http://localhost:3001/api/v1/admin/vector-store/import \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d @vectors-export.json
# Step 4: Verify import
curl http://localhost:3001/api/v1/system/health \
-H "Authorization: Bearer YOUR_API_KEY"
Zero-Downtime Migration
# Advanced migration with zero downtime
# 1. Set up new vector store alongside existing
# 2. Configure dual writing (write to both stores)
# 3. Gradually migrate read operations
# 4. Remove old store after full migration
dual_writing:
  enabled: true
  primary_store: "pinecone"
  secondary_store: "qdrant"
  read_from_primary: true
  migration_progress: 0.0 # 0.0 to 1.0
# Gradual migration:
# - Start with 10% traffic to new store
# - Monitor performance and accuracy
# - Increase traffic gradually
# - Complete migration when confident
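The dual-writing steps above can be sketched as a small router: every write goes to both stores, and a stable hash of the query key decides which store serves a read. All class names here are hypothetical and the stores are in-memory stand-ins. AnythingLLM does not ship such a router, so treat this as a pattern to adapt.

```python
import hashlib


class InMemoryStore:
    """Stand-in for a real vector store client."""

    def __init__(self, name: str):
        self.name = name
        self.vectors: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        self.vectors[doc_id] = vector

    def get(self, doc_id: str):
        return self.vectors.get(doc_id)


class DualWriteRouter:
    def __init__(self, old, new, migration_progress: float = 0.0):
        if not 0.0 <= migration_progress <= 1.0:
            raise ValueError("migration_progress must be between 0.0 and 1.0")
        self.old, self.new = old, new
        self.migration_progress = migration_progress

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # Write to both stores so either one can serve reads.
        self.old.upsert(doc_id, vector)
        self.new.upsert(doc_id, vector)

    @staticmethod
    def _bucket(key: str) -> float:
        # Map the key to a stable value in [0, 1) so routing is repeatable.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def pick_read_store(self, key: str):
        if self.migration_progress >= 1.0:
            return self.new
        return self.new if self._bucket(key) < self.migration_progress else self.old


router = DualWriteRouter(InMemoryStore("pinecone"), InMemoryStore("qdrant"), 0.1)
router.upsert("doc-1", [0.1, 0.2])
print(router.pick_read_store("some query").name)
```

Hash-based routing sends the same query to the same store on every call, which makes result comparisons between the old and new store reproducible while the traffic share is raised.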
Advanced Features
Hybrid Search
# Combine semantic and keyword search
hybrid_search:
  enabled: true
  semantic_weight: 0.7
  keyword_weight: 0.3
  rerank_results: true
# Benefits:
# - Better precision for exact matches
# - Improved recall for related concepts
# - Handles both specific queries and general questions
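One simple way to combine the two signals is a weighted sum of per-document scores. The sketch below assumes both score sets are already normalized to the range 0 to 1. `hybrid_rank` is a hypothetical name. Real systems may use other fusion methods, and this sketch leaves out the reranking step.

```python
def hybrid_rank(semantic: dict[str, float], keyword: dict[str, float],
                semantic_weight: float = 0.7, keyword_weight: float = 0.3):
    """Fuse two score maps into one ranking, highest combined score first.

    A document missing from one map contributes 0.0 for that signal.
    """
    doc_ids = set(semantic) | set(keyword)
    fused = {
        doc_id: semantic_weight * semantic.get(doc_id, 0.0)
        + keyword_weight * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


semantic = {"a": 0.9, "b": 0.6}
keyword = {"b": 1.0, "c": 0.8}
print([doc_id for doc_id, _ in hybrid_rank(semantic, keyword)])  # ['b', 'a', 'c']
```

Document `b` ranks first because an exact keyword match lifts a moderate semantic score, which is the "better precision for exact matches" benefit listed above.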
Metadata Filtering
# Filter results by metadata
metadata_filters:
  enabled: true
  supported_fields:
    - "document_type"
    - "author"
    - "date_created"
    - "tags"
# Usage in queries:
# "Find API documentation created after 2024-01-01"
# Automatically filters by metadata before vector search
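Filtering before the vector search means only documents that pass the metadata test are scored. A minimal sketch of that pre-filter step follows, with hypothetical function names and an assumed document shape (a dict with a `metadata` dict using the field names listed above).

```python
from datetime import date


def matches(metadata: dict, document_type=None, created_after=None, tags=None) -> bool:
    """Return True if the metadata satisfies every filter that was supplied."""
    if document_type is not None and metadata.get("document_type") != document_type:
        return False
    if created_after is not None:
        created = metadata.get("date_created")
        if created is None or created <= created_after:
            return False
    if tags and not set(tags) <= set(metadata.get("tags", [])):
        return False  # every requested tag must be present
    return True


def prefilter(docs: list[dict], **filters) -> list[dict]:
    """Keep only documents whose metadata passes the filters."""
    return [doc for doc in docs if matches(doc["metadata"], **filters)]


docs = [
    {"id": "api-guide", "metadata": {"document_type": "api", "date_created": date(2024, 3, 1)}},
    {"id": "old-api", "metadata": {"document_type": "api", "date_created": date(2023, 6, 1)}},
]
kept = prefilter(docs, document_type="api", created_after=date(2024, 1, 1))
print([doc["id"] for doc in kept])  # ['api-guide']
```

In a real deployment the filter is usually pushed down into the vector store's own query syntax so that filtering and search happen in one request.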
Multi-Index Support
# Multiple indexes for different content types
multi_index:
  enabled: true
  indexes:
    - name: "code"
      content_types: ["py", "js", "java"]
      embedding_model: "code-search-embeddings"
    - name: "docs"
      content_types: ["md", "txt", "pdf"]
      embedding_model: "text-embedding-ada-002"
    - name: "data"
      content_types: ["csv", "json"]
      embedding_model: "text-embedding-3-small"
# Automatically route queries to appropriate index
# Improves relevance for specialized content
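At ingestion time, the content-type mapping above reduces to a lookup by file extension. The sketch below covers only that document-routing step. Routing a query to the right index needs a separate decision (for example, a classifier or a user-selected scope) that is not shown here. `route_to_index` and the `docs` fallback are assumptions made for illustration.

```python
from pathlib import PurePath

# Extension-to-index mapping taken from the multi_index example above.
INDEX_BY_EXTENSION = {
    "py": "code", "js": "code", "java": "code",
    "md": "docs", "txt": "docs", "pdf": "docs",
    "csv": "data", "json": "data",
}


def route_to_index(filename: str, default: str = "docs") -> str:
    """Pick the index for a file by its extension (case-insensitive)."""
    extension = PurePath(filename).suffix.lstrip(".").lower()
    return INDEX_BY_EXTENSION.get(extension, default)


print(route_to_index("src/app.py"))   # code
print(route_to_index("report.pdf"))   # docs
print(route_to_index("rows.csv"))     # data
```

Because each index in the example uses a different embedding model, a document must be queried with the same model it was embedded with, so the routing decision has to be recorded alongside the document.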
Troubleshooting
Common Issues
# Connection failed
# - Check network connectivity
# - Verify host/port settings
# - Check authentication credentials
curl -f http://localhost:8000/api/v1/heartbeat # Chroma health check (use /api/v2/heartbeat on newer Chroma releases)
curl -f http://localhost:6333/healthz # Qdrant health check
# Slow queries
# - Lower the ef_search parameter (higher values are more accurate but slower)
# - Check index optimization
# - Monitor resource usage
# - Consider upgrading instance size
# Out of memory
# - Reduce batch sizes
# - Increase server memory
# - Optimize index parameters
# - Use disk-based storage for large datasets
# Index corruption
# - Rebuild index from documents
# - Restore from backup
# - Check disk space and permissions
Performance Debugging
# Enable detailed logging
export VECTOR_STORE_LOG_LEVEL=debug
# Monitor query performance
curl http://localhost:3001/api/v1/debug/query-performance \
-H "Authorization: Bearer YOUR_API_KEY"
# Analyze slow queries
# - Check embedding generation time
# - Monitor vector search time
# - Review result post-processing
Summary
In this chapter, we've covered:
- Vector Store Options: LanceDB, Chroma, Pinecone, Weaviate, Qdrant
- Selection Guide: Choosing the right store for your use case
- Configuration: Setting up and optimizing vector stores
- Performance Tuning: Query optimization, caching, and batching
- Monitoring: Health checks and performance metrics
- Maintenance: Backup, recovery, and migration
- Advanced Features: Hybrid search, metadata filtering, multi-index
- Troubleshooting: Common issues and debugging techniques
Key Takeaways
- Right Tool for the Job: Choose vector store based on scale, cost, and requirements
- Configuration Matters: Match dimensions and optimize for your use case
- Performance Tuning: Balance speed, accuracy, and resource usage
- Monitoring: Track health and performance regularly
- Backup Strategy: Implement regular backups and recovery procedures
- Scalability: Plan for growth and migration needs
- Hybrid Approaches: Combine multiple techniques for best results
Next Steps
Now that you understand vector stores, let's explore agents and how to add intelligent capabilities to your AnythingLLM instance.
Ready for Chapter 6? Agents
Generated for Awesome Code Docs
What Problem Does This Solve?
Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries between the vector store, its backend (Chroma, for example), and the HTTP layer that connects them, so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about the vector store as an operating subsystem inside AnythingLLM, with explicit contracts for inputs, state transitions, and outputs.
Use the store configuration, Docker, and curl examples above as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, a vector store integration usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for the vector store.
- Input normalization: shape incoming data so Chroma (or whichever backend you use) receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through HTTP.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Source Walkthrough
Use the following upstream sources to verify implementation details while reading this chapter:
- AnythingLLM Repository (github.com). Why it matters: authoritative reference on the AnythingLLM Repository.
- AnythingLLM Releases (github.com). Why it matters: authoritative reference on AnythingLLM Releases.
- AnythingLLM Docs (docs.anythingllm.com). Why it matters: authoritative reference on AnythingLLM Docs.
- AnythingLLM Website (anythingllm.com). Why it matters: authoritative reference on the AnythingLLM Website.
Suggested trace strategy:
- search upstream code for vector and chroma to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production