Chapter 6: Vector Store Abstraction
March 21, 2026
Welcome to Chapter 6: Vector Store Abstraction. In this part of LangChain Architecture: Internal Design Deep Dive, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Vector stores are where semantic search happens. This chapter examines the VectorStore interface, how embedding models are abstracted, the Retriever pattern that connects vector stores to LCEL chains, and how LangChain achieves backend portability across dozens of vector databases.
The Embedding Model Contract
Before we can store vectors, we need to produce them. LangChain defines the Embeddings abstract base class:
classDiagram
class Embeddings {
<<abstract>>
+embed_documents(texts: List~str~) List~List~float~~
+embed_query(query: str) List~float~
+aembed_documents(texts: List~str~) List~List~float~~
+aembed_query(query: str) List~float~
}
class OpenAIEmbeddings {
+model: str
+dimensions: Optional~int~
+embed_documents(texts) List~List~float~~
+embed_query(query) List~float~
}
class HuggingFaceEmbeddings {
+model_name: str
+model_kwargs: dict
+embed_documents(texts) List~List~float~~
+embed_query(query) List~float~
}
class FakeEmbeddings {
+size: int
+embed_documents(texts) List~List~float~~
+embed_query(query) List~float~
}
Embeddings <|-- OpenAIEmbeddings
Embeddings <|-- HuggingFaceEmbeddings
Embeddings <|-- FakeEmbeddings
The key design decision here is the separation of embed_documents and embed_query. While many embedding models treat them identically, some models (e.g., certain asymmetric models) use different encoding strategies for documents vs queries:
import asyncio
from abc import ABC, abstractmethod
from typing import List
# Simplified view of langchain_core.embeddings.Embeddings
class Embeddings(ABC):
    """Abstract base class for embedding models."""
    @abstractmethod
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents. Optimized for batch processing."""
    @abstractmethod
    def embed_query(self, text: str) -> List[float]:
        """Embed a single query. May use different encoding than documents."""
    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        """Default: runs embed_documents in executor."""
        return await asyncio.get_running_loop().run_in_executor(
            None, self.embed_documents, texts
        )
    async def aembed_query(self, text: str) -> List[float]:
        """Default: runs embed_query in executor."""
        return await asyncio.get_running_loop().run_in_executor(
            None, self.embed_query, text
        )
Embedding Model Usage
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Embed multiple documents (batched for efficiency)
doc_vectors = embeddings.embed_documents([
"LangChain is a framework for building LLM apps.",
"Vector stores enable semantic search.",
"Embeddings convert text to numbers."
])
# Embed a query (may use different encoding)
query_vector = embeddings.embed_query("What is LangChain?")
print(f"Document vector dimensions: {len(doc_vectors[0])}") # 1536
print(f"Query vector dimensions: {len(query_vector)}") # 1536
The VectorStore Interface
VectorStore is the abstract base class for all vector databases. It defines a comprehensive interface for storing, searching, and deleting vectors:
classDiagram
class VectorStore {
<<abstract>>
+embeddings: Embeddings
+add_texts(texts, metadatas?) List~str~
+add_documents(documents) List~str~
+delete(ids?) bool
+similarity_search(query, k) List~Document~
+similarity_search_with_score(query, k) List~Tuple~
+similarity_search_by_vector(vector, k) List~Document~
+max_marginal_relevance_search(query, k, fetch_k) List~Document~
+as_retriever(**kwargs) VectorStoreRetriever
+from_texts(texts, embedding, metadatas?) VectorStore$
+from_documents(documents, embedding) VectorStore$
}
class Chroma {
-_collection: chromadb.Collection
+similarity_search(query, k) List~Document~
}
class FAISS {
-index: faiss.Index
-docstore: Docstore
+similarity_search(query, k) List~Document~
}
class Pinecone {
-_index: pinecone.Index
+similarity_search(query, k) List~Document~
}
class PGVector {
-_engine: SQLAlchemy Engine
+similarity_search(query, k) List~Document~
}
VectorStore <|-- Chroma
VectorStore <|-- FAISS
VectorStore <|-- Pinecone
VectorStore <|-- PGVector
Core Methods
The interface is divided into write and read operations:
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional, Tuple
from langchain_core.documents import Document
# Simplified sketch of the interface, not the full upstream class
class VectorStore(ABC):
    """Abstract interface for vector stores."""
    @abstractmethod
    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs
    ) -> List[str]:
        """Add texts to the store. Returns list of IDs."""
    def add_documents(self, documents: List[Document], **kwargs) -> List[str]:
        """Convenience: extracts texts and metadatas from Documents."""
        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return self.add_texts(texts, metadatas, **kwargs)
    @abstractmethod
    def similarity_search(
        self,
        query: str,
        k: int = 4,
        **kwargs
    ) -> List[Document]:
        """Return k most similar documents to query."""
    def similarity_search_with_score(
        self,
        query: str,
        k: int = 4,
        **kwargs
    ) -> List[Tuple[Document, float]]:
        """Return documents with similarity scores."""
        raise NotImplementedError
    def similarity_search_by_vector(
        self,
        embedding: List[float],
        k: int = 4,
        **kwargs
    ) -> List[Document]:
        """Search by pre-computed vector (skips embed_query)."""
        raise NotImplementedError
The Internal Search Flow
When you call similarity_search(query), here is what happens inside a typical implementation:
sequenceDiagram
participant User
participant VectorStore
participant Embeddings
participant Backend as DB Backend
User->>VectorStore: similarity_search("What is LangChain?", k=4)
VectorStore->>Embeddings: embed_query("What is LangChain?")
Embeddings-->>VectorStore: [0.023, -0.041, 0.089, ...]
VectorStore->>Backend: nearest_neighbor_search(vector, k=4)
Backend-->>VectorStore: [(id1, score1), (id2, score2), ...]
VectorStore->>Backend: fetch_documents([id1, id2, ...])
Backend-->>VectorStore: [(text1, metadata1), ...]
VectorStore-->>User: [Document(page_content=text1, metadata=metadata1), ...]
# Typical implementation pattern (e.g., Chroma)
class Chroma(VectorStore):
    def similarity_search(self, query: str, k: int = 4, **kwargs) -> List[Document]:
        # Step 1: Embed the query
        query_embedding = self._embedding_function.embed_query(query)
        # Step 2: Query the backend
        results = self._collection.query(
            query_embeddings=[query_embedding],
            n_results=k,
            **kwargs
        )
        # Step 3: Convert to Documents
        documents = []
        for i in range(len(results["ids"][0])):
            doc = Document(
                page_content=results["documents"][0][i],
                metadata=results["metadatas"][0][i]
            )
            documents.append(doc)
        return documents
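The three steps above (embed the query, ask the backend for neighbors, convert rows to documents) can be traced end to end in a dependency-free sketch. The names `TinyVectorStore`, `Doc`, and the letter-count `embed` function are assumptions made for illustration; a real backend would use an approximate nearest-neighbor index rather than sorting every row.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Stand-in for LangChain's Document (hypothetical, simplified)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def embed(text: str) -> List[float]:
    """Toy embedding: counts of each letter a-z. Not a real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force in-memory store that follows the three-step search flow."""

    def __init__(self) -> None:
        self._rows: List[Tuple[List[float], Doc]] = []

    def add_texts(self, texts: List[str]) -> List[str]:
        ids = []
        for text in texts:
            ids.append(str(len(self._rows)))
            self._rows.append((embed(text), Doc(text, {"id": ids[-1]})))
        return ids

    def similarity_search(self, query: str, k: int = 4) -> List[Doc]:
        # Step 1: embed the query.
        query_vector = embed(query)
        # Step 2: rank every stored vector by cosine similarity.
        ranked = sorted(
            self._rows, key=lambda row: cosine(query_vector, row[0]), reverse=True
        )
        # Step 3: return the documents for the top k rows.
        return [doc for _, doc in ranked[:k]]


store = TinyVectorStore()
ids = store.add_texts(["vector stores", "agent tools", "vector search"])
top = store.similarity_search("vector store", k=2)
```

Here `top` holds the two texts that share the most letters with the query, which is all "similarity" means for this toy embedding.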
Maximum Marginal Relevance (MMR)
MMR is a search strategy that balances relevance with diversity. Instead of returning the k most similar documents (which may be very similar to each other), MMR iteratively selects documents that are relevant to the query but dissimilar to already-selected documents:
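import numpy as np
from langchain_core.vectorstores.utils import maximal_marginal_relevance
def max_marginal_relevance_search(
    self,
    query: str,
    k: int = 4,
    fetch_k: int = 20,  # Fetch more candidates than needed
    lambda_mult: float = 0.5,  # Balance: 1.0 = pure relevance, 0.0 = pure diversity
    **kwargs
) -> List[Document]:
    """MMR search for diverse results."""
    # Step 1: Get query embedding
    query_embedding = self.embeddings.embed_query(query)
    # Step 2: Fetch more candidates than we need
    candidates = self.similarity_search_by_vector(
        query_embedding, k=fetch_k
    )
    # Step 3: Get candidate vectors. Re-embedding keeps this sketch simple;
    # real backends usually return the stored vectors with the search results.
    candidate_embeddings = self.embeddings.embed_documents(
        [c.page_content for c in candidates]
    )
    # Step 4: Apply the MMR algorithm, which returns indices into candidates
    selected = maximal_marginal_relevance(
        np.array(query_embedding),
        candidate_embeddings,
        lambda_mult=lambda_mult,
        k=k,
    )
    return [candidates[i] for i in selected]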
flowchart TD
Query["Query Vector"] --> Fetch["Fetch top 20\ncandidates"]
Fetch --> MMR["MMR Algorithm"]
MMR --> S1["Select most\nrelevant doc"]
S1 --> S2["Select doc most relevant\nto query BUT dissimilar\nto selected docs"]
S2 --> S3["Repeat until\nk docs selected"]
S3 --> Output["4 diverse,\nrelevant docs"]
classDef algo fill:#f3e5f5,stroke:#4a148c
classDef io fill:#e1f5fe,stroke:#01579b
class MMR,S1,S2,S3 algo
class Query,Fetch,Output io
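The selection loop in the flowchart can be written out in plain Python. This is a sketch of the standard greedy MMR rule, scoring each candidate as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`. It is not LangChain's implementation, and `mmr_select` and `unit` are hypothetical names. The toy data uses 2-D unit vectors so the geometry is easy to check by hand.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: List[float],
    candidates: List[List[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def unit(angle_degrees: float) -> List[float]:
    """2-D unit vector at the given angle, used as a toy embedding."""
    radians = math.radians(angle_degrees)
    return [math.cos(radians), math.sin(radians)]


query = unit(0)
candidates = [unit(10), unit(12), unit(-40)]  # two near-duplicates, one outlier
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.5)
relevant_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate at 12 degrees in favor of the less similar but more novel vector at -40 degrees. With `lambda_mult=1.0` the result collapses to plain top-k by similarity.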
Factory Methods: from_texts and from_documents
VectorStore provides convenient class methods for creating and populating a store in one step:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
# From raw texts
vectorstore = Chroma.from_texts(
texts=["Text 1", "Text 2", "Text 3"],
embedding=OpenAIEmbeddings(),
metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
collection_name="my_collection",
persist_directory="./chroma_db"
)
# From Document objects (more common in practice)
vectorstore = Chroma.from_documents(
documents=chunks, # Output from TextSplitter
embedding=OpenAIEmbeddings(),
collection_name="my_collection"
)
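Conceptually, a factory method is "construct the store, then add the data". The sketch below shows that shape on a hypothetical `ListStore` that keeps texts in a Python list. The class, its `namespace` option, and the ID format are assumptions for illustration and do not mirror any real backend's constructor.

```python
from typing import Dict, List, Optional


class ListStore:
    """Hypothetical minimal store used to show the factory-method shape."""

    def __init__(self, namespace: str = "default") -> None:
        self.namespace = namespace
        self.texts: List[str] = []
        self.metadatas: List[Dict[str, str]] = []

    def add_texts(
        self, texts: List[str], metadatas: Optional[List[Dict[str, str]]] = None
    ) -> List[str]:
        metadatas = metadatas if metadatas is not None else [{} for _ in texts]
        if len(metadatas) != len(texts):
            raise ValueError("texts and metadatas must have the same length")
        start = len(self.texts)
        self.texts.extend(texts)
        self.metadatas.extend(metadatas)
        return [f"{self.namespace}-{i}" for i in range(start, len(self.texts))]

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        metadatas: Optional[List[Dict[str, str]]] = None,
        **init_kwargs: str,
    ) -> "ListStore":
        # Backend-specific options (here only `namespace`) go to the constructor.
        store = cls(**init_kwargs)
        store.add_texts(texts, metadatas)
        return store


store = ListStore.from_texts(
    ["Text 1", "Text 2"],
    metadatas=[{"source": "a"}, {"source": "b"}],
    namespace="demo",
)
new_ids = store.add_texts(["Text 3"])
```

Routing extra keyword arguments to the constructor is what lets one factory signature carry options such as a collection name or a persistence directory.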
The Retriever Pattern
Retrievers are the bridge between vector stores and LCEL chains. The BaseRetriever abstract class implements the Runnable interface:
classDiagram
class Runnable~Input, Output~ {
<<interface>>
+invoke(input) Output
}
class BaseRetriever {
<<abstract>>
+invoke(query: str) List~Document~
#_get_relevant_documents(query, run_manager) List~Document~
#_aget_relevant_documents(query, run_manager) List~Document~
}
class VectorStoreRetriever {
+vectorstore: VectorStore
+search_type: str
+search_kwargs: dict
#_get_relevant_documents(query) List~Document~
}
class MultiQueryRetriever {
+retriever: BaseRetriever
+llm_chain: Runnable
#_get_relevant_documents(query) List~Document~
}
class ContextualCompressionRetriever {
+base_retriever: BaseRetriever
+base_compressor: BaseDocumentCompressor
#_get_relevant_documents(query) List~Document~
}
class EnsembleRetriever {
+retrievers: List~BaseRetriever~
+weights: List~float~
#_get_relevant_documents(query) List~Document~
}
Runnable <|-- BaseRetriever
BaseRetriever <|-- VectorStoreRetriever
BaseRetriever <|-- MultiQueryRetriever
BaseRetriever <|-- ContextualCompressionRetriever
BaseRetriever <|-- EnsembleRetriever
BaseRetriever Internals
BaseRetriever follows the same Template Method pattern as BaseChatModel:
class BaseRetriever(RunnableSerializable[str, List[Document]]):
    """Abstract base class for retrievers."""
    def invoke(self, input: str, config=None) -> List[Document]:
        """Public API: handles callbacks, delegates to _get_relevant_documents."""
        config = ensure_config(config)
        callback_manager = CallbackManager.configure(config.get("callbacks"))
        # Fire on_retriever_start
        run_manager = callback_manager.on_retriever_start(
            serialized=self.to_json(),
            query=input,
        )
        try:
            # Delegate to subclass
            docs = self._get_relevant_documents(input, run_manager=run_manager)
            # Fire on_retriever_end
            run_manager.on_retriever_end(docs)
            return docs
        except Exception as e:
            run_manager.on_retriever_error(e)
            raise
    @abstractmethod
    def _get_relevant_documents(
        self, query: str, *, run_manager=None
    ) -> List[Document]:
        """Subclasses implement this."""
VectorStoreRetriever
The most common retriever wraps a VectorStore:
class VectorStoreRetriever(BaseRetriever):
    vectorstore: VectorStore
    search_type: str = "similarity"  # "similarity", "mmr", or "similarity_score_threshold"
    search_kwargs: dict = {"k": 4}
    def _get_relevant_documents(
        self, query: str, *, run_manager=None
    ) -> List[Document]:
        if self.search_type == "similarity":
            return self.vectorstore.similarity_search(query, **self.search_kwargs)
        elif self.search_type == "mmr":
            return self.vectorstore.max_marginal_relevance_search(
                query, **self.search_kwargs
            )
        elif self.search_type == "similarity_score_threshold":
            # Relevance scores are normalized to [0, 1] (higher is better), and the
            # store applies search_kwargs["score_threshold"] before returning.
            docs_and_scores = self.vectorstore.similarity_search_with_relevance_scores(
                query, **self.search_kwargs
            )
            return [doc for doc, _ in docs_and_scores]
        raise ValueError(f"Unknown search_type: {self.search_type}")
You create one from a VectorStore using .as_retriever():
vectorstore = Chroma.from_documents(documents, embeddings)
# Default: similarity search, k=4
retriever = vectorstore.as_retriever()
# MMR with custom k
retriever = vectorstore.as_retriever(
search_type="mmr",
search_kwargs={"k": 6, "fetch_k": 20, "lambda_mult": 0.7}
)
# Score threshold
retriever = vectorstore.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={"score_threshold": 0.8}
)
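A score threshold only behaves as expected when scores are normalized so that higher means more relevant. Many backends return distances instead, where lower is better. The sketch below assumes a backend that reports cosine distance as `1 - cosine similarity`; the function names and sample numbers are hypothetical.

```python
from typing import List, Tuple


def cosine_distance_to_relevance(distance: float) -> float:
    """Map a cosine distance (0 = identical) to a relevance score (1 = identical).

    Assumes the backend reports cosine distance as 1 - cosine similarity.
    """
    return 1.0 - distance


def filter_by_threshold(
    hits: List[Tuple[str, float]], score_threshold: float
) -> List[str]:
    """Keep documents whose relevance score meets the threshold."""
    return [
        doc
        for doc, distance in hits
        if cosine_distance_to_relevance(distance) >= score_threshold
    ]


# (document, cosine distance) pairs as a distance-based backend might return them.
hits = [("close match", 0.05), ("related", 0.25), ("off topic", 0.9)]
kept = filter_by_threshold(hits, score_threshold=0.7)

# Comparing raw distances against the same threshold inverts the intent.
naive = [doc for doc, distance in hits if distance >= 0.7]
```

The naive comparison keeps only the least relevant document, so it is worth checking which kind of score a given backend returns before choosing a threshold.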
Retriever in LCEL Chains
Because BaseRetriever is a Runnable, it plugs directly into LCEL:
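from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
def format_docs(docs):
    """Join retrieved documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)
# Classic RAG chain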
template = """Answer the question based on the context:
Context: {context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
# retriever is a Runnable[str, List[Document]]
rag_chain = (
{
"context": retriever | format_docs, # Retrieve and format
"question": RunnablePassthrough() # Pass through the query
}
| prompt
| ChatOpenAI(model="gpt-4o")
| StrOutputParser()
)
answer = rag_chain.invoke("What is LangChain?")
flowchart LR
Q["Question"] --> Par["RunnableParallel"]
Par --> R["Retriever\n(query -> docs)"]
Par --> PT["Passthrough\n(query -> query)"]
R --> FD["format_docs\n(docs -> str)"]
FD --> Merge["{'context': ..., 'question': ...}"]
PT --> Merge
Merge --> Prompt["ChatPromptTemplate"]
Prompt --> Model["ChatOpenAI"]
Model --> Parser["StrOutputParser"]
Parser --> Answer["Answer"]
classDef retrieval fill:#f3e5f5,stroke:#4a148c
classDef llm fill:#e1f5fe,stroke:#01579b
class R,FD retrieval
class Prompt,Model,Parser llm
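The chain relies on two coercions: a plain function on the right of `|` becomes a runnable step, and a dict becomes a parallel fan-out whose branches all receive the same input. The following sketch imitates that behavior with a hypothetical `Step` class and `coerce` helper. It is a teaching model of the idea, not LCEL's actual code, and the keyword-matching "retriever" is a stand-in for a vector store.

```python
from typing import Any, Callable, Dict, List


class Step:
    """Hypothetical mini-Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        nxt = coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Lets a plain dict or function appear on the left of `|`.
        return coerce(other) | self


def coerce(obj: Any) -> Step:
    """Turn dicts into parallel fan-out steps and callables into steps."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches: Dict[str, Step] = {key: coerce(val) for key, val in obj.items()}
        return Step(lambda value: {k: s.invoke(value) for k, s in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"Cannot coerce {type(obj).__name__} into a Step")


corpus = ["LangChain is a framework.", "Vector stores enable semantic search."]
# Toy retriever: match documents containing the last word of the question.
retriever = Step(lambda q: [t for t in corpus if q.split()[-1].strip("?") in t])
passthrough = Step(lambda q: q)


def format_docs(docs: List[str]) -> str:
    return "\n".join(docs)


chain = (
    {"context": retriever | format_docs, "question": passthrough}
    | Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
)
prompt_text = chain.invoke("What is LangChain?")
```

The dict on the left never needs its own class in user code because the step on its right handles the coercion through `__ror__`.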
Advanced Retriever Patterns
MultiQueryRetriever generates multiple query variations to improve recall:
from langchain.retrievers import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(
retriever=vectorstore.as_retriever(),
llm=ChatOpenAI(model="gpt-4o-mini"),
)
# Internally:
# 1. LLM generates 3 variations of the query
# 2. Each variation is sent to the base retriever
# 3. Results are deduplicated and returned
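Steps 2 and 3 amount to an order-preserving union over the results of each variation. The sketch below hard-codes the variations and uses a lookup table in place of a vector store, so `multi_query_retrieve`, `index`, and the document names are all hypothetical.

```python
from typing import Callable, List


def multi_query_retrieve(
    variations: List[str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run every query variation and return the order-preserving unique union."""
    seen = set()
    merged: List[str] = []
    for variation in variations:
        for doc in retrieve(variation):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical base retriever: a fixed lookup table instead of a vector store.
index = {
    "what is langchain": ["intro", "overview"],
    "langchain definition": ["overview", "glossary"],
    "explain langchain": ["intro", "tutorial"],
}

# In the real class an LLM writes the variations; here they are hard-coded.
variations = list(index)
docs = multi_query_retrieve(variations, lambda q: index.get(q, []))
```

Each extra variation can only add documents, which is why the technique improves recall at the cost of more retriever calls.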
EnsembleRetriever combines the ranked results of multiple retrievers using weighted Reciprocal Rank Fusion, so a document that ranks well in several lists rises to the top:
from langchain.retrievers import EnsembleRetriever
# Combine semantic search with keyword search
ensemble = EnsembleRetriever(
retrievers=[vector_retriever, bm25_retriever],
weights=[0.7, 0.3] # 70% semantic, 30% keyword
)
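One common way to fuse ranked lists with weights is weighted Reciprocal Rank Fusion (RRF): each retriever contributes `weight / (rank + c)` for every document it returns. The sketch below is a standalone illustration of that formula. The function name `weighted_rrf`, the document IDs, and the constant `c=60` (a frequently used default) are assumptions rather than a copy of the library code.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(
    rankings: List[List[str]], weights: List[float], c: int = 60
) -> List[str]:
    """Fuse ranked lists: each hit adds weight / (rank + c), with rank from 1."""
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


semantic_hits = ["d1", "d2", "d3"]  # e.g. from a vector retriever
keyword_hits = ["d3", "d1", "d4"]   # e.g. from a BM25 retriever
fused = weighted_rrf([semantic_hits, keyword_hits], weights=[0.7, 0.3])
```

Because RRF works on ranks rather than raw scores, it can combine retrievers whose scores are on incompatible scales, such as cosine similarity and BM25.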
Vector Store Backend Comparison
| Backend | Type | Persistence | Scalability | Best For |
|---|---|---|---|---|
| Chroma | In-process / Client-server | Local directory | Moderate | Development, small-medium datasets |
| FAISS | In-process | Serialized file | High (with GPU) | High-performance local search |
| Pinecone | Managed cloud | Fully managed | Very high | Production at scale |
| pgvector | PostgreSQL extension | Database | High | When you already use Postgres |
| Weaviate | Self-hosted / Cloud | Configurable | High | Hybrid search (vector + keyword) |
| Qdrant | Self-hosted / Cloud | Configurable | High | Filtering + payload search |
Summary
| Concept | Key Takeaway |
|---|---|
| Embeddings | Abstract base with separate embed_documents and embed_query methods |
| VectorStore | Interface for add, search, and delete with multiple search strategies |
| MMR | Balances relevance with diversity using iterative selection |
| BaseRetriever | Runnable interface that connects vector stores to LCEL chains |
| VectorStoreRetriever | Configurable wrapper with similarity, MMR, and threshold modes |
| Factory methods | from_texts and from_documents for one-step store creation |
Key Takeaways
- Embeddings separates document and query encoding. This enables asymmetric embedding models where documents and queries are encoded differently.
- VectorStore is a CRUD interface for vectors. It abstracts the database backend, making it trivial to swap Chroma for Pinecone or FAISS.
- Retrievers are Runnables. BaseRetriever implements the Runnable interface, so it plugs directly into LCEL chains with the pipe operator.
- MMR prevents redundant retrieval. By balancing relevance and diversity, MMR produces more informative context for the LLM.
- Advanced patterns compose basic retrievers. MultiQueryRetriever, EnsembleRetriever, and ContextualCompressionRetriever all wrap BaseRetriever implementations.
Next Steps
With data loading, splitting, embedding, and retrieval covered, we are ready to explore how agents use these components to take actions. Continue to Chapter 7: Agent Architecture.
Built with insights from the LangChain project.
What Problem Does This Solve?
Most teams struggle here because the hard part is not writing more code but drawing clear boundaries between embedding, storage, and retrieval so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 6: Vector Store Abstraction as an operating subsystem inside LangChain Architecture: Internal Design Deep Dive, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around Document, VectorStore, float as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, Chapter 6: Vector Store Abstraction usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for List.
- Input normalization: shape incoming data so query receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through self.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Source Walkthrough
Use the following upstream sources to verify implementation details while reading this chapter:
- View Repo (github.com). Why it matters: it is the authoritative reference for the implementation details discussed in this chapter.
Suggested trace strategy:
- search upstream code for List and query to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production