Chapter 5: Retrieval-Augmented Generation (RAG)

April 13, 2026

Welcome to Chapter 5: Retrieval-Augmented Generation (RAG). In this part of n8n AI Tutorial: Workflow Automation with AI, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Build knowledge-based AI systems that retrieve relevant information and generate accurate responses.

RAG Workflow in n8n

flowchart TD
    subgraph Ingestion
        DOC[Documents] --> LOAD[Document Loader]
        LOAD --> SPLIT[Text Splitter]
        SPLIT --> EMBED[Embeddings Node]
        EMBED --> VS[(Vector Store\nPinecone / Qdrant / Supabase)]
    end

    subgraph Query
        Q[User Question] --> QEMBED[Embeddings Node]
        QEMBED --> SEARCH[Vector Store Search]
        SEARCH --> CTX[Retrieved Context]
        CTX --> LLM[AI Chat Node]
        LLM --> ANS[Answer]
    end

RAG Fundamentals

RAG combines retrieval of relevant documents with generative AI to provide accurate, context-aware responses.
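
To make the retrieval half concrete, here is a minimal sketch in plain JavaScript of what a vector store search does conceptually: score stored chunk vectors against a query vector by cosine similarity and keep the best matches. The function names `cosineSimilarity` and `topK` and the three-dimensional toy vectors are illustrative assumptions. Real embeddings have hundreds or thousands of dimensions, and hosted vector stores typically use approximate indexes rather than the full scan shown here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query vector (full scan).
function topK(queryVector, chunks, k) {
  return chunks
    .map(chunk => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: hypothetical 3-dimensional "embeddings".
const store = [
  { id: 'refunds', text: 'Refunds are issued within 14 days.', vector: [1, 0, 0] },
  { id: 'shipping', text: 'Orders ship in 2 business days.', vector: [0, 1, 0] },
  { id: 'returns', text: 'Returns need a receipt.', vector: [0.9, 0.1, 0] }
];
const results = topK([1, 0, 0], store, 2);
console.log(results.map(r => r.id));
```

With this toy data the `refunds` and `returns` chunks rank first, because their vectors point in nearly the same direction as the query. The text of those chunks is what would be passed to the language model as context.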

Document Ingestion Pipeline

File Upload and Processing

{
  "nodes": [
    {
      "parameters": {
        "operation": "read",
        "fileSelector": "./knowledge-base/*.pdf",
        "options": {}
      },
      "name": "File Upload",
      "type": "n8n-nodes-base.readWriteFile",
      "typeVersion": 1
    },
    {
      "parameters": {
        "operation": "pdf",
        "binaryPropertyName": "data"
      },
      "name": "PDF Extractor",
      "type": "n8n-nodes-base.extractFromFile"
    },
    {
      "parameters": {
        "dataPropertyName": "data",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "title",
              "returnValue": "text"
            },
            {
              "key": "content",
              "cssSelector": "body",
              "returnValue": "text"
            }
          ]
        }
      },
      "name": "HTML Extractor",
      "type": "n8n-nodes-base.html"
    }
  ]
}

Document Chunking

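// Fixed-size chunking with overlap (Code node, "Run Once for All Items" mode)
const text = $input.first().json.document_text ?? '';
const chunkSize = 1000; // characters per chunk
const overlap = 200;    // characters shared by neighboring chunks
const step = chunkSize - overlap;

const chunks = [];
for (let start = 0; start < text.length; start += step) {
  const end = Math.min(start + chunkSize, text.length);
  chunks.push({
    text: text.slice(start, end),
    chunk_id: chunks.length, // sequential index, not a character offset
    start_pos: start,
    end_pos: end
  });
  if (end === text.length) break; // skip a trailing chunk that would only repeat the overlap
}

// One output item per chunk
return chunks.map(chunk => ({ json: chunk }));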
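
Fixed-size slicing can cut a sentence or paragraph in half. As an alternative, the sketch below packs whole paragraphs into chunks up to a size limit. The helper name `chunkByParagraph` and the blank-line paragraph convention are assumptions for illustration. It is written as a plain function so it can be tested outside n8n and then pasted into a Code node.

```javascript
// Pack whole paragraphs (separated by blank lines) into chunks of at most maxChars.
// A single paragraph longer than maxChars is kept whole rather than split.
function chunkByParagraph(text, maxChars) {
  const paragraphs = text.split(/\n\s*\n/).map(p => p.trim()).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? current + '\n\n' + paragraph : paragraph;
    if (candidate.length <= maxChars || !current) {
      // Fits in the current chunk, or starts a new chunk on its own.
      current = candidate;
    } else {
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sample = 'aaaa\n\nbbbb\n\ncccc';
const paragraphChunks = chunkByParagraph(sample, 10);
console.log(paragraphChunks.length);
```

The trade-off is uneven chunk sizes: chunks follow the document's own structure, but a very long paragraph still produces an oversized chunk that may need a second, fixed-size pass.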

Vector Embeddings

Embedding Generation

{
  "parameters": {
    "model": "text-embedding-ada-002",
    "input": "={{ $json.chunks.map(c => c.text) }}"
  },
  "name": "Generate Embeddings",
  "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi"
}

Local Embeddings with Ollama

{
  "parameters": {
    "baseUrl": "http://localhost:11434",
    "model": "nomic-embed-text",
    "prompt": "={{ $json.chunk_text }}"
  },
  "name": "Local Embeddings",
  "type": "@n8n/n8n-nodes-langchain.embeddingsOllama"
}

Vector Database Storage

Pinecone Integration

{
  "parameters": {
    "operation": "upsert",
    "pineconeIndex": "knowledge-base",
    "items": "={{ $json.embeddings.map((emb, i) => ({ id: $json.chunk_ids[i], values: emb, metadata: { text: $json.chunks[i].text, source: $json.source } })) }}"
  },
  "name": "Store in Pinecone",
  "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
  "credentials": {
    "pineconeApi": "pinecone-api"
  }
}

Qdrant Integration

{
  "parameters": {
    "operation": "upsert",
    "qdrantCollection": "documents",
    "items": "={{ $json.embeddings.map((emb, i) => ({ id: $json.chunk_ids[i], vector: emb, payload: { text: $json.chunks[i].text, metadata: $json.metadata } })) }}"
  },
  "name": "Store in Qdrant",
  "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
  "credentials": {
    "qdrantApi": "qdrant-api"
  }
}

Query Processing

Similarity Search

{
  "parameters": {
    "operation": "getMany",
    "pineconeIndex": "knowledge-base",
    "query": "={{ $json.query_embedding }}",
    "numberOfResults": 5,
    "includeValues": false,
    "includeMetadata": true
  },
  "name": "Retrieve Context",
  "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone"
}

Context Preparation

// Combine retrieved documents
const retrieved = $input.all();
const context = retrieved.map(item => item.json.metadata.text).join('\n\n');

return [{
  json: {
    context: context,
    sources: retrieved.map(item => ({
      text: item.json.metadata.text,
      score: item.json.score,
      source: item.json.metadata.source
    })),
    total_chunks: retrieved.length
  }
}];
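
The join above concatenates every retrieved chunk, which can overflow the model's context window when chunks are long. A minimal sketch of one way to cap it: sort by score, label each chunk with its source, and stop once a character budget is reached. The helper `buildContext`, the `[n] (source)` label format, and the use of a character budget as a rough stand-in for a token budget are all assumptions.

```javascript
// Build a prompt context from the best-scoring chunks without exceeding maxChars.
function buildContext(sources, maxChars) {
  const ranked = [...sources].sort((a, b) => b.score - a.score);
  const parts = [];
  let used = 0;
  for (const source of ranked) {
    const block = `[${parts.length + 1}] (${source.source}) ${source.text}`;
    const cost = block.length + (parts.length > 0 ? 2 : 0); // 2 = length of the '\n\n' separator
    if (used + cost > maxChars) break; // budget reached: drop lower-ranked chunks
    parts.push(block);
    used += cost;
  }
  return { context: parts.join('\n\n'), used_chunks: parts.length };
}

const retrievedSources = [
  { text: 'B text', score: 0.5, source: 'b.pdf' },
  { text: 'A text', score: 0.9, source: 'a.pdf' }
];
const tight = buildContext(retrievedSources, 25);
const roomy = buildContext(retrievedSources, 100);
console.log(tight.used_chunks, roomy.used_chunks);
```

Numbered labels also give the model something to cite, which makes it easier to map an answer back to the `sources` array returned alongside the context.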

RAG Response Generation

Context-Augmented Prompting

{
  "parameters": {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant. Use the provided context to answer questions accurately. If the context doesn't contain the answer, say so."
      },
      {
        "role": "user",
        "content": "=Context:\n{{ $json.context }}\n\nQuestion: {{ $json.question }}\n\nAnswer based on the context:"
      }
    ],
    "maxTokens": 500
  },
  "name": "Generate RAG Response",
  "type": "@n8n/n8n-nodes-langchain.openAi"
}

Multi-Hop Reasoning

{
  "nodes": [
    {
      "parameters": {
        "model": "gpt-4o",
        "messages": [
          {
            "role": "user",
            "content": "=Based on this context, what specific questions should I ask to get more information?\n\nContext: {{ $json.context }}\n\nQuestion: {{ $json.original_question }}"
          }
        ]
      },
      "name": "Generate Follow-up Questions",
      "type": "@n8n/n8n-nodes-langchain.openAi"
    },
    {
      "parameters": {
        "operation": "getMany",
        "pineconeIndex": "knowledge-base",
        "query": "={{ $json.followup_embedding }}",
        "numberOfResults": 3
      },
      "name": "Retrieve Additional Context",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone"
    },
    {
      "parameters": {
        "model": "gpt-4o",
        "messages": [
          {
            "role": "user",
            "content": "=Original context: {{ $json.original_context }}\nAdditional context: {{ $json.additional_context }}\n\nProvide a comprehensive answer to: {{ $json.original_question }}"
          }
        ]
      },
      "name": "Final Answer Generation",
      "type": "@n8n/n8n-nodes-langchain.openAi"
    }
  ]
}

Advanced RAG Patterns

Hybrid Search

// Combine semantic and keyword search (Code node, "Run Once for All Items" mode)
const { query, semantic_results: semanticResults } = $input.first().json;

// Split on any whitespace and drop empty tokens; an empty keyword would match every document
const keywords = query.toLowerCase().split(/\s+/).filter(Boolean);

// Count keyword matches per result and boost the semantic score by 10% per matched keyword
const hybridResults = semanticResults
  .map(result => {
    const text = result.metadata.text.toLowerCase();
    const keyword_matches = keywords.filter(k => text.includes(k)).length;
    return {
      ...result,
      keyword_matches,
      hybrid_score: result.score * (1 + keyword_matches * 0.1)
    };
  })
  // Keep results that mention at least one keyword; with no keywords, keep all semantic hits
  .filter(result => keywords.length === 0 || result.keyword_matches > 0)
  .sort((a, b) => b.hybrid_score - a.hybrid_score);

return [{
  json: {
    results: hybridResults.slice(0, 5),
    search_type: "hybrid"
  }
}];
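
The filter above discards any semantic hit that shares no word with the query, which can hide good paraphrased matches. A common alternative is reciprocal rank fusion (RRF), which merges two independently ranked lists by summing `1 / (k + rank)` for each document. The sketch below assumes you already have a semantic ranking and a keyword ranking as arrays of document IDs. The function name, the IDs, and the choice of `k = 60` (a conventional default) are illustrative.

```javascript
// Reciprocal rank fusion: combine several ranked ID lists into one ranking.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical rankings from a vector search and a keyword search.
const semanticRanking = ['doc-a', 'doc-b', 'doc-c'];
const keywordRanking = ['doc-b', 'doc-d', 'doc-a'];
const fused = reciprocalRankFusion([semanticRanking, keywordRanking]);
console.log(fused.map(r => r.id));
```

Because RRF works on ranks rather than raw scores, it does not require the semantic and keyword scores to be on the same scale, and a document found by only one method still appears in the result.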

Query Expansion

{
  "parameters": {
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "=Generate 3 related search queries for: {{ $json.original_query }}"
      }
    ]
  },
  "name": "Query Expansion",
  "type": "@n8n/n8n-nodes-langchain.openAi"
}
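
Running several expanded queries usually returns overlapping results. A minimal sketch of the merge step follows, assuming each search returns objects with an `id` and a `score`. The helper name `mergeResults` and the sample data are hypothetical. It keeps one entry per document, using the best score seen across all queries.

```javascript
// Merge result sets from several queries, keeping the highest score per document id.
function mergeResults(resultSets) {
  const best = new Map();
  for (const results of resultSets) {
    for (const result of results) {
      const existing = best.get(result.id);
      if (!existing || result.score > existing.score) {
        best.set(result.id, result);
      }
    }
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}

const merged = mergeResults([
  [{ id: 'a', score: 0.8 }, { id: 'b', score: 0.6 }], // results for the original query
  [{ id: 'b', score: 0.9 }, { id: 'c', score: 0.5 }]  // results for an expanded query
]);
console.log(merged.map(r => r.id));
```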

Re-ranking

{
  "parameters": {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "=Rank these documents by relevance to the query: {{ $json.query }}\n\nDocuments:\n{{ $json.documents.map((d, i) => `${i+1}. ${d.text}`).join('\\n') }}\n\nReturn rankings as JSON array."
      }
    ],
    "responseFormat": "json"
  },
  "name": "Re-rank Results",
  "type": "@n8n/n8n-nodes-langchain.openAi"
}
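
The re-rank prompt above asks for a JSON array, but model output is not guaranteed to be valid or complete. The sketch below shows one defensive way to apply the ranking. It assumes the model returns 1-based document numbers such as `[3, 1, 2]`, matching the numbering in the prompt. The helper name `applyRanking` is hypothetical. Invalid entries are ignored, and documents the model omitted are appended in their original order.

```javascript
// Reorder documents using a model-provided ranking of 1-based document numbers.
function applyRanking(documents, rawResponse) {
  let ranking;
  try {
    ranking = JSON.parse(rawResponse);
  } catch {
    return documents; // not valid JSON: keep the retrieval order
  }
  if (!Array.isArray(ranking)) return documents;

  const seen = new Set();
  const reordered = [];
  for (const position of ranking) {
    const index = position - 1;
    const valid = Number.isInteger(index) && index >= 0 && index < documents.length;
    if (valid && !seen.has(index)) {
      seen.add(index);
      reordered.push(documents[index]);
    }
  }
  // Append any documents the model left out, in their original order.
  documents.forEach((doc, index) => {
    if (!seen.has(index)) reordered.push(doc);
  });
  return reordered;
}

const docs = [{ text: 'a' }, { text: 'b' }, { text: 'c' }];
const reranked = applyRanking(docs, '[3, 1]');
console.log(reranked.map(d => d.text));
```

Falling back to the retrieval order means a malformed reply degrades answer quality slightly instead of failing the whole workflow.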

Knowledge Base Management

Incremental Updates

{
  "nodes": [
    {
      "parameters": {
        "triggerOn": "folder",
        "path": "./knowledge-base/",
        "events": ["add", "change"]
      },
      "name": "File Watcher",
      "type": "n8n-nodes-base.localFileTrigger"
    },
    {
      "parameters": {
        "operation": "pdf",
        "binaryPropertyName": "data"
      },
      "name": "Process New Document",
      "type": "n8n-nodes-base.extractFromFile"
    },
    {
      "parameters": {
        "model": "text-embedding-ada-002",
        "input": "={{ $json.chunks }}"
      },
      "name": "Embed New Content",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi"
    },
    {
      "parameters": {
        "operation": "upsert",
        "pineconeIndex": "knowledge-base",
        "items": "={{ $json.new_embeddings.map((emb, i) => ({ id: `doc_${Date.now()}_${i}`, values: emb, metadata: { text: $json.chunks[i], source: $json.filename, timestamp: new Date().toISOString() } })) }}"
      },
      "name": "Update Vector DB",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone"
    }
  ]
}
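
One caveat with the upsert above: IDs built from `Date.now()` are different on every run, so re-ingesting an edited file adds new vectors next to the stale ones instead of replacing them. A minimal sketch of an alternative is to derive IDs from the source name and chunk index, then work out which old IDs to delete when a document gets shorter. The helpers `chunkId` and `planUpdate`, the `source#index` ID format, and the idea of tracking the previous chunk count per source are assumptions.

```javascript
// Deterministic chunk id: the same source and position always map to the same id.
function chunkId(source, index) {
  return `${source}#${index}`;
}

// Plan an incremental update for one document.
// previousChunkCount: how many chunks this source had at the last ingestion.
function planUpdate(source, previousChunkCount, newChunks) {
  const upserts = newChunks.map((text, index) => ({ id: chunkId(source, index), text }));
  const deleteIds = [];
  // Ids beyond the new chunk count are stale and should be removed from the index.
  for (let index = newChunks.length; index < previousChunkCount; index++) {
    deleteIds.push(chunkId(source, index));
  }
  return { upserts, deleteIds };
}

const plan = planUpdate('guide.pdf', 3, ['intro text', 'setup text']);
console.log(plan.upserts.map(u => u.id), plan.deleteIds);
```

With stable IDs, an upsert overwrites the existing vector for each position, and the delete list cleans up leftovers when a document shrinks.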

Version Control

{
  "parameters": {
    "dataToSave": {
      "version": "={{ Date.now() }}",
      "document_count": "={{ $json.total_documents }}",
      "last_updated": "={{ new Date().toISOString() }}",
      "index_stats": "={{ $json.index_stats }}"
    },
    "keys": {
      "type": "kb_version"
    }
  },
  "name": "Version Control",
  "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow"
}

Performance Optimization

Caching Strategy

{
  "parameters": {
    "dataToSave": {
      "query": "={{ $json.query }}",
      "response": "={{ $json.response }}",
      "context": "={{ $json.context }}",
      "timestamp": "={{ new Date().toISOString() }}"
    },
    "keys": {
      "query_hash": "={{ $json.query_hash }}"
    },
    "ttl": 3600
  },
  "name": "Response Cache",
  "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow"
}
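
The cache entry above is keyed on `$json.query_hash`, which the chapter does not show being computed. The sketch below shows one dependency-free way to produce it: normalize the query, then hash it with 32-bit FNV-1a. The helper names are hypothetical, and this variant hashes UTF-16 code units rather than bytes, which gives the same result as byte-wise FNV-1a for ASCII text. A 32-bit hash can collide, so it is safer to store the normalized query with the entry and compare it on read.

```javascript
// Normalize a query so trivially different phrasings share one cache entry.
function normalizeQuery(query) {
  return query.trim().toLowerCase().replace(/\s+/g, ' ');
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32-bit arithmetic
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Cache key for a user query.
function queryHash(query) {
  return fnv1a(normalizeQuery(query));
}

const keyA = queryHash('  What is RAG? ');
const keyB = queryHash('what is   rag?');
console.log(keyA === keyB);
```

A cryptographic hash such as SHA-256 is a reasonable replacement when collisions matter more, provided your n8n instance allows the Code node to load the built-in `crypto` module.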

Batch Processing

{
  "parameters": {
    "batchSize": 10,
    "options": {
      "merge": false
    }
  },
  "name": "Batch Embeddings",
  "type": "n8n-nodes-base.splitInBatches"
}

Monitoring and Analytics

Usage Tracking

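// Track RAG performance in workflow static data (Code node, "Run Once for All Items" mode).
// Static data is saved only for production executions of an active workflow, not manual test runs.
const staticData = $getWorkflowStaticData('global');
const ragMetrics = staticData.rag_metrics ?? {
  total_queries: 0,
  cache_hits: 0,
  avg_retrieval_time: 0,
  avg_generation_time: 0,
  cache_hit_rate: 0
};

ragMetrics.total_queries += 1;

if ($input.first().json.cached) {
  ragMetrics.cache_hits = (ragMetrics.cache_hits || 0) + 1;
}

ragMetrics.cache_hit_rate = (ragMetrics.cache_hits || 0) / ragMetrics.total_queries;

// Write the updated metrics back so the next execution can read them
staticData.rag_metrics = ragMetrics;

return [{
  json: {
    metrics: ragMetrics,
    query_id: `query_${Date.now()}`
  }
}];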
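
The metrics object above declares `avg_retrieval_time` and `avg_generation_time`, but the snippet never updates them. A minimal sketch of how they could be maintained is an incremental mean, which folds each new timing into the average without storing every sample. The helper name `updateAverage` and the sample timings are assumptions.

```javascript
// Incremental mean: fold one new sample into an average without storing history.
// sampleCount is the number of samples including newValue.
function updateAverage(previousAverage, sampleCount, newValue) {
  return previousAverage + (newValue - previousAverage) / sampleCount;
}

// Hypothetical metrics object shaped like the one used in this chapter.
const metrics = { total_queries: 0, avg_retrieval_time: 0 };
for (const retrievalMs of [100, 200, 300]) {
  metrics.total_queries += 1;
  metrics.avg_retrieval_time = updateAverage(
    metrics.avg_retrieval_time,
    metrics.total_queries,
    retrievalMs
  );
}
console.log(metrics.avg_retrieval_time);
```

In a workflow, the timing itself would come from timestamps captured before and after the retrieval or generation node.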

Quality Assessment

{
  "parameters": {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "=Evaluate this RAG response for:\n1. Accuracy\n2. Completeness\n3. Relevance\n4. Helpfulness\n\nResponse: {{ $json.rag_response }}\nContext: {{ $json.context_used }}\nQuery: {{ $json.original_query }}\n\nProvide scores 1-10 and brief explanation."
      }
    ],
    "responseFormat": "json"
  },
  "name": "Quality Assessment",
  "type": "@n8n/n8n-nodes-langchain.openAi"
}

Best Practices

Chunking Strategy: Balance chunk size with semantic coherence
Embedding Selection: Choose embeddings that match your domain
Index Optimization: Regularly maintain and optimize vector indexes
Caching: Implement intelligent caching for frequent queries
Monitoring: Track retrieval quality and response accuracy
Updates: Implement incremental updates for changing knowledge
Security: Validate and sanitize retrieved content
Scalability: Design for growing knowledge bases

RAG transforms static documents into interactive knowledge systems. The next chapter explores AI-powered decision making and routing logic.

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries between ingestion, retrieval, and generation so that behavior stays predictable as the knowledge base grows.

In practical terms, this chapter helps you avoid three common failures:

coupling core logic too tightly to one implementation path
missing the handoff boundaries between setup, execution, and validation
shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about a RAG pipeline as an operating subsystem inside your n8n workflows, with explicit contracts for inputs, state transitions, and outputs.

Use the node configurations in this chapter (their parameters, names, and node types) as your checklist when adapting these patterns to your own workflows.

How it Works Under the Hood

Under the hood, a RAG workflow usually follows a repeatable control path:

Context bootstrap: load credentials and runtime config, such as the vector store index and the embedding model.
Input normalization: shape incoming documents and questions into stable JSON fields so every downstream node sees the same contract.
Core execution: run embedding, retrieval, and generation, passing intermediate state between nodes as items.
Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
Output composition: return canonical result payloads for downstream consumers.
Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
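
The "walk the sequence in order" advice can be sketched as a small stage runner that records an explicit success or failure for every stage and stops at the first failure. This is an illustration in plain JavaScript, not how n8n executes nodes internally. The `runStages` helper and the three toy stages are hypothetical.

```javascript
// Run named stages in order, recording success or failure for each one.
function runStages(stages, input) {
  let state = input;
  const trace = [];
  for (const { name, run } of stages) {
    try {
      state = run(state);
      trace.push({ stage: name, ok: true });
    } catch (error) {
      trace.push({ stage: name, ok: false, error: error.message });
      return { ok: false, failedStage: name, trace }; // stop at the first failure
    }
  }
  return { ok: true, result: state, trace };
}

// Toy stages standing in for input normalization, a safety check, and output composition.
const stages = [
  { name: 'normalize', run: s => ({ ...s, question: String(s.question || '').trim() }) },
  { name: 'validate', run: s => { if (!s.question) throw new Error('empty question'); return s; } },
  { name: 'compose', run: s => ({ answer: `You asked: ${s.question}` }) }
];

const succeeded = runStages(stages, { question: '  What is RAG? ' });
const failed = runStages(stages, { question: '   ' });
console.log(succeeded.ok, failed.failedStage);
```

The `trace` array plays the role of the operational telemetry step: it shows which stage failed and why, without re-running the whole pipeline.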

Source Walkthrough

Key source files in n8n-io/n8n:

packages/@n8n/nodes-langchain/nodes/vector_store/ -- vector store nodes: Pinecone, Qdrant, Supabase, PGVector, In-Memory
packages/@n8n/nodes-langchain/nodes/retrievers/ -- retriever nodes; wrap vector stores for use as Agent tools
packages/@n8n/nodes-langchain/nodes/chains/ChainRetrievalQA/ -- RAG chain node: connects retriever + LLM into a question-answering pipeline

Suggested trace: open the ChainRetrievalQA node, find where it builds the question-answering chain (for example, a call such as loadQAStuffChain()), and see how retrieved documents are formatted into the LLM prompt context.