TemperStack
Intermediate · 12 min read · Updated Mar 18, 2026

How to build RAG AI workflows on n8n

Quick Answer

Build RAG AI workflows on n8n by combining document processing nodes, vector database operations, and LLM nodes into an intelligent question-answering system. The workflow connects data ingestion, embedding generation, vector search, and response generation in a single automated pipeline.

Prerequisites

  1. Basic n8n workflow experience
  2. OpenAI API key or similar LLM access
  3. Vector database setup (Pinecone, Weaviate, or Qdrant)
  4. Understanding of embeddings and vector search concepts

Step 1: Set up your n8n workflow foundation

Create a new workflow in n8n and add a Webhook node as your trigger. Configure the webhook to accept POST requests with a JSON body containing the user's query. Set the webhook path to something like /rag-query and enable the Return Response option so results are sent back to the caller.
Tip
Use a Manual Trigger node during development to test your workflow before exposing it via webhook
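It also helps to reject malformed requests before the rest of the workflow runs. A minimal sketch of that check, written as a plain function so it runs outside n8n (inside a Code node you would read the body from $input; the helper name is illustrative):

```javascript
// Validate the JSON body a caller POSTs to the webhook.
// Rejects missing or empty "query" fields before any API calls are made.
function validateQuery(body) {
  if (!body || typeof body.query !== 'string' || body.query.trim() === '') {
    throw new Error('Expected a JSON body with a non-empty "query" string');
  }
  return { query: body.query.trim() };
}
```

For example, `validateQuery({ query: ' What is RAG? ' })` returns `{ query: 'What is RAG?' }`, while an empty body throws before any OpenAI or Pinecone calls are made.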

Step 2: Add document processing and chunking

Insert a Code node after your trigger to process incoming documents. Use JavaScript to split large documents into smaller chunks:

// Split the incoming document into fixed-size chunks
const text = $input.first().json.document;
const chunkSize = 1000;
const chunks = [];

for (let i = 0; i < text.length; i += chunkSize) {
  chunks.push({
    chunk: text.slice(i, i + chunkSize),
    index: Math.floor(i / chunkSize)
  });
}

// n8n Code nodes return an array of items, each wrapped in { json }
return chunks.map(chunk => ({ json: chunk }));
Tip
Keep chunks between 500 and 1,500 characters for optimal embedding performance
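If answers often straddle chunk boundaries, overlapping consecutive chunks can help preserve context. A sketch of the same loop with a configurable overlap (the 200-character default is an illustrative choice, not an n8n setting):

```javascript
// Split text into fixed-size chunks that overlap, so a sentence cut at
// one boundary still appears intact at the start of the next chunk.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  const step = chunkSize - overlap; // advance less than a full chunk
  for (let i = 0; i < text.length; i += step) {
    chunks.push({ chunk: text.slice(i, i + chunkSize), index: chunks.length });
  }
  return chunks;
}
```

Overlap trades some extra embedding cost (duplicated text) for better recall on queries whose answer spans two chunks.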

Step 3: Generate embeddings for your content

Add an OpenAI node and select Embeddings operation. Configure it to use the text-embedding-3-small model for cost efficiency. Map the chunk text from the previous node to the Input Text field. Enable Execute Once for All Items to process multiple chunks efficiently.
Tip
Store the original chunk text alongside the embeddings for retrieval; you'll need both later
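If you use an HTTP Request node instead of the built-in OpenAI node, the embeddings endpoint accepts all chunk texts in a single call via its array `input` form. A sketch of the request-body builder (the helper name is an assumption; the `model`/`input` fields follow OpenAI's embeddings API):

```javascript
// Build the JSON body for a POST to OpenAI's /v1/embeddings endpoint,
// sending every chunk in one request instead of one call per chunk.
function buildEmbeddingRequest(chunks, model = 'text-embedding-3-small') {
  return {
    model,
    input: chunks.map(c => c.chunk), // the API returns one embedding per input
  };
}
```

Batching like this keeps request overhead low and mirrors what Execute Once for All Items achieves in the OpenAI node.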

Step 4: Store embeddings in a vector database

Connect a Pinecone or HTTP Request node (for other vector DBs) to store your embeddings. For Pinecone, use the Insert operation and map:
  • ID: Generate unique identifier using {{$json.index}}-{{Date.now()}}
  • Values: The embedding array from OpenAI
  • Metadata: Include original text and any relevant metadata
Tip
Include meaningful metadata like document source, timestamp, and categories for better filtering during retrieval
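The n8n Pinecone node maps these fields for you, but it can help to see the vector shape the database expects. A sketch of pairing chunks with their embeddings (the helper name and the specific metadata fields are illustrative; the id/values/metadata structure follows Pinecone's upsert format):

```javascript
// Shape chunk + embedding pairs into Pinecone's vector format: each
// vector carries an id, the embedding values, and metadata holding the
// original text so it can be returned at query time.
function toPineconeVectors(chunks, embeddings, source) {
  return chunks.map((c, i) => ({
    id: `${c.index}-${Date.now()}`,
    values: embeddings[i],
    metadata: {
      text: c.chunk,                      // needed later for context assembly
      source,                             // e.g. the document filename
      timestamp: new Date().toISOString()
    },
  }));
}
```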

Step 5: Implement query processing and vector search

Create a separate branch for handling user queries. Add another OpenAI node to generate embeddings for the incoming question. Then add a Pinecone node with Query operation to find similar content. Set Top K to 5-10 results and include metadata in the response. Use the query embedding as the search vector.
Tip
Experiment with different similarity thresholds to balance relevance and recall
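To make that experiment concrete: cosine similarity scores the match between two embeddings, and a threshold filter drops weak matches before they reach the prompt. A minimal sketch (the 0.75 default threshold is an illustrative starting point; tune it against your own data):

```javascript
// Cosine similarity between two embedding vectors: dot product divided
// by the product of their magnitudes, yielding a score in [-1, 1].
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep only search matches whose score clears the threshold.
function filterByScore(matches, threshold = 0.75) {
  return matches.filter(m => m.score >= threshold);
}
```

A higher threshold improves relevance but risks returning nothing; a lower one improves recall but lets noise into the context.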

Step 6: Merge retrieved context with user query

Add a Code node to combine search results into a coherent context. Process the vector search results:

// Concatenate the retrieved chunks into one context block
const searchResults = $input.all();
const context = searchResults
  .map(item => item.json.metadata.text)
  .join('\n\n');

// Pull the original question back out of the webhook item
const userQuery = $('Webhook').first().json.query;

return [{
  json: {
    context: context,
    query: userQuery,
    prompt: `Context: ${context}\n\nQuestion: ${userQuery}\n\nAnswer:`
  }
}];
Tip
Limit context length to stay within your LLM's token limits, typically 4,000-8,000 tokens for most models
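One simple way to enforce that limit is to drop whole chunks once a character budget is exhausted. A sketch using the rough heuristic of ~4 characters per token (an approximation, not an exact tokenizer count; the helper name is illustrative):

```javascript
// Trim the list of retrieved chunk texts to an approximate token budget,
// keeping whole chunks in order until the budget is exhausted.
function trimContext(chunkTexts, maxTokens = 4000) {
  const maxChars = maxTokens * 4; // rough heuristic: ~4 chars per token
  const kept = [];
  let used = 0;
  for (const text of chunkTexts) {
    if (used + text.length > maxChars) break;
    kept.push(text);
    used += text.length + 2; // account for the '\n\n' joiner
  }
  return kept.join('\n\n');
}
```

Because Pinecone returns matches ranked by score, truncating from the end of the list discards the least relevant chunks first.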

Step 7: Generate AI response using retrieved context

Add a final OpenAI node with the Chat operation using gpt-4 or gpt-3.5-turbo. Create a system message such as: "You are a helpful assistant. Answer questions based only on the provided context." Use the constructed prompt from the previous step as the user message. Set the temperature to 0.1 for more consistent responses.
Tip
Include instructions to say 'I don't know' if the context doesn't contain relevant information
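Putting the system and user messages together, the chat payload looks like this. A sketch of the message builder (the helper name is an assumption; the role/content structure follows OpenAI's Chat Completions API):

```javascript
// Assemble the messages array for the chat call: a grounding system
// prompt plus the context-and-question user prompt built in step 6.
function buildChatMessages(context, query) {
  return [
    {
      role: 'system',
      content:
        'You are a helpful assistant. Answer questions based only on the ' +
        'provided context. If the context does not contain the answer, ' +
        'say "I don\'t know".',
    },
    {
      role: 'user',
      content: `Context: ${context}\n\nQuestion: ${query}\n\nAnswer:`,
    },
  ];
}
```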

Step 8: Format and return the final response

Add a final Code node to format the response for your application. Structure the output with the AI answer, source information, and confidence indicators:

// Extract the model's answer and attach the retrieved sources
const aiResponse = $input.first().json.choices[0].message.content;
const sources = $('Pinecone Query').all().map(item => item.json.metadata);

return [{
  json: {
    answer: aiResponse,
    sources: sources,
    timestamp: new Date().toISOString()
  }
}];
Tip
Include source citations and confidence scores to make your RAG system more transparent and trustworthy

Troubleshooting

Vector search returns irrelevant results
Check embedding model consistency: ensure you use the same model for both indexing and querying. Also verify your chunk size isn't too large, and consider adding more specific metadata filtering in your Pinecone Query node.
Workflow timeouts with large documents
Enable Execute Once for All Items in your OpenAI embedding nodes and consider implementing batch processing using the Split Into Batches node to handle large document sets in smaller chunks.
AI responses are inconsistent or hallucinated
Lower the temperature setting to 0.1 in your OpenAI Chat node and strengthen your system prompt to emphasize answering only from provided context. Add explicit instructions to respond with 'Information not available' when context is insufficient.
High API costs from OpenAI calls
Use text-embedding-3-small instead of larger models for embeddings, implement caching for repeated queries using n8n's Redis node, and set reasonable limits on context length and search result counts.
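The caching pattern can be sketched in a few lines. In n8n you would typically back this with the Redis node; this Map-based version (helper name illustrative) just shows the lookup-before-compute shape:

```javascript
// Minimal in-memory cache for repeated queries, keyed by the
// normalized query text so trivially different phrasings still hit.
const cache = new Map();

function cachedAnswer(query, computeAnswer) {
  const key = query.trim().toLowerCase();
  if (cache.has(key)) return cache.get(key); // cache hit: no API calls
  const answer = computeAnswer(query);       // cache miss: run the pipeline
  cache.set(key, answer);
  return answer;
}
```

Every cache hit skips one embedding call, one vector search, and one chat completion, which is where most of the per-query cost sits.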

Ready to get started with n8n?

Put this tutorial into practice. Visit n8n and follow the steps above.
