TemperStack
Intermediate · 12 min read · Updated Mar 18, 2026

How to build RAG AI workflows on n8n

Quick Answer

Build RAG AI workflows on n8n by combining document processing nodes, vector database operations, and LLM nodes into an intelligent question-answering system. The workflow connects data ingestion, embedding generation, vector search, and response generation in a single automated pipeline.

Prerequisites

  1. Basic n8n workflow experience
  2. OpenAI API key or similar LLM access
  3. Vector database setup (Pinecone, Weaviate, or Qdrant)
  4. Understanding of embeddings and vector search concepts

Step 1: Set up your n8n workflow foundation

Create a new workflow in n8n and add a Webhook node as your trigger. Configure the webhook to accept POST requests with a JSON body containing the user's query. Set the webhook path to something like /rag-query and enable the Return Response option so results are sent back to the caller.
Tip
Use a Manual Trigger node during development to test your workflow before exposing it via webhook
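It also helps to reject malformed requests before the rest of the workflow runs. A minimal sketch of that check, written as a plain function so it runs outside n8n (inside a Code node you would read the body from $input; the helper name is illustrative):

```javascript
// Validate the JSON body a caller POSTs to the webhook.
// Rejects missing or empty "query" fields before any API calls are made.
function validateQuery(body) {
  if (!body || typeof body.query !== 'string' || body.query.trim() === '') {
    throw new Error('Expected a JSON body with a non-empty "query" string');
  }
  return { query: body.query.trim() };
}
```

For example, `validateQuery({ query: ' What is RAG? ' })` returns `{ query: 'What is RAG?' }`, while an empty body throws before any OpenAI or Pinecone calls are made.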

Step 2: Add document processing and chunking

Insert a Code node after your trigger to process incoming documents. Use JavaScript to split large documents into smaller chunks:

// Split the incoming document into fixed-size chunks
const text = $input.first().json.document;
const chunkSize = 1000;
const chunks = [];

for (let i = 0; i < text.length; i += chunkSize) {
  chunks.push({
    chunk: text.slice(i, i + chunkSize),
    index: Math.floor(i / chunkSize)
  });
}

// n8n Code nodes return an array of items, each wrapped in { json }
return chunks.map(chunk => ({ json: chunk }));
Tip
Keep chunks between 500 and 1,500 characters for optimal embedding performance
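If answers often straddle chunk boundaries, overlapping consecutive chunks can help preserve context. A sketch of the same loop with a configurable overlap (the 200-character default is an illustrative choice, not an n8n setting):

```javascript
// Split text into fixed-size chunks that overlap, so a sentence cut at
// one boundary still appears intact at the start of the next chunk.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  const step = chunkSize - overlap; // advance less than a full chunk
  for (let i = 0; i < text.length; i += step) {
    chunks.push({ chunk: text.slice(i, i + chunkSize), index: chunks.length });
  }
  return chunks;
}
```

Overlap trades some extra embedding cost (duplicated text) for better recall on queries whose answer spans two chunks.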

Step 3: Generate embeddings for your content

Add an OpenAI node and select Embeddings operation. Configure it to use the text-embedding-3-small model for cost efficiency. Map the chunk text from the previous node to the Input Text field. Enable Execute Once for All Items to process multiple chunks efficiently.
Tip
Store the original chunk text alongside the embeddings for retrieval; you'll need both later
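If you use an HTTP Request node instead of the built-in OpenAI node, the embeddings endpoint accepts all chunk texts in a single call via its array `input` form. A sketch of the request-body builder (the helper name is an assumption; the `model`/`input` fields follow OpenAI's embeddings API):

```javascript
// Build the JSON body for a POST to OpenAI's /v1/embeddings endpoint,
// sending every chunk in one request instead of one call per chunk.
function buildEmbeddingRequest(chunks, model = 'text-embedding-3-small') {
  return {
    model,
    input: chunks.map(c => c.chunk), // the API returns one embedding per input
  };
}
```

Batching like this keeps request overhead low and mirrors what Execute Once for All Items achieves in the OpenAI node.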

Step 4: Store embeddings in a vector database

Connect a Pinecone or HTTP Request node (for other vector DBs) to store your embeddings. For Pinecone, use the Insert operation and map:
  • ID: Generate unique identifier using {{$json.index}}-{{Date.now()}}
  • Values: The embedding array from OpenAI
  • Metadata: Include original text and any relevant metadata
Tip
Include meaningful metadata like document source, timestamp, and categories for better filtering during retrieval
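The n8n Pinecone node maps these fields for you, but it can help to see the vector shape the database expects. A sketch of pairing chunks with their embeddings (the helper name and the specific metadata fields are illustrative; the id/values/metadata structure follows Pinecone's upsert format):

```javascript
// Shape chunk + embedding pairs into Pinecone's vector format: each
// vector carries an id, the embedding values, and metadata holding the
// original text so it can be returned at query time.
function toPineconeVectors(chunks, embeddings, source) {
  return chunks.map((c, i) => ({
    id: `${c.index}-${Date.now()}`,
    values: embeddings[i],
    metadata: {
      text: c.chunk,                      // needed later for context assembly
      source,                             // e.g. the document filename
      timestamp: new Date().toISOString()
    },
  }));
}
```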

Step 5: Implement query processing and vector search

Create a separate branch for handling user queries. Add another OpenAI node to generate embeddings for the incoming question. Then add a Pinecone node with Query operation to find similar content. Set Top K to 5-10 results and include metadata in the response. Use the query embedding as the search vector.
Tip
Experiment with different similarity thresholds to balance relevance and recall
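To make that experiment concrete: cosine similarity scores the match between two embeddings, and a threshold filter drops weak matches before they reach the prompt. A minimal sketch (the 0.75 default threshold is an illustrative starting point; tune it against your own data):

```javascript
// Cosine similarity between two embedding vectors: dot product divided
// by the product of their magnitudes, yielding a score in [-1, 1].
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep only search matches whose score clears the threshold.
function filterByScore(matches, threshold = 0.75) {
  return matches.filter(m => m.score >= threshold);
}
```

A higher threshold improves relevance but risks returning nothing; a lower one improves recall but lets noise into the context.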

Step 6: Merge retrieved context with user query

Add a Code node to combine search results into a coherent context. Process the vector search results:

// Concatenate the retrieved chunks into one context block
const searchResults = $input.all();
const context = searchResults
  .map(item => item.json.metadata.text)
  .join('\n\n');

// Pull the original question back out of the webhook item
const userQuery = $('Webhook').first().json.query;

return [{
  json: {
    context: context,
    query: userQuery,
    prompt: `Context: ${context}\n\nQuestion: ${userQuery}\n\nAnswer:`
  }
}];
Tip
Limit context length to stay within your LLM's token limits, typically 4,000-8,000 tokens for most models
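One simple way to enforce that limit is to drop whole chunks once a character budget is exhausted. A sketch using the rough heuristic of ~4 characters per token (an approximation, not an exact tokenizer count; the helper name is illustrative):

```javascript
// Trim the list of retrieved chunk texts to an approximate token budget,
// keeping whole chunks in order until the budget is exhausted.
function trimContext(chunkTexts, maxTokens = 4000) {
  const maxChars = maxTokens * 4; // rough heuristic: ~4 chars per token
  const kept = [];
  let used = 0;
  for (const text of chunkTexts) {
    if (used + text.length > maxChars) break;
    kept.push(text);
    used += text.length + 2; // account for the '\n\n' joiner
  }
  return kept.join('\n\n');
}
```

Because Pinecone returns matches ranked by score, truncating from the end of the list discards the least relevant chunks first.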

Step 7: Generate AI response using retrieved context

Add a final OpenAI node with the Chat operation using gpt-4 or gpt-3.5-turbo. Create a system message such as: "You are a helpful assistant. Answer questions based only on the provided context." Use the constructed prompt from the previous step as the user message. Set the temperature to 0.1 for more consistent responses.
Tip
Include instructions to say 'I don't know' if the context doesn't contain relevant information
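Putting the system and user messages together, the chat payload looks like this. A sketch of the message builder (the helper name is an assumption; the role/content structure follows OpenAI's Chat Completions API):

```javascript
// Assemble the messages array for the chat call: a grounding system
// prompt plus the context-and-question user prompt built in step 6.
function buildChatMessages(context, query) {
  return [
    {
      role: 'system',
      content:
        'You are a helpful assistant. Answer questions based only on the ' +
        'provided context. If the context does not contain the answer, ' +
        'say "I don\'t know".',
    },
    {
      role: 'user',
      content: `Context: ${context}\n\nQuestion: ${query}\n\nAnswer:`,
    },
  ];
}
```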

Step 8: Format and return the final response

Add a final Code node to format the response for your application. Structure the output with the AI answer, source information, and confidence indicators:

// Extract the model's answer and attach the retrieved sources
const aiResponse = $input.first().json.choices[0].message.content;
const sources = $('Pinecone Query').all().map(item => item.json.metadata);

return [{
  json: {
    answer: aiResponse,
    sources: sources,
    timestamp: new Date().toISOString()
  }
}];
Tip
Include source citations and confidence scores to make your RAG system more transparent and trustworthy

Troubleshooting

Vector search returns irrelevant results
Check embedding model consistency: ensure you use the same model for both indexing and querying. Also verify your chunk size isn't too large, and consider adding more specific metadata filtering in your Pinecone Query node.
Workflow timeouts with large documents
Enable Execute Once for All Items in your OpenAI embedding nodes and consider implementing batch processing using the Split Into Batches node to handle large document sets in smaller chunks.
AI responses are inconsistent or hallucinated
Lower the temperature setting to 0.1 in your OpenAI Chat node and strengthen your system prompt to emphasize answering only from provided context. Add explicit instructions to respond with 'Information not available' when context is insufficient.
High API costs from OpenAI calls
Use text-embedding-3-small instead of larger models for embeddings, implement caching for repeated queries using n8n's Redis node, and set reasonable limits on context length and search result counts.
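The caching pattern can be sketched in a few lines. In n8n you would typically back this with the Redis node; this Map-based version (helper name illustrative) just shows the lookup-before-compute shape:

```javascript
// Minimal in-memory cache for repeated queries, keyed by the
// normalized query text so trivially different phrasings still hit.
const cache = new Map();

function cachedAnswer(query, computeAnswer) {
  const key = query.trim().toLowerCase();
  if (cache.has(key)) return cache.get(key); // cache hit: no API calls
  const answer = computeAnswer(query);       // cache miss: run the pipeline
  cache.set(key, answer);
  return answer;
}
```

Every cache hit skips one embedding call, one vector search, and one chat completion, which is where most of the per-query cost sits.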

Ready to get started with n8n?

Put this tutorial into practice. Visit n8n and follow the steps above.
