Retrieval-Augmented Generation (RAG) is an AI technique where relevant documents or data are fetched and handed to a language model as context before it generates a response. Instead of answering from memory alone, the model reads the retrieved content and uses it to produce a grounded answer.
Think of it as an open-book exam. Fine-tuning is memorization. RAG is letting the model look things up in real time.
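The loop above can be sketched in a few lines: fetch the most relevant documents for a query, then hand them to the model as context. The corpus, the word-overlap scorer, and the prompt template here are illustrative assumptions; real systems typically use embedding-based vector search and a real LLM call in place of the `print`.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a
# grounded prompt instead of letting the model answer from memory.
# CORPUS and the toy scorer are illustrative, not a real product's data.

CORPUS = [
    "The Pro plan costs $49 per month and includes priority support.",
    "Password resets are handled from the account settings page.",
    "Our API rate limit is 1000 requests per minute per key.",
]

def score(query: str, doc: str) -> int:
    """Count shared words between query and document (toy relevance score)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the top-k documents by overlap score."""
    return sorted(CORPUS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Hand the retrieved context to the model alongside the question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much does the Pro plan cost per month?"))
```

To refresh the model's knowledge, you change `CORPUS`, not the model: the next query retrieves the updated documents.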
RAG vs fine-tuning
Use fine-tuning when you want to change how the model behaves: its tone, format, or domain vocabulary. The change is baked into the weights, which makes it expensive to update and slow to iterate on.
Use RAG when you want the model to know current or specific facts: your product docs, customer history, latest pricing. You update the documents and the model has fresh context on the next query. No retraining needed.
Most practical AI features in SaaS, including search, support bots, and feedback summarization, use RAG. Fine-tuning is reserved for tasks where behavior change is the actual goal.
The catch
RAG is only as good as the retrieval step. Pull in the wrong documents and the model confidently answers from bad context. Getting retrieval right is where most of the engineering effort actually lives.
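A toy example of the failure mode: a naive lexical scorer ranks an irrelevant document above the right one, and the model then answers confidently from the wrong context. The documents and scorer below are illustrative assumptions, not from any real system.

```python
# Why retrieval quality matters: the query says "refund" but the policy
# doc says "Refunds", so a naive shared-word scorer with no stemming or
# punctuation handling misses the match entirely.

DOCS = {
    "refund_policy": "Refunds are issued within 5 business days.",
    "shipping_faq": "You will get an email when your order ships.",
}

def overlap(query: str, doc: str) -> int:
    """Shared-word count: a naive relevance score."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

query = "When will I get my refund?"
best = max(DOCS, key=lambda name: overlap(query, DOCS[name]))

# "refund?" never matches "Refunds", so the shipping doc wins on
# incidental words like "will", "get", and "when".
print(best)  # shipping_faq
```

The model would then be handed the shipping FAQ and produce a fluent, grounded-looking answer about shipping emails, which is exactly why the engineering effort concentrates on retrieval: embeddings, reranking, and evaluation of what actually gets pulled in.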