RAG Architecture

What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) combines a retrieval system with a large language model.

Instead of asking the model to rely on training data alone, a search system retrieves relevant documents and feeds them to the model as context.
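
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names (tokenize, retrieve, build_prompt) and the sample documents are hypothetical, and the retriever is a word-overlap placeholder standing in for a real search system. The resulting prompt would be passed to a language model; that call is left out because it depends on the provider.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Placeholder retriever: rank documents by words shared with the query.

    A real RAG system would typically use vector similarity here instead.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, context_docs):
    """Place the retrieved documents in front of the question as context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample data, for illustration only.
documents = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Mount Everest is in the Himalayas.",
]
query = "What is the capital of France?"
prompt = build_prompt(query, retrieve(query, documents, top_k=1))
print(prompt)
```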

Core Components of RAG

A typical RAG system contains three parts:

Document store
Vector database
Language model

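The document store holds the source content, which is split into chunks and embedded. The vector database indexes those embeddings for retrieval, and the language model generates an answer from the retrieved chunks.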
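
Chunking can take many forms. As one simple possibility, the sketch below splits text into fixed-size character windows that overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The function name chunk_text and the parameters chunk_size and overlap are assumptions made for this example; production systems often split on sentences, paragraphs, or tokens instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    Each chunk starts (chunk_size - overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small values make the overlap easy to see.
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk would then be passed to an embedding model and stored alongside its vector.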

Vector Databases

Vector databases store embeddings representing the semantic meaning of text or other data.

When content is embedded into vectors, similar concepts produce vectors that appear close together in high-dimensional space. A vector database indexes these embeddings and allows fast similarity searches.
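
"Close together" is usually measured with a metric such as cosine similarity, which compares the direction of two vectors. The example below is a small sketch using hand-made three-dimensional vectors as stand-ins for real embeddings, which typically have hundreds or thousands of dimensions. The vector values are invented for illustration and do not come from any model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors: related concepts point in similar directions.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.95]

print("cat vs kitten: ", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

The related pair scores close to 1, while the unrelated pair scores much lower.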

Instead of matching keywords, the system compares vectors and retrieves content that is semantically related to a query.
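
To make the lookup concrete, here is a minimal in-memory sketch of a vector search. The class name TinyVectorIndex and all vectors are hypothetical. It scans every stored vector, whereas real vector databases generally rely on approximate nearest-neighbor indexes to stay fast at scale. Vectors are normalized when added, so a plain dot product gives the cosine similarity.

```python
import math


class TinyVectorIndex:
    """Brute-force similarity search over unit-length vectors."""

    def __init__(self):
        self._items = []  # list of (text, unit_vector) pairs

    @staticmethod
    def _normalize(vector):
        length = math.sqrt(sum(x * x for x in vector))
        if length == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / length for x in vector]

    def add(self, text, vector):
        """Store a text together with its normalized embedding."""
        self._items.append((text, self._normalize(vector)))

    def search(self, query_vector, top_k=3):
        """Return up to top_k (score, text) pairs, highest similarity first."""
        q = self._normalize(query_vector)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for text, vec in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Invented vectors standing in for real embeddings.
index = TinyVectorIndex()
index.add("How to reset your password", [0.9, 0.1, 0.0])
index.add("Recovering a locked account", [0.8, 0.3, 0.1])
index.add("Quarterly revenue report", [0.0, 0.2, 0.9])

# Pretend this is the embedding of the query "I forgot my login".
results = index.search([0.85, 0.2, 0.05], top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The pretend query shares no words with the two account-related entries, yet they rank highest because their vectors point in a similar direction.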

Examples of vector databases include:

FAISS (a similarity search library rather than a standalone database)
Pinecone
Weaviate
Qdrant