This diagram gives a high-level overview of a Simple Stateless Retrieval-Augmented Generation (RAG) System: structured and unstructured documents are ingested, chunked, and embedded into vector representations for efficient retrieval. It traces the flow from a user query, refined into a prompt with the help of small and large language models, through a vector database search to a contextually accurate response.
The diagram highlights the interaction between the core components (embedding model, vector database, and LLMs), showing how information retrieval and generation work together in a stateless architecture, where no conversational state is carried between requests.
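To make the ingestion path concrete, here is a minimal sketch in Python. The `chunk`, `embed`, and `ingest` functions and the in-memory `vector_db` list are illustrative assumptions, not components named in the diagram: `embed` is a toy character-frequency stand-in for a real embedding model, and the list stands in for a real vector database.

```python
import math

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping, fixed-size character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dims: int = 8) -> list[float]:
    """Toy embedding: character-frequency features, L2-normalized.
    A real system would call an embedding model here instead."""
    vec = [0.0] * dims
    for ch in text.lower():
        vec[ord(ch) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Stand-in for a vector database: (vector, chunk) pairs kept in memory.
vector_db: list[tuple[list[float], str]] = []

def ingest(document: str) -> None:
    """Chunk a document and store each chunk with its vector."""
    for piece in chunk(document):
        vector_db.append((embed(piece), piece))
```

The overlap between chunks is a common design choice: it keeps sentences that straddle a chunk boundary retrievable from at least one chunk.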
Some key terms to explore further:
Chunking
Ingesting
Embedding Model
Small LLM
Large LLM
Vectors
Tokens
Cosine Similarity (see the retrieval sketch after this list)
Query
Retrieval-Augmented Generation
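Cosine similarity is what drives the retrieval step: the query is embedded, compared against every stored vector, and the highest-scoring chunks are placed into the prompt. Below is a minimal sketch reusing the toy `embed` and `vector_db` from the ingestion example above; the `retrieve` and `build_prompt` helpers, the top-k value, and the prompt template are all assumptions for illustration.

```python
import math  # used alongside embed() and vector_db from the ingestion sketch

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|): 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (norm_a * norm_b)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    scored = sorted(vector_db, key=lambda pair: cosine_similarity(q, pair[0]), reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query: str) -> str:
    """Stateless RAG: every request rebuilds its context from retrieval alone."""
    context = "\n---\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Because the system is stateless, `build_prompt` runs fresh on every request and nothing carries over between turns, which is why retrieval quality (chunk size, embedding model, choice of k) dominates answer quality.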