Michelle Anne Tabirao
on 15 May 2025
Introduction
One of the most critical gaps in traditional Large Language Models (LLMs) is that they rely on static knowledge already contained within them. They might be very good at understanding and responding to prompts, but they often fall short in providing current or highly specific information. This is where retrieval-augmented generation (RAG) comes in: it addresses this gap by incorporating current, external information that serves as a reliable source of truth for these models.
In our previous blog on understanding and deploying RAG, we walked you through the basics of what this technique is and how it enhances generative AI models by utilizing external knowledge sources such as documents and extensive databases. These external knowledge bases enhance machine learning models for enterprise applications by providing verifiable, up-to-date information that reduces errors, simplifies implementation, and lowers the cost of continuous retraining.
Building a robust generative AI infrastructure, such as those for RAG, can be complex and challenging. It requires careful consideration of the technology stack, data, scalability, ethics, and security. For the technology stack, the hardware, operating systems, cloud services, and generative AI services must be resilient and efficient based on the scale that enterprises require.
There are several open source software options available for building generative AI infrastructure and complex AI projects that accelerate development, avoid vendor lock-in, reduce costs, and satisfy enterprise needs.
Objective
In this guide, we will take you through setting up a RAG pipeline. We will utilize open source tools such as Charmed OpenSearch for efficient search retrieval and KServe for machine learning inference, specifically in Azure and Ubuntu environments, while taking advantage of the underlying silicon.
This guide is intended for data enthusiasts, engineers, scientists, and machine learning professionals who want to start building RAG solutions on public cloud platforms, such as Azure, using enterprise open source tools that are not part of Azure's native service offering. It can be used for various projects, including proofs of concept, development, and production.
Please note that multiple open source tools not highlighted in this guide can be used in place of the ones we outline. In cases where you do use different tools, you should adjust the hardware specifications—such as storage, computing power, and configuration—to meet the specific requirements of your use case.
RAG workflow
When building a generative AI project, such as a RAG system and advanced generative AI reference architectures, it is crucial to include multiple components and services. These components typically encompass databases, knowledge bases, retrieval systems, vector databases, model embeddings, large language models (LLMs), inference engines, prompt processing, and guardrail and fine-tuning services, among others.
RAG allows users to choose the most suitable RAG services and applications for their specific use cases. The reference workflow outlined below mainly utilizes two open source tools: Charmed OpenSearch and KServe. In the RAG workflow depicted below, fine-tuning is not mandatory; however, it can enhance the performance of LLMs as the project scales.

Figure 1: RAG workflow diagram using open source tools
The table below describes all the RAG services highlighted in the workflow diagram above and maps the open source solutions that are used in this guide.
| Services | Description | Open source solutions |
| --- | --- | --- |
| Advanced parsing | Text splitters are advanced parsing techniques applied to documents before they enter the RAG system, so that the input is cleaner, more focused, and more informative. | Charmed Kubeflow: text splitters |
| Ingest/data processing | The ingest or data processing layer is a data pipeline responsible for extracting data, cleansing it, and removing unnecessary content. | Charmed OpenSearch (document processing) |
| Embedding model | The embedding model is a machine learning model that converts raw data into vector representations. | Charmed OpenSearch: sentence transformer |
| Retrieval and ranking | This component retrieves data from the knowledge base and ranks the fetched information based on relevance scores. | Charmed OpenSearch with FAISS (Facebook AI Similarity Search) |
| Vector database | A vector database stores vector embeddings so data can be easily searched by the retrieval and ranking service. | Charmed OpenSearch: k-NN index as a vector database |
| Prompt processing | This service formats user queries and retrieved text into a structured prompt for the LLM. | Charmed OpenSearch: ML agent predict |
| LLM | This component provides the final response using one or more GenAI models. | GPT, Llama, DeepSeek |
| LLM inference | This refers to operationalizing machine learning in production by running new data through a trained model so that it produces an output. | Charmed Kubeflow with KServe |
| Guardrail | This component ensures ethical content in the GenAI response by applying a guardrail filter to inputs and outputs. | Charmed OpenSearch: guardrail validation model |
| LLM fine-tuning | Fine-tuning is the process of taking a pre-trained machine learning model and further training it on a smaller, targeted dataset. | Charmed Kubeflow |
| Model repository | This component stores and versions trained machine learning models, especially during fine-tuning. The registry can track a model's lifecycle from deployment to retirement. | Charmed Kubeflow, Charmed MLflow |
| Framework for building LLM applications | This simplifies LLM workflows, prompts, and service integration so that building LLM applications is easier. | LangChain |
This table provides an overview of the key components involved in building a RAG system and advanced GenAI reference solution, along with the associated open source solutions for each service. Each service performs a specific task that can enhance your LLM setup, whether it relates to data management and preparation, embedding models and vector storage, or the LLM itself.
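To make the retrieval, prompt-processing, and LLM stages more concrete, the sketch below shows a minimal query flow in Python. It is illustrative only: it assumes a reachable OpenSearch cluster with a k-NN index (here hypothetically named rag-documents) that already contains embedded document chunks, and it uses a sentence-transformers model for query embedding. Your endpoint, credentials, index name, and field names will differ.

```python
# Minimal RAG query sketch (illustrative only).
# Assumptions: an OpenSearch cluster reachable at OPENSEARCH_HOST, a k-NN index
# named "rag-documents" with a "text" field and an "embedding" knn_vector field,
# and the opensearch-py and sentence-transformers packages installed.
from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

OPENSEARCH_HOST = "localhost"  # placeholder: your cluster endpoint
INDEX_NAME = "rag-documents"   # placeholder: your index name

client = OpenSearch(
    hosts=[{"host": OPENSEARCH_HOST, "port": 9200}],
    http_auth=("admin", "admin"),  # replace with your cluster credentials
    use_ssl=True,
    verify_certs=False,
)
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, k: int = 3) -> list[str]:
    """Embed the query and fetch the k most similar document chunks."""
    vector = embedder.encode(query).tolist()
    response = client.search(
        index=INDEX_NAME,
        body={"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}},
    )
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble retrieved context and the user question into a single prompt."""
    context = "\n\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

question = "How do I scale OpenSearch?"
prompt = build_prompt(question, retrieve(question))
# `prompt` is then sent to the LLM inference service (see the KServe section below).
```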
The deployment guide below will cover most of the services except the following: model repository, LLM fine-tuning, and text splitters.
The rate of innovation in this field, particularly within the open source community, has become exponential. It is crucial to stay updated with the latest developments, including new models and emerging RAG solutions.
RAG component: Charmed OpenSearch
Charmed OpenSearch will be mainly used in this RAG workflow deployment. Charmed OpenSearch is an operator that builds on the OpenSearch upstream by integrating automation to streamline the deployment, management, and orchestration of production clusters. The operator enhances efficiency, consistency, and security. Its rich features include high availability, seamless scaling features for deployments of all sizes, HTTP and data-in-transit encryption, multi-cloud support, safe upgrades without downtime, roles and plugin management, and data visualization through Charmed OpenSearch Dashboards.
With the Charmed OpenSearch operator (also known as a charm), you can deploy and run OpenSearch on physical and virtual machines (VMs) and other cloud and cloud-like environments, including AWS, Azure, Google Cloud, OpenStack, and VMware. The deployment guide in the next section uses Azure VM instances:
Figure 2: Charmed OpenSearch architecture
Charmed OpenSearch uses Juju. Juju is an open source orchestration engine for software operators that enables the deployment, integration, and lifecycle management of applications at any scale on any infrastructure. In the deployment process, the Juju controller manages the flow of data and interactions within multiple OpenSearch deployments, including mediating between different parts of the system.
Deploying and using Charmed OpenSearch is straightforward. If you'd like to learn how to use and deploy it in a range of cloud environments, you can read more in our in-depth Charmed OpenSearch documentation.
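Once the charm has deployed the cluster, applications interact with it through the standard OpenSearch API. As a hedged illustration, the Python sketch below creates a k-NN-enabled index suitable for storing vector embeddings; the endpoint, credentials, index name, and vector dimension are placeholders that depend on your deployment and on the embedding model you choose.

```python
# Sketch: create a k-NN index for vector embeddings on a deployed OpenSearch cluster.
# Endpoint, credentials, index name, and dimension are placeholders; the dimension
# must match your embedding model (384 for all-MiniLM-L6-v2, for example).
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "your-opensearch-endpoint", "port": 9200}],  # placeholder
    http_auth=("admin", "your-password"),                        # placeholder
    use_ssl=True,
    verify_certs=False,
)

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
}

client.indices.create(index="rag-documents", body=index_body)
```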
RAG component: KServe
KServe is a cloud-native solution within the Kubeflow ecosystem that serves machine learning models. By leveraging Kubernetes, KServe operates effectively in cloud-native environments. It can be used for various purposes, including model deployment, machine learning model versioning, LLM inference, and model monitoring.
In the RAG use case discussed in this guide, we will use KServe to perform LLM inference: it serves an already-trained LLM so that it can make predictions on new data. This calls for a robust LLM inference system that works with both local and public LLMs, scales well, handles high concurrency, and delivers low-latency, accurate responses.
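To illustrate what calling such an inference service can look like, here is a hedged sketch that posts a prompt to a KServe InferenceService using the V1 REST protocol. The ingress URL, Host header, and model name are placeholders for whatever your Charmed Kubeflow and KServe deployment exposes, and the exact request and response payload depends on the model server you deploy behind KServe.

```python
# Sketch: query a KServe InferenceService over the V1 REST protocol.
# URL, Host header, and model name are placeholders; the payload schema depends
# on the predictor (model server) deployed behind KServe.
import requests

INGRESS_URL = "http://localhost:8080"     # placeholder: your cluster ingress address
SERVICE_HOST = "llm.default.example.com"  # placeholder: InferenceService hostname
MODEL_NAME = "llm"                        # placeholder: name of the served model

def generate(prompt: str) -> dict:
    """Send a prompt to the model's V1 predict endpoint and return the raw response."""
    response = requests.post(
        f"{INGRESS_URL}/v1/models/{MODEL_NAME}:predict",
        headers={"Host": SERVICE_HOST},
        json={"instances": [{"prompt": prompt}]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

print(generate("Summarise how Charmed OpenSearch supports RAG."))
```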
In the deployment guide, we’ll take you through a comprehensive and hands-on guide to building and deploying a RAG service using Charmed OpenSearch and KServe. Charmed Kubeflow by Canonical natively supports KServe.
Deployment guide to building an end-to-end RAG workflow with Charmed OpenSearch and KServe
Our deployment guide for building an end-to-end RAG workflow with Charmed OpenSearch and KServe covers everything you need to make your own RAG workflow, including:
- Prerequisites
- Install Juju and configure Azure credentials
- Bootstrap Juju controller and create Juju model for Charmed OpenSearch
- Deploy Charmed OpenSearch and set up the RAG service
- Ask and start conversational flow with your RAG
Canonical for your RAG requirements
Canonical provides data and AI workshops, enterprise open source tools and services, and advice on securing your code, data, and models in production.
Build the right RAG architecture and application with the Canonical RAG workshop
Canonical offers a 5-day workshop designed to help you start building your enterprise RAG systems. By the end of the workshop, you will have a thorough understanding of RAG and LLM theory, architecture, and best practices. Together, we will develop and deploy solutions tailored to your specific needs. Download the datasheet here.
Learn and use best-in-class RAG tooling on any hardware and cloud
Unlock the benefits of RAG with open source tools designed for your entire data and machine learning lifecycle. Run RAG-enabled LLMs on any hardware and cloud platform, in production and at scale.
Canonical offers enterprise-ready AI infrastructure along with open source data and AI tools to help you kickstart your RAG projects.
Secure your AI stack with confidence
Enhance the security of your GenAI projects while mastering best practices for managing your software stack. Discover ways to safeguard your code, data, and machine learning models in production with Confidential AI.