# RAG

## Overview

`RAGService` integrates a knowledge base with large language model (LLM) reasoning to answer natural language questions. The system retrieves relevant context from multiple knowledge bases, uses embeddings and similarity metrics for relevance scoring, and generates coherent answers via an LLM.
## What is RAG?

Retrieval-Augmented Generation (RAG) combines:

- Retrieval: Searching a knowledge base or document collection for information relevant to a query.
- Augmented Generation: Conditioning a generative language model on the retrieved context to produce precise, informative answers.
RAG improves LLM accuracy by grounding responses in specific, relevant documents instead of relying solely on model parameters.
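As a concrete illustration of this loop (not the repository's code), the sketch below retrieves with ChromaDB and generates with the OpenAI client; the collection contents, model name, and prompt wording are assumptions:

```python
# Minimal RAG sketch: retrieve with ChromaDB, then generate with an LLM.
# Document texts, model, and prompt wording are illustrative assumptions.
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
collection = chroma.get_or_create_collection("knowledge")
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "RoBorregos is a robotics team.",
        "FRIDA is the team's smart assistant.",
    ],
)

def rag_answer(question: str, top_k: int = 3) -> str:
    # Retrieval: fetch the documents most similar to the question.
    results = collection.query(query_texts=[question], n_results=top_k)
    context = "\n".join(results["documents"][0])
    # Augmented generation: condition the LLM on the retrieved context.
    llm = OpenAI()
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(rag_answer("Who is FRIDA?"))
```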
## Use Case

`RAGService` provides a conversational AI interface for answering questions based on curated knowledge bases. This is useful for:

- Fetching previous command history.
- Answering any question asked by the host over gRPC, grounded in retrieved context.
## Architecture and Implementation

### Components

- ROS 2 Node (`RAGService`): Implements the service that listens for question requests and responds with generated answers.
- ChromaAdapter: Interface to vector search over the knowledge bases, handling embedding queries and document retrieval.
- OpenAI LLM Client: Connects to an LLM API for question cleaning and answer generation.
- Dialogs module (`clean_question_rag` and `get_answer_question_dialog`): Prepares prompts and context for the LLM.
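For orientation, a minimal sketch of the node's shape is below; the `AnswerQuestion` interface's package and fields, the parameter names, and the service name are assumptions rather than the real definitions:

```python
# Sketch of the RAGService node's structure (illustrative; the srv package,
# field names, and parameter names are assumptions).
import rclpy
from rclpy.node import Node
# from frida_interfaces.srv import AnswerQuestion  # hypothetical srv path

class RAGService(Node):
    def __init__(self):
        super().__init__("rag_service")
        # Hypothetical parameter for the knowledge-base file paths.
        self.declare_parameter("knowledge_base_files", [""])
        # self.chroma_adapter = ChromaAdapter(...)  # vector-search interface
        # self.llm = OpenAI()                       # LLM client
        # self.srv = self.create_service(
        #     AnswerQuestion, "answer_question", self.answer_callback)

    def answer_callback(self, request, response):
        # 1) clean the question, 2) retrieve context, 3) generate the answer
        response.answer = "..."  # filled in by the LLM in the real node
        return response

def main():
    rclpy.init()
    rclpy.spin(RAGService())

if __name__ == "__main__":
    main()
```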
### Key Functionalities

| Feature | Description |
|---|---|
| Initialization | Reads ROS parameters; sets up `ChromaAdapter`, the OpenAI client, and the `AnswerQuestion` service. |
| Cosine Similarity | Computes similarity between two embedding vectors, safely handling zero vectors (sketched below). |
| Recursive Comparison | Recursively compares objects (dicts, lists, primitives), using embeddings for text similarity (sketched below). |
| Question Cleaning | Preprocesses and cleans questions with the LLM before retrieval to improve query quality. |
| Answer Callback | Handles question requests; the retrieval step is currently commented out and a hardcoded context list is used; generates and returns the LLM answer. |
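The two similarity helpers are straightforward to sketch; the real node's signatures may differ, and `embed` here stands in for whatever embedding call the node actually uses:

```python
# Sketch of the similarity helpers described above (illustrative; the real
# node's signatures may differ).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Guard against zero vectors, which would otherwise divide by zero.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

def compare(obj_a, obj_b, embed) -> float:
    # Recursively compare dicts, lists, and primitives; text leaves are
    # compared through their embeddings. `embed` is an assumed callable
    # mapping a string to an embedding vector.
    if isinstance(obj_a, dict) and isinstance(obj_b, dict):
        keys = obj_a.keys() & obj_b.keys()
        if not keys:
            return 0.0
        return sum(compare(obj_a[k], obj_b[k], embed) for k in keys) / len(keys)
    if isinstance(obj_a, list) and isinstance(obj_b, list):
        pairs = list(zip(obj_a, obj_b))
        if not pairs:
            return 0.0
        return sum(compare(x, y, embed) for x, y in pairs) / len(pairs)
    if isinstance(obj_a, str) and isinstance(obj_b, str):
        return cosine_similarity(embed(obj_a), embed(obj_b))
    return 1.0 if obj_a == obj_b else 0.0
```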
## Knowledge Bases

The system uses three main knowledge bases, stored as JSON files and accessed via `ChromaAdapter`:

- frida_knowledge.json
- roborregos_knowledge.json
- tec_knowledge.json

These contain curated domain-specific information such as team background, projects, events, and technical documentation relevant to the robotics community and the smart assistant (FRIDA).
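As a rough illustration of how such JSON files could be indexed for retrieval (the actual JSON schema and `ChromaAdapter` internals are assumptions):

```python
# Sketch: indexing the JSON knowledge bases into Chroma collections
# (illustrative; the real JSON schema and ChromaAdapter internals differ).
import json
import chromadb

chroma = chromadb.Client()

for path in ("frida_knowledge.json", "roborregos_knowledge.json", "tec_knowledge.json"):
    with open(path) as f:
        entries = json.load(f)  # assumed: a list of text entries
    name = path.removesuffix(".json")
    collection = chroma.get_or_create_collection(name)
    collection.add(
        ids=[f"{name}-{i}" for i in range(len(entries))],
        documents=[str(e) for e in entries],
    )
```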
## Dialogs Module

| Function | Purpose |
|---|---|
| `clean_question_rag(question)` | Creates a prompt that instructs the LLM to determine what information must be fetched to answer the question. |
| `get_answer_question_dialog(contexts, question)` | Builds a prompt combining the retrieved contexts with the question, adding system instructions for FRIDA's concise, friendly, and context-aware answers. |
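A hedged sketch of what these two builders plausibly look like; the exact prompt wording in the real dialogs module is not reproduced here:

```python
# Sketch of the prompt builders (illustrative; the real prompt wording in
# the dialogs module is an assumption).
def clean_question_rag(question: str) -> str:
    return (
        "Rewrite the user's question as a short retrieval query, listing "
        "the information that must be fetched to answer it.\n"
        f"Question: {question}"
    )

def get_answer_question_dialog(contexts: list[str], question: str) -> str:
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "You are FRIDA. Answer concisely and in a friendly tone, using only "
        "the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )
```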
## Areas for Improvement

- Enable Dynamic Retrieval: The current callback uses a fixed context list instead of querying the vector DB via `ChromaAdapter`. Restoring and debugging the retrieval code would improve real-time relevance (see the sketch after this list).
- Expand Embedding Models: Supporting multiple embedding models for the recursive similarity calculations could improve matching accuracy.
- Error Handling: Improve robustness of embedding calls and LLM responses, including fallback mechanisms.
- Parameterization: Expose more parameters for dynamically tuning retrieval thresholds, top-k results, and model temperature.
- Logging and Monitoring: Enhance logging for debugging the retrieval and generation steps.
- Context Management: Implement context caching or session handling for multi-turn dialogues.
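Combining the first and third items, a restored retrieval step with a fallback context might look like this (the `ChromaAdapter.query` signature and the fallback text are assumptions):

```python
# Method sketch for the RAGService node (self is the node); illustrative,
# with an assumed ChromaAdapter.query(text, n_results) -> list[str] API.
FALLBACK_CONTEXT = ["FRIDA is RoBorregos' smart assistant."]

def retrieve_context(self, question: str, top_k: int = 3) -> list[str]:
    try:
        docs = self.chroma_adapter.query(question, n_results=top_k)
        return docs or FALLBACK_CONTEXT
    except Exception as exc:
        # Fall back to a fixed context so the service can still answer.
        self.get_logger().warn(f"Retrieval failed, using fallback: {exc}")
        return FALLBACK_CONTEXT
```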