Inspiration
Following Geo-Compliance Regulations is a hard but important task.
What it does
This project implements a Retrieval-Augmented Generation (RAG) pipeline with a fine-tuned model using FastAPI for the backend and React for the frontend. The backend handles document processing, question answering, and integrates with various language models and vector stores. The frontend provides a user interface for interacting with the backend services.
How we built it
Retrieval-Augmented Generation (RAG) Pipeline
In order to provide accurate and contextually relevant answers to user queries, we have implemented a Retrieval-Augmented Generation (RAG) pipeline.
The RAG pipeline's document corpus consists of several types of documents:
- Legal documents (e.g., regulations, laws)
- Business documents (e.g., company-specific terms/jargon)
The documents are processed and indexed using FAISS, a popular vector store, to enable efficient retrieval based on semantic similarity.
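The retrieval step can be sketched as follows. This is a toy illustration only: a bag-of-words vector and cosine similarity stand in for the real embedding model and the FAISS index, so the control flow is visible without the heavy dependencies.

```python
# Toy sketch of semantic retrieval: documents are embedded and the most
# similar ones to the query are returned. Bag-of-words term frequencies
# stand in for real embeddings, and a linear scan stands in for FAISS.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Stand-in embedding: a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "GDPR governs personal data processing in the EU",
    "COPPA restricts data collection from children in the US",
    "Internal glossary: 'geofence' means a regional feature flag",
]
print(retrieve("EU personal data regulation", docs, k=1))
```

In the real pipeline, the same shape applies: embed every document once at indexing time, embed the query at request time, and let FAISS do the nearest-neighbour search.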
Models
We decided to use Qwen3-8B as our main LLM for generation (the largest model we could load locally). The model was fine-tuned on a custom dataset to better handle domain-specific queries. The fine-tuning process involved:
- Generating synthetic data using Gemini 2.5 Flash
This combines supervised fine-tuning (SFT) with knowledge distillation (KD): the student model learns from high-quality synthetic data produced by a stronger teacher model.
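The data-preparation side of this can be sketched as follows: answers from the teacher model (Gemini 2.5 Flash in our case) become training targets for the student. The record and field names here are illustrative, not the project's exact schema.

```python
# Sketch of preparing knowledge-distillation data for SFT: each
# (question, teacher answer) pair becomes one chat-format training example.
import json

def to_sft_example(question: str, teacher_answer: str) -> dict:
    """Wrap a (question, teacher answer) pair in chat format for SFT."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": teacher_answer},
        ]
    }

synthetic_pairs = [
    ("Does feature X need geo-gating in the EU?",
     "Yes. Under GDPR, ... (teacher model's synthetic answer)"),
]

# One JSON object per line: the JSONL layout many SFT trainers accept.
jsonl = "\n".join(json.dumps(to_sft_example(q, a)) for q, a in synthetic_pairs)
print(jsonl)
```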
Pydantic
By using Pydantic, we are able to ensure the output of the LLM is structured and adheres to a predefined schema. This greatly helps in parsing and utilising the generated content effectively.
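A minimal sketch of this, using an illustrative schema rather than the project's exact one: the LLM is prompted to emit JSON matching the model, and parsing fails loudly if the output does not conform.

```python
# Sketch of constraining LLM output with Pydantic (v2 API): parse the raw
# model output against a schema instead of trusting free-form text.
from pydantic import BaseModel, Field

class ComplianceVerdict(BaseModel):
    needs_geo_logic: bool
    reasoning: str
    confidence: float = Field(ge=0.0, le=1.0)  # constrained to [0, 1]

raw = '{"needs_geo_logic": true, "reasoning": "GDPR applies.", "confidence": 0.9}'
verdict = ComplianceVerdict.model_validate_json(raw)  # raises on bad output
print(verdict.needs_geo_logic)
```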
Pre and Post Guardrails
Query Rewriting
To ensure that user queries are well-formed and relevant, we implemented a query rewriting step using an LLM served through Ollama. This step reformulates the user's question to improve clarity and context before passing it to the retrieval and generation components. This helps in reducing ambiguity and enhancing the quality of the retrieved documents.
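The shape of this guardrail is sketched below. In the project the call goes to the local model via Ollama; here a trivial stub stands in for the LLM so the sketch is self-contained.

```python
# Sketch of the query-rewriting guardrail. `call_llm` is a stand-in for
# the real Ollama chat call; the prompt text is illustrative.
REWRITE_PROMPT = (
    "Rewrite the user's question so it is clear, specific, and self-contained. "
    "Question: {question}"
)

def call_llm(prompt: str) -> str:
    """Stand-in for the Ollama call: a real implementation would send
    `prompt` to the local model and return its completion."""
    return prompt.rsplit("Question: ", 1)[-1].strip().capitalize()

def rewrite_query(question: str) -> str:
    return call_llm(REWRITE_PROMPT.format(question=question))

print(rewrite_query("does this feature need geo stuff?"))
```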
Hallucination Check
To mitigate the risk of hallucinations in the generated responses, we incorporated a hallucination check step. This step evaluates the generated answer against the retrieved documents to verify factual grounding. If the confidence score of the answer falls below a set threshold, the pipeline retries generation up to 3 times; if the score is still too low, the answer is flagged for review or additional information is requested.
Audit Ready Transparency
To maintain transparency and accountability, we log all interactions with the RAG pipeline. This includes:
- User queries (timestamp, feature, feature description, answer)
Since the log cannot be tampered with by users, it provides an audit trail for all interactions, which is crucial for compliance and review purposes.
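One entry in that log can be sketched as below, mirroring the fields listed above (timestamp, feature, feature description, answer). In the project the log file is written only by the backend, which is what keeps it out of users' reach; field names here are illustrative.

```python
# Sketch of an audit log entry: one JSON object per interaction, appended
# as a single line of a JSONL audit file by the backend.
import json
from datetime import datetime, timezone

def audit_entry(feature: str, description: str, answer: str) -> str:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "feature": feature,
        "feature_description": description,
        "answer": answer,
    }
    return json.dumps(entry)

line = audit_entry("age-gate", "EU minors age check", "Geo logic required.")
print(line)
```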
Single and Batch Processing
The RAG pipeline supports both single-question answering and batch processing of multiple queries. This flexibility allows users to efficiently handle large volumes of questions, making it suitable for various applications.
Users can also verify a single feature without having to upload a CSV file.
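The batch path can be sketched as follows: each row of an uploaded CSV is routed through the same answering function used for single-feature checks. The column names are illustrative, not the project's exact upload format.

```python
# Sketch of batch processing: parse the uploaded CSV and run every row
# through the same pipeline call that serves single-feature requests.
import csv
import io

def answer_one(feature: str, description: str) -> str:
    """Stand-in for the full RAG pipeline call."""
    return f"verdict for {feature}"

def answer_batch(csv_text: str) -> list[dict]:
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"feature": r["feature"],
         "answer": answer_one(r["feature"], r["description"])}
        for r in rows
    ]

csv_text = "feature,description\nage-gate,EU minors age check\n"
print(answer_batch(csv_text))
```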
Memory Integration
By allowing users to provide additional context or memory, the RAG pipeline can generate more informed and relevant answers.
This lets users improve the pipeline's answers without retraining or modifying the model.
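One simple way this kind of memory integration works, sketched under the assumption that memory is injected as extra prompt context (function and section names are illustrative):

```python
# Sketch of memory integration: user-supplied notes are prepended to the
# prompt context, steering answers without touching the model weights.
def build_prompt(question: str, retrieved: list[str], memory: list[str]) -> str:
    parts = []
    if memory:
        parts.append("User-provided context:\n" + "\n".join(memory))
    parts.append("Retrieved documents:\n" + "\n".join(retrieved))
    parts.append("Question: " + question)
    return "\n\n".join(parts)

prompt = build_prompt(
    "Does feature X need geo-gating?",
    ["GDPR excerpt ..."],
    ["Our product already blocks EU signups."],
)
print(prompt)
```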
Technologies Used
Frontend
- React
- Tailwind CSS
Backend
- FastAPI
- LangGraph
- Ollama
- Pydantic
- FAISS
- Qwen
- NOMIC
Finetuning
- PyTorch
- Unsloth (Transformers, BitsAndBytes, etc.)
- Qwen
Development tools
- Visual Studio Code
- Git
- Linux
- Windows
- WSL2
API and Assets
- Gemini 2.5 Flash - used to generate synthetic data for finetuning
Challenges we ran into
- Lack of resources for a good teacher model
- Lack of resources to fine-tune a model thoroughly
Accomplishments that we're proud of
We successfully developed an AI agent that self-corrects its outputs and dynamically performs tasks based on its internal checks. We also managed to fine-tune a small model, proving that we can achieve results even with limited time and resources.
What we learned
We learned that fine-tuning is a relatively easy way to improve a model. Using LangGraph and function calling, we were able to build an AI agent that uses tools to perform tasks.
What's next for Bits&Bytes
You will definitely see us again!
Built With
- fastapi
- langgraph
- qwen
- react
- unsloth