## Inspiration

We were inspired by the limitations of "one-shot" LLM prompts. While generative AI is powerful, asking a model to extract data, validate semantics, perform logical comparisons, and format the output all in a single pass often leads to hallucinations or skipped constraints. We wanted to build a system that proves Agentic Workflows (chaining specialized agents together) can achieve far higher reliability and accuracy than a single "super prompt." The specific use case of reconciling messy human explanations against rigid inventory requirements was the perfect test bed for this theory.
## What it does

The Agentic Explanation Validator is an intelligent dashboard that automates the review of human-generated reports. Instead of a human manually checking if an employee's reason for a missing item is valid ("It was out of stock" vs. "I forgot"), our system runs a multi-stage pipeline:
1. **Extracts** messy, unstructured text into structured JSON.
2. **Validates** that each explanation is semantically meaningful, filtering out "slacker" responses like "idk".
3. **Compares** the validated data against a master requirement list using deterministic logic.
4. **Generates** a final, professional report highlighting exactly what is missing and why.

## How we built it

We built the frontend using React 19 and TypeScript for a type-safe, robust user interface, bundled with Vite for speed.
The core intelligence is powered by the Google Gemini API. We architected a "Chain of Agents" pattern:
- **Agent 1 (Extraction):** uses Gemini to parse natural language into structured data.
- **Agent 2 (Validation):** a separate Gemini call that acts as a quality gatekeeper.
- **Agent 3 (Logic):** plain TypeScript set operations ($A \cap B$) handle the logic layer, ensuring 100% deterministic results where LLMs often fail (see the sketch below).
- **Agent 4 (Formatting):** a final Gemini pass polishes the output.

The application handles file inputs (PDFs via pdfjs-dist and Excel via xlsx) and is deployed globally on Netlify.
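To make the hybrid approach concrete, here is a minimal sketch of what Agent 3's layer looks like; the names (`ValidatedItem`, `reconcile`) are illustrative stand-ins rather than our exact code:

```typescript
// Illustrative sketch of Agent 3's deterministic logic layer.
// Names and fields here are hypothetical; only the set-intersection
// idea reflects the real pipeline.

interface ValidatedItem {
  name: string;            // item name extracted by Agent 1
  hasValidReason: boolean; // verdict from Agent 2's semantic gate
}

interface ReconciliationResult {
  accountedFor: string[]; // required items with a valid explanation
  unexplained: string[];  // required items missing or poorly explained
}

function reconcile(
  masterList: string[],
  validated: ValidatedItem[],
): ReconciliationResult {
  // The set of items that cleared the semantic gate.
  const valid = new Set(
    validated.filter((v) => v.hasValidReason).map((v) => v.name),
  );

  // Plain set membership over the master list: A ∩ B and its complement.
  return {
    accountedFor: masterList.filter((item) => valid.has(item)),
    unexplained: masterList.filter((item) => !valid.has(item)),
  };
}

// reconcile(["scanner", "label printer"], [{ name: "scanner", hasValidReason: true }])
// => { accountedFor: ["scanner"], unexplained: ["label printer"] }
```

Because this step is plain set arithmetic instead of another LLM call, identical inputs always yield identical reconciliations.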
## Challenges we ran into

Reliability was our biggest hurdle. Early in development, the "Extraction Agent" would sometimes hallucinate items that didn't exist, or the "Validation Agent" would be too lenient. We had to iterate extensively on our system prompts to create strict boundaries for each agent. Another challenge was state management: coordinating the asynchronous hand-offs between four distinct stages of processing required a robust state machine in our React frontend to keep the UI responsive and informative.
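As an illustration of that state machine, the pipeline can be modeled as a discriminated union, so the UI literally cannot render an impossible combination of stages. The names below are hypothetical, not our exact code:

```typescript
// Illustrative sketch of the pipeline state machine (names are hypothetical).
// A single discriminated union means stages cannot overlap: the UI shows
// the report only once the pipeline has actually reached "done".

type PipelineState =
  | { stage: "idle" }
  | { stage: "extracting" }                           // Agent 1 in flight
  | { stage: "validating"; extracted: unknown }       // Agent 2 in flight
  | { stage: "reconciling"; validatedItems: unknown } // Agent 3 (deterministic)
  | { stage: "formatting"; reconciliation: unknown }  // Agent 4 in flight
  | { stage: "done"; report: string }
  | { stage: "error"; failedAt: string; message: string };
```

A reducer over a union like this (for example with React's `useReducer`) turns each asynchronous hand-off into a single, auditable transition, and a failure in any agent lands in the `error` state tagged with the stage that caused it.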
## Accomplishments that we're proud of

We are most proud of the hybrid architecture. By mixing probabilistic AI (for understanding language) with deterministic code (for set math and logic), we eliminated a huge class of common AI errors. If the AI marks an item's explanation as valid, our code logic ensures it is always correctly reconciled with the master list, something purely LLM-based approaches struggle to guarantee 100% of the time.
## What we learned

We learned that decomposition is key. Breaking a complex reasoning task into smaller, atomic steps doesn't just make the system more reliable; it makes it easier to debug. When the output was wrong, we could pinpoint exactly which "Agent" failed, whether it was an extraction error or a validation judgment call, rather than guessing at which part of a massive prompt went wrong.
## What's next for Agentic Explanation Validator

We plan to introduce a "Human-in-the-Loop" feature, allowing users to override an agent's decision at any stage of the pipeline. We also want to expand the input types to support handwriting recognition for physical inventory logs.