Inspiration

The inspiration behind this project came from a common problem faced by teams and organizations: AI systems can generate insights, but they often cannot take actions. Users still have to manually switch between tools like Slack, GitHub, databases, and documents, which slows down workflows and increases the chances of errors.

We wanted to bridge this gap by creating an AI platform that acts like a digital teammate rather than just a chatbot. By combining intelligent reasoning with real-world tool integration through the Model Context Protocol (MCP), we envisioned a system that could understand documents, detect issues, and automatically perform tasks across different applications.

Our goal was to make AI more practical, collaborative, and accessible through a low-code architecture, enabling developers, educators, and product teams to automate repetitive work and focus on higher-value decisions.

What it does

Our AI Agent Platform enables users to upload PDFs and interact through a chat interface to extract insights and automate workflows. The AI analyzes documents, summarizes key information, compares data with external sources, identifies discrepancies, and performs real-world actions using MCP-integrated tools. It can automatically send Slack notifications, create GitHub issues, and access enterprise data sources. Built on the MERN stack with Anthropic Claude and Google Cloud services, the platform turns AI from a passive chatbot into an active digital teammate, helping teams reduce manual effort, improve collaboration, and accelerate decision-making.

How we built it

We built the platform using the MERN stack, with React for the frontend, Node.js and Express for the backend, and MongoDB for data storage. Anthropic Claude powers the AI reasoning engine, while Google Cloud services provide scalable infrastructure. We integrated the Model Context Protocol (MCP) to connect external tools such as GitHub, Slack, and BigQuery. PDF processing and document analysis are handled through dedicated backend services, and real-time responses are streamed through an interactive chat interface. The modular architecture allows AI agents to reason, use tools, and automate multi-step workflows seamlessly.
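As a rough illustration of the "reason, use tools, automate multi-step workflows" loop described above, the sketch below routes model-requested tool calls through a small registry. The types, the tool names (`slack.notify`, `github.createIssue`), and the in-memory handlers are all hypothetical stand-ins; they are not the MCP SDK's or the Anthropic SDK's actual types, and a real handler would forward the call to an MCP server.

```typescript
// Hypothetical shapes for illustration only.
type ToolCall = { name: string; args: Record<string, string> };
type ToolResult = { ok: boolean; output: string };
type ToolHandler = (args: Record<string, string>) => ToolResult;

// Registry of tools the agent may call. These handlers are in-memory
// stand-ins for calls that would go out to MCP servers.
const toolRegistry = new Map<string, ToolHandler>([
  ["slack.notify", (args) => ({ ok: true, output: `posted to ${args.channel}: ${args.text}` })],
  ["github.createIssue", (args) => ({ ok: true, output: `issue opened: ${args.title}` })],
]);

// Route one model-requested tool call to its handler.
function dispatchToolCall(call: ToolCall): ToolResult {
  const handler = toolRegistry.get(call.name);
  if (!handler) {
    // Report the failure as data so the model can react to it.
    return { ok: false, output: `unknown tool: ${call.name}` };
  }
  return handler(call.args);
}

// Run a multi-step plan in order, stopping at the first failed step.
function runPlan(plan: ToolCall[]): ToolResult[] {
  const results: ToolResult[] = [];
  for (const call of plan) {
    const result = dispatchToolCall(call);
    results.push(result);
    if (!result.ok) break;
  }
  return results;
}
```

Returning failures as ordinary results, rather than throwing, is one way to let the reasoning step decide whether to retry, pick another tool, or report back to the user.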

Challenges we ran into

Architecture & Protocol Hurdles

MCP Latency & State Management: Claude expects fast, stateless tool responses via the Model Context Protocol (MCP), but fetching data from enterprise databases and waiting on third-party APIs (Slack/GitHub) causes slow, blocking HTTP requests.

Context Window Bloat: Feeding raw PDF text, database schemas, and multiple MCP tool definitions into Claude at the same time risks exceeding the context window and drives up API costs, since input is billed per token.
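One way to keep the prompt inside a fixed budget is to rank document chunks and keep only what fits. The sketch below is a minimal illustration under stated assumptions: the `Chunk` shape and `relevance` score are hypothetical, and the four-characters-per-token estimate is a rough heuristic for budgeting, not Claude's actual tokenizer.

```typescript
type Chunk = { id: string; text: string; relevance: number };

// Rough heuristic: about 4 characters per token for English text.
// This is an assumption for budgeting only, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the most relevant chunks that still fit in the budget.
function selectWithinBudget(chunks: Chunk[], maxTokens: number): Chunk[] {
  // Copy before sorting so the caller's array is left untouched.
  const byRelevance = [...chunks].sort((a, b) => b.relevance - a.relevance);
  const kept: Chunk[] = [];
  let used = 0;
  for (const chunk of byRelevance) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > maxTokens) continue; // too big; a smaller chunk may still fit
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```

The same budget would also need to account for tool definitions and schema text, which this sketch leaves out.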

Data & Parsing Issues

Layout-Heavy PDF Extraction: Standard text parsers turn complex tables and formatted financial data in PDFs into unstructured text strings, causing Claude to hallucinate or misidentify "discrepancies."

Database Schema Mapping: Translating complex relational enterprise database schemas into a format that Claude can query accurately, without exposing sensitive tables or structure, proved difficult.
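A common way to approach the schema-mapping problem is an explicit allowlist: the model only ever sees a compact description of approved tables and columns. The sketch below assumes hypothetical table and column names and a hypothetical `describeSchemaForModel` helper; it is not the project's actual mapping code.

```typescript
type Column = { name: string; type: string };
type Table = { name: string; columns: Column[] };

// Hypothetical allowlist: table name -> columns the model may see.
const exposedColumns = new Map<string, string[]>([
  ["invoices", ["id", "client_id", "total", "issued_at"]],
  ["contracts", ["id", "client_id", "agreed_total"]],
]);

// Build a compact schema description that contains only allowlisted
// tables and columns. Anything not listed is silently omitted.
function describeSchemaForModel(schema: Table[], allow: Map<string, string[]>): string {
  const lines: string[] = [];
  for (const table of schema) {
    const allowedCols = allow.get(table.name);
    if (!allowedCols) continue; // table is not exposed at all
    const cols = table.columns
      .filter((c) => allowedCols.includes(c.name))
      .map((c) => `${c.name} ${c.type}`);
    if (cols.length > 0) lines.push(`${table.name}(${cols.join(", ")})`);
  }
  return lines.join("\n");
}
```

An allowlist fails closed: a newly added table stays hidden until someone deliberately exposes it.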

Accomplishments that we're proud of

What we learned

What's next for Yaksha Agent Engine

  1. Hardening Agent Reliability & Guardrails

Deterministic Evaluation Pipelines: Moving away from standard prompt engineering to structured frameworks (like Braintrust or LangSmith) to continuously test and benchmark Claude's discrepancy detection against historical ground-truth data.

Multi-Tiered Human-in-the-Loop (HITL) Workflows: Building granular permission gates in the React frontend. Low-risk actions (e.g., creating a GitHub issue) can be fully autonomous, while high-risk actions (e.g., executing database writes or notifying client-facing Slack channels) trigger an interactive approval dashboard.
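The tiered approval idea above can be sketched as a small policy gate on the backend. Everything here is illustrative: the tool names, the two-tier `RiskTier` type, and the in-memory `approvalQueue` are assumptions, and a real implementation would persist pending actions and surface them in the approval dashboard.

```typescript
type RiskTier = "low" | "high";
type ProposedAction = { tool: string; summary: string };
type GateDecision = { status: "auto_approved" | "pending_approval"; action: ProposedAction };

// Hypothetical policy table mapping tools to risk tiers.
const riskPolicy = new Map<string, RiskTier>([
  ["github.createIssue", "low"],
  ["db.write", "high"],
  ["slack.notifyClientChannel", "high"],
]);

// Actions waiting for a human decision (in-memory stand-in).
const approvalQueue: ProposedAction[] = [];

// Decide whether an action may run immediately or must wait for approval.
function gateAction(action: ProposedAction): GateDecision {
  // Tools missing from the policy default to "high" so new integrations fail safe.
  const tier = riskPolicy.get(action.tool) ?? "high";
  if (tier === "low") {
    return { status: "auto_approved", action };
  }
  approvalQueue.push(action);
  return { status: "pending_approval", action };
}
```

Note that the gate only classifies the action; executing an auto-approved action, or a queued one after sign-off, is left to the caller.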

  2. Advancing the Data & Memory Infrastructure

Graph-RAG Integration: Transitioning from basic text semantic search to a Knowledge Graph structure. This allows the engine to map complex enterprise relationships, such as understanding how a line item in an uploaded invoice directly links to a specific client contract in the database.

Stateful Memory Across Workflows: Implementing a persistent memory layer using Redis or MongoDB to allow the engine to remember past discrepancies, user feedback, and resolution histories, ensuring it doesn't flag the same false positive twice.
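A minimal sketch of the memory idea above: fingerprint each discrepancy, store the user's resolution, and consult it before flagging again. The `Discrepancy` shape and function names are hypothetical, and the `Map` is only an in-memory stand-in for the Redis or MongoDB store mentioned above.

```typescript
type Discrepancy = { documentId: string; field: string; expected: string; found: string };
type Resolution = "confirmed" | "false_positive";

// In-memory stand-in for a persistent store.
const resolutionMemory = new Map<string, Resolution>();

// Build a stable key from the normalized field and value pair.
// documentId is deliberately left out so feedback carries across documents.
function fingerprint(d: Discrepancy): string {
  const normalized = [d.field, d.expected, d.found].map((s) => s.trim().toLowerCase());
  return JSON.stringify(normalized);
}

// Store the user's verdict on a discrepancy.
function recordFeedback(d: Discrepancy, resolution: Resolution): void {
  resolutionMemory.set(fingerprint(d), resolution);
}

// Flag unless the same discrepancy was previously marked a false positive.
function shouldFlag(d: Discrepancy): boolean {
  return resolutionMemory.get(fingerprint(d)) !== "false_positive";
}
```

Leaving the document ID out of the key is a design choice with a cost: it suppresses repeats across documents, but it could also hide a genuine issue that happens to look identical, so a stricter key may be preferable in some workflows.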

  3. Scaling the Model Context Protocol (MCP) Architecture

Decoupled Async Tool Execution: Offloading heavy MCP tool requests (like deep database queries or large PDF ingestion) to a robust background worker queue (e.g., BullMQ) so the Express/Node.js backend remains stateless, fast, and unblocked.

Expanding the MCP Tooling Ecosystem: Building custom, secure MCP servers to expand the engine's reach beyond Slack and GitHub into core enterprise software like Jira, Salesforce, or internal ERP APIs.
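The decoupled execution pattern from the scaling item above can be sketched without any queue library: the request handler records a job and returns a ticket right away, and a separate worker does the slow part. This is an in-memory illustration with hypothetical names; it does not use or mirror BullMQ's API, and a real queue would add persistence, retries, and concurrency control.

```typescript
type JobStatus = "queued" | "done" | "failed";
type Job = { id: string; tool: string; payload: string; status: JobStatus; result?: string };

const jobs = new Map<string, Job>();
const pendingIds: string[] = [];
let nextJobNumber = 1;

// Called from the request handler: record the job and return immediately.
function enqueueToolCall(tool: string, payload: string): string {
  const id = `job-${nextJobNumber++}`;
  jobs.set(id, { id, tool, payload, status: "queued" });
  pendingIds.push(id);
  return id;
}

// Called by a background worker: process one queued job, if there is one.
// Returns false when the queue is empty.
function workOnce(run: (tool: string, payload: string) => string): boolean {
  const id = pendingIds.shift();
  if (id === undefined) return false;
  const job = jobs.get(id);
  if (!job) return false;
  try {
    job.result = run(job.tool, job.payload);
    job.status = "done";
  } catch (err) {
    // Keep the error message so the client can see why the job failed.
    job.result = err instanceof Error ? err.message : String(err);
    job.status = "failed";
  }
  return true;
}

// Called when the client polls for the outcome of a ticket.
function getJob(id: string): Job | undefined {
  return jobs.get(id);
}
```

The chat interface could then poll `getJob` with the returned ticket, or be notified over its existing streaming channel, instead of holding an HTTP request open while the tool runs.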
