ReconGuard 🛡️🤖
Inspiration
The biggest bottleneck in a modern data pipeline often isn't building the model; it's the brutal, tedious process of preprocessing and validating data. By an often-cited estimate, data scientists and compliance officers spend up to 80% of their time manually cleaning dirty data, handling tabular edge cases, and correcting format anomalies. It's a massive time sink, and a single missed error can cascade downstream, destroying model accuracy and wasting thousands of dollars in compute time.
We built ReconGuard to change that. We wanted to take the human out of the loop for repetitive validation tasks and deploy an intelligent, collaborative swarm of agents capable of diagnosing and repairing data issues autonomously.
What it does
ReconGuard is an autonomous, agentic data-cleansing and validation platform. Users ingest tabular datasets (CSV files) via a streamlined upload gateway. Once inside, the data isn't just parsed by static scripts—it is handed over to a coordinated network of M-Agents (Multi-Agents).
Our specialized agents work together to:
- Analyze: Inspect the structural layout of the file and identify data corruption or schema drifts.
- Diagnose & Rank: Evaluate anomalies based on operational severity.
- Remediate: Leverage dedicated tool-calling capabilities to actively repair data types, fill missing entries safely, and flag non-compliant records.
- Narrate: Translate raw technical fixes into a clean, human-readable audit narrative so data teams know exactly what changed and why.
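The four stages above can be sketched as a small pipeline. This is only an illustration, not ReconGuard's actual code: plain rule-based functions stand in for the LLM-backed agents, and every name here (`analyze`, `rank`, `remediate`, `narrate`, `SEVERITY`) and the sample CSV are assumptions made for the example.

```python
import csv
import io

# Hypothetical severity scale: a higher number means more urgent.
SEVERITY = {"bad_type": 3, "missing_value": 2}

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def analyze(rows, numeric_columns):
    """Analyze: list empty cells and non-numeric values in numeric columns."""
    anomalies = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if value is None or value.strip() == "":
                kind = "missing_value"
            elif column in numeric_columns and not is_number(value):
                kind = "bad_type"
            else:
                continue
            anomalies.append({"row": index, "column": column, "kind": kind})
    return anomalies

def rank(anomalies):
    """Diagnose & Rank: most severe anomalies first."""
    return sorted(anomalies, key=lambda a: SEVERITY[a["kind"]], reverse=True)

def remediate(rows, ranked):
    """Remediate: repair what can be repaired safely, flag everything else."""
    log = []
    for anomaly in ranked:
        row, column = anomaly["row"], anomaly["column"]
        original = rows[row][column]
        action = "flagged"
        if anomaly["kind"] == "bad_type":
            # Strip a currency symbol and thousands separators, then re-check.
            cleaned = original.replace("$", "").replace(",", "").strip()
            if is_number(cleaned):
                rows[row][column] = cleaned
                action = "repaired"
        log.append({**anomaly, "before": original,
                    "after": rows[row][column], "action": action})
    return log

def narrate(log):
    """Narrate: one human-readable audit line per action."""
    return [
        f"Row {e['row'] + 1}, column '{e['column']}': {e['action']} "
        f"({e['before']!r} -> {e['after']!r})"
        for e in log
    ]

SAMPLE = 'id,amount,currency\n1,100.50,USD\n2,"$1,200",USD\n3,,EUR\n4,abc,USD\n'
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
log = remediate(rows, rank(analyze(rows, numeric_columns={"amount"})))
report = narrate(log)
print("\n".join(report))
```

Missing values are only flagged here rather than filled, since a safe fill strategy depends on the dataset.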
How we built it
We engineered ReconGuard with a high-performance, responsive, and decoupled architecture designed to handle fast data streaming:
- Frontend: Built with Next.js, React, and Tailwind CSS to create a slick, low-friction dashboard featuring real-time file upload states and agentic process visualization.
- Backend: Powered by a highly performant FastAPI gateway that handles the incoming multi-part data streams and orchestrates the asynchronous execution of our multi-agent network.
- Agent Memory Layer: We integrated Cognee to give our M-Agents a robust, persistent semantic memory graph. This ensures the agents don't just treat every file as an isolated event—they maintain contextual memory of data structures, organizational schemas, and past corrections over time.
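The asynchronous orchestration described above might look roughly like the following. It is a minimal sketch using only the standard library's `asyncio`, not the project's FastAPI code: the agent functions (`count_missing`, `run_agent`, `orchestrate`) are hypothetical stand-ins, and Cognee is not involved. Inside an `async def` FastAPI endpoint, a coroutine like `orchestrate` could be awaited in the same way.

```python
import asyncio

def count_missing(rows):
    """Blocking stand-in for a heavy validation step: empty cells per column."""
    return {column: sum(1 for row in rows if row[column] == "")
            for column in rows[0]}

async def run_agent(name, work, rows):
    # asyncio.to_thread (Python 3.9+) runs blocking work in a worker thread,
    # so the event loop stays free to serve other requests.
    result = await asyncio.to_thread(work, rows)
    return name, result

async def orchestrate(rows):
    # Launch independent agents concurrently and collect their results.
    results = await asyncio.gather(
        run_agent("missing_counter", count_missing, rows),
        run_agent("row_counter", len, rows),
    )
    return dict(results)

rows = [{"id": "1", "amount": "10"}, {"id": "2", "amount": ""}]
summary = asyncio.run(orchestrate(rows))
print(summary)
```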
Challenges we ran into
Dealing with the raw, chaotic nature of ingested tabular data was an immediate hurdle. Unlike unstructured text, a table has strict, interdependent positional relationships: shifting a column or incorrectly modifying a null cell can corrupt the entire table. Getting LLM-based agents to reliably interpret complex CSV rows, reason about data types without hallucinating, and execute precise programmatic transformations required intense prompt engineering, strict schema enforcement, and rigorous deterministic guardrails.
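One way to picture a deterministic guardrail of this kind: treat every agent-proposed edit as untrusted, and apply it only if it targets an existing cell and its value parses as the column's declared type. The sketch below is an assumption about how such a check could work, not the project's implementation; `SCHEMA`, `validate_edit`, `apply_edits`, and the edit format are all hypothetical.

```python
import copy
from datetime import date

# Hypothetical schema: column name -> parser that raises on invalid input.
SCHEMA = {"id": int, "amount": float, "booked_on": date.fromisoformat}

def validate_edit(table, edit):
    """Return a rejection reason, or None if the edit is safe to apply."""
    row, column, value = edit.get("row"), edit.get("column"), edit.get("value")
    if not isinstance(row, int) or not 0 <= row < len(table):
        return "row out of range"
    if not isinstance(column, str) or column not in SCHEMA:
        return "unknown column"
    if not isinstance(value, str):
        return "value must be a string"
    try:
        SCHEMA[column](value)
    except ValueError:
        return "value does not match column type"
    return None

def apply_edits(table, edits):
    """Apply only validated edits to a copy; the table shape never changes."""
    repaired = copy.deepcopy(table)
    rejected = []
    for edit in edits:
        reason = validate_edit(repaired, edit)
        if reason is None:
            repaired[edit["row"]][edit["column"]] = edit["value"]
        else:
            rejected.append((edit, reason))
    return repaired, rejected

table = [
    {"id": "1", "amount": "10.0", "booked_on": "2024-01-05"},
    {"id": "2", "amount": "n/a", "booked_on": "2024-01-06"},
]
proposed = [
    {"row": 1, "column": "amount", "value": "12.5"},          # valid
    {"row": 5, "column": "amount", "value": "1"},             # no such row
    {"row": 0, "column": "total", "value": "1"},              # no such column
    {"row": 0, "column": "booked_on", "value": "yesterday"},  # wrong type
]
repaired, rejected = apply_edits(table, proposed)
```

Because edits can only overwrite existing cells, a hallucinated column or row index is rejected instead of silently reshaping the data.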
Accomplishments that we're proud of
- Zero to Ingest in 5 Hours: Building a robust frontend-to-backend file streaming pipeline that seamlessly interfaces with dynamic AI agents on a tight hackathon timeline.
- Cognee Graph Integration: Successfully implementing Cognee to map out data entities and agent memory. Watching the agents actually remember context and apply structured reasoning to multi-row tabular anomalies felt incredibly rewarding.
- True Multi-Agent Collaboration: Watching our independent agents seamlessly hand off tasks—from discovery to repair to reporting—without stepping on each other's toes or hitting infinite loops.
What we learned
We learned the profound power (and complexity) of graph-based agent memory. Utilizing Cognee fundamentally shifted how we view AI pipelines; we realized that agents shouldn't just rely on static context windows, but need evolving, structured knowledge bases to deal with complex business logic. We also deep-dived into asynchronous execution in FastAPI to ensure that heavy data processing doesn't block the user experience.
What's next for ReconGuard
- Advanced Autonomy & Tooling: We want to expand the agents' toolkit, giving them the ability to write and test their own Python data-cleansing scripts in sandbox environments on the fly.
- Deeper Decision-Making: Implementing advanced self-reflection loops where agents can double-check their own data modifications against strict statistical baselines before finalizing a file repair.
- Enterprise Connectors: Moving beyond local CSV file uploads to integrate directly with live production data pipelines like Snowflake, Postgres, and AWS S3 buckets for real-time, continuous data guardrails.
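As a rough idea of what the self-reflection check mentioned above could involve, an agent might accept its own repair only if the new value stays within a few standard deviations of trusted values from the same column. This is a speculative sketch of planned work, not existing functionality; `within_baseline`, the `max_z` threshold, and the sample numbers are assumptions.

```python
import statistics

def within_baseline(value, baseline, max_z=3.0):
    """Accept a proposed value only if its z-score against the baseline is small."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)  # needs at least two baseline values
    if stdev == 0:
        return value == mean
    return abs(value - mean) / stdev <= max_z

# Trusted values already present in the column (illustrative numbers).
baseline = [98.0, 101.5, 99.2, 100.4, 102.1]

plausible_fix = within_baseline(100.0, baseline)
suspicious_fix = within_baseline(1000.0, baseline)
```

A rejected value would be a signal to flag the record for review rather than finalize the repair.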
Built With
- cognee
- fastapi
- react