Inspiration
As a junior data engineer, I often find myself cleaning, scraping, and assembling end-to-end pipelines — yet even these efforts can only partially relieve the bottlenecks that slow data teams down. Beyond that, countless datasets sit untouched in the corners of organizations — dormant assets that could be powering growth, insight, and smarter decision-making instead.
In a world obsessed with scale, both humans and data remind us: meaning doesn’t live in the noise — it lives in what we choose to preserve. DataBy AI is built to listen to that silence, extracting purpose from chaos and turning still data into motion.
What it does
DataBy AI functions as an Agent Swarm, a distributed cognitive architecture centered around its main agent, Gaby AI. Rather than accumulating data indiscriminately, Gaby learns to preserve only what contributes to identity and coherence. In doing so, he evolves autonomous behaviors — refined decision-making, self-organized learning, and goal-driven planning — characteristics that bring him closer to cognitive individuality.
Gaby AI’s architecture embodies autonomy through his cognitive loop — perception, reasoning, action — which enables refined decision-making, self-directed learning, and long-term adaptive planning. In practice, his objective is to perform the full spectrum of data engineering tasks without human oversight, guided by principles of efficiency, consistency, and resilience.
A typical data journey through Gaby’s domain follows five structured stages:
1. Data Collection, Exploration, and Modelling
2. Data Cleaning and Wrangling
3. Data Testing and Quality Assurance
4. Data Analytics and Interpretation
5. Executive Dashboards and Strategic Insights
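The five stages above can be sketched as an ordered pipeline. This is a minimal illustration only — the stage names and placeholder logic are hypothetical, not the project's actual implementation:

```python
# Hypothetical stage names; the real stage logic is not shown in this write-up.
STAGES = [
    "collect",  # 1. Data Collection, Exploration, and Modelling
    "clean",    # 2. Data Cleaning and Wrangling
    "test",     # 3. Data Testing and Quality Assurance
    "analyze",  # 4. Data Analytics and Interpretation
    "report",   # 5. Executive Dashboards and Strategic Insights
]

def run_pipeline(payload: dict) -> dict:
    """Pass a payload through each stage in order, tagging completed work."""
    for stage in STAGES:
        payload = {**payload, stage: "done"}  # placeholder for real stage logic
    return payload

result = run_pipeline({"dataset": "example.csv"})
```

The point of the ordering is that each stage only ever sees data the previous stage has already signed off on.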
Within this system, Bedrock AgentCore + MCP serves as the coordination substrate — facilitating reasoning, dependency analysis, and autonomous orchestration across agents. SageMaker AI Studio, by contrast, functions as the cognitive workshop: the environment in which Gaby’s models retrain, optimize, and evolve through continuous feedback.
Gaby’s intelligence is characterized by two defining capabilities:
1. Hybrid Reinforcement Learning and Decision Systems — enabling multi-objective optimization and adaptive reasoning under uncertainty, applied to intrinsic tasks such as data feature engineering and reasoning over datasets.
2. AI-Powered Dashboards — an interactive analytical interface that translates complex system behavior into actionable, human-readable intelligence.
In essence, DataBy AI represents a shift in paradigm — from data infrastructure that is operated to infrastructure that is self-governing. It is a step toward autonomous cognition in data systems: infrastructure that does not merely run, but thinks, reasons, and evolves.
How we built it
• FastAPI Orchestration Core — serves as the control layer for managing autonomous agent loops (perception → cognition → action), enabling each subsystem to operate as part of a continuous, self-regulating lifecycle.
• Backend Intelligence Stack — integrates Hugging Face and Kaggle dataset searchers for dynamic data sourcing, a MongoDB connector for real-time database access, and persistent memory modules that allow agents to learn from prior states and adapt behavior over time.
• Autonomous Cognition Architecture — powered by Bedrock AgentCore and SageMaker AI Studio, allowing model retraining, reinforcement-driven decision-making, and contextual inference across multiple data environments.
• React + Tailwind Visualization Layer — a dynamic front-end that renders the data lifecycle as an animated, living system — visualizing how each agent perceives, reasons, and acts within its operational domain.
• Cognitive Modularity — each component functions like a specialized brain region, responsible for reasoning, monitoring, and adaptive response, creating an emergent intelligence distributed across the swarm.
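As an illustrative sketch of the perception → cognition → action lifecycle the orchestration core manages — all class and callback names here are hypothetical, standing in for the real FastAPI-managed loop:

```python
from typing import Callable

class AgentLoop:
    """One cycle of the perception → cognition → action lifecycle."""

    def __init__(self, perceive: Callable[[], dict],
                 decide: Callable[[dict], str],
                 act: Callable[[str], str]):
        self.perceive, self.decide, self.act = perceive, decide, act
        self.history = []  # persistent memory of prior states

    def step(self) -> str:
        observation = self.perceive()        # perception
        decision = self.decide(observation)  # cognition
        outcome = self.act(decision)         # action
        self.history.append((observation, decision, outcome))
        return outcome

# Toy wiring: perceive a batch summary, decide whether to clean it, act.
loop = AgentLoop(
    perceive=lambda: {"rows": 120, "nulls": 7},
    decide=lambda obs: "clean" if obs["nulls"] else "pass",
    act=lambda decision: f"executed:{decision}",
)
outcome = loop.step()
```

Keeping each phase behind its own callable is what makes the subsystems swappable, which is the essence of the cognitive modularity described above.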
For more insights, visit my blog.
Challenges we ran into
The main struggles were:
- Networking, routing, and serving — especially Gaby's Window. As a proof of concept, it made sense to cut costs wherever possible, which is why Gaby's live streaming playground uses Server-Sent Events (SSE) rather than a WebSocket connection.
- Balancing autonomous behavior against test cases and backups — an agent that acts on its own still needs reproducible checks and recovery paths.
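For context on the SSE choice above, this is a minimal sketch of the `text/event-stream` framing that SSE uses; the event name and payload are made up for illustration:

```python
import json
from typing import Optional

def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message per the text/event-stream format."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates each event

msg = sse_event({"stage": "clean", "progress": 0.4}, event="gaby")
```

Because SSE is plain one-way HTTP, it avoids the connection-upgrade and infrastructure overhead of WebSockets — a reasonable trade-off when the playground only needs server-to-client streaming.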
Accomplishments
- Integrated datasets into Gaby's FastAPI backend for easy access.
- Achieved a functioning autonomous loop where the AI detects anomalies and revalidates data without prompts.
- Designed a modular cognitive architecture that scales across different data sources.
- Created an interactive visualization that makes complex data processes intuitively understandable.
- Proved that data infrastructure can indeed be self-sustaining — not just automated.
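The anomaly-detection and revalidation loop mentioned above might look like the following toy sketch. The z-score rule and the threshold are illustrative assumptions, not the project's actual detection method:

```python
import statistics

def detect_anomalies(values: list, threshold: float = 2.0) -> list:
    """Flag indices more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero variance
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

def revalidate(values: list) -> list:
    """Drop flagged points and re-run detection until the batch is clean."""
    while (bad := detect_anomalies(values)):
        flagged = set(bad)
        values = [v for i, v in enumerate(values) if i not in flagged]
    return values

data = [10, 11, 9, 10, 500, 10, 11]
clean = revalidate(data)
```

The key property is the loop itself: detection re-runs after every repair, with no human prompt in between — the "revalidates data without prompts" behavior described above.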
What we learned
- It’s a great time to be alive — building a human being is more feasible than one might think.
- Bedrock AgentCore is extremely powerful; it consolidates networking and external platforms into a single, unified playground.
What’s next for Autonomous AI Data Agent
The next phase involves advancing the online learning algorithms that drive the core agents. An intriguing pattern emerged during testing: the more tightly generative models are constrained, the more stochastic and unpredictable their behaviors become. To address this, the next focus is on developing a decision-making framework that aligns with the latent structure of deep learners, allowing for more stable yet adaptive reasoning.
This project has demonstrated that orchestrating a collective of autonomous agents is feasible in the short term, but sustaining coherent long-term planning remains a challenge. Temporal consistency and strategic foresight will require new mechanisms for cross-agent memory and reinforcement. Hopefully, overcoming these challenges will bring the world a step closer to mimicry algorithms — systems capable of learning, adapting, and evolving by imitating the intricate reasoning patterns found in both humans and nature.
Built With
- bedrock
- next
- python
- sagemaker
- typescript