Inspiration
Most AI agents today fail silently, hallucinate, or enter inefficient reasoning loops without feedback. We were inspired by the human ability of meta-cognition — thinking about our own thinking. ReflexionOS explores this concept by building an “OS for Agents” that treats failures as learning signals and converts execution traces into architectural improvements.
What it does
ReflexionOS is a high-performance agentic loop framework featuring:
• Autonomous Reflexion — Gemini 2.0 Flash analyzes failure modes and rewrites internal strategies. • Kernel-Level Observability — A Thinking Terminal and Reasoning Graph visualize agent meta-cognition in real time. • Sponsor-Grade Safety & Evaluation — Modulate Shield provides policy auditing while Airia Enterprise scores agent performance. • Scenario Library — Six real-world stress tests across finance, triage, and security domains demonstrate adaptive autonomy.
How we built it
Frontend — Next.js 15, Tailwind CSS 4, Framer Motion, and ReactFlow power a cinematic glassmorphic observability dashboard. Agent Engine — Gemini 2.0 Flash drives ReAct reasoning and reflection loops. Observability Layer — A custom SSE-streamed Thinking Terminal and dynamic reasoning graph visualize execution traces. Sponsor Integrations — Airia Enterprise for performance evaluation, Modulate ToxMod for safety auditing, and Lightdash for BI-level telemetry dashboards.
Challenges we ran into
Designing a stable self-correction loop without infinite reflection cycles was the primary challenge. We solved this by introducing a dual-memory architecture: Working Memory for current reasoning and Persisted Reflection History for validated improvements. This ensures only score-improving strategies are retained.
Accomplishments that we're proud of
• Achieved a stable autonomous loop that consistently improves evaluation scores across runs • Unified four sponsor technologies into a cohesive developer-focused experience • Built a cinematic Thinking Terminal that makes agent reasoning transparent and auditable
What we learned
Meta-prompting — asking an LLM to evaluate its own reasoning — dramatically improves robustness over single-pass prompting. We also learned that safety auditing via Modulate is essential to detect early-stage agentic drift before it impacts downstream reasoning quality.
What's next for ReflexionOS: The Self-Improving Agentic Operating System
Next steps include Multi-Agent Swarm Reflection, enabling collaborative self-improvement across specialized agents, and deeper Lightdash integration for fleet-scale telemetry and longitudinal agent performance analytics.
Built With
- airia-enterprise
- framer-motion
- gemini-api
- lightdash
- lucide-react
- modulate
- next.js
- react
- reactflow
- shadcn-ui
- tailwindcss
- typescript
- vercel
- zustand
Log in or sign up for Devpost to join the conversation.