Video demo (could not be uploaded directly because of Wi-Fi issues): https://docs.google.com/document/d/1jgZlpW468rwLQ5fCjtlQqrQzp8c2zg3cW8bx6mJkYeQ/edit?usp=sharing
Agen-Teach: Multi-Agent Workflow Learning System
Inspiration
We wanted to build an AI system that could learn the way humans teach each other: by showing, not scripting. Most automation tools today require rigid prompts or programming. But humans naturally teach through a mix of talking, showing, and doing. We imagined an AI that could watch your screen, listen to your explanation, and learn your workflow, turning that demonstration into a reusable automation skill.
That idea became Agen-Teach, a system that learns by observing you.
What It Does
Agen-Teach learns new skills simply by watching and listening as you demonstrate a task on your computer. It combines voice interaction, screen observation, and tool experimentation to generate reusable AI Skills that can be executed later through 500+ integrations (Slack, Gmail, Notion, Discord, etc.).
Key features:
Voice-Guided Teaching: You explain tasks naturally; the system asks clarifying questions.
Real-Time Screen Observation: Watches your actions and understands workflow context.
Parallel Multi-Agent Collaboration: Three AI agents (Voice, Listener, Executor) work together in real time.
Automatic Skill Generation: Produces Claude-compatible Skills with parameters, examples, and test cases.
Live Feedback Loop: Shows skill construction in real time as you demonstrate.
The result: a workflow you'd normally "teach a person" is now teachable directly to AI.
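To make "Skills with parameters, examples, and test cases" more concrete, here is a minimal sketch of what such a record could look like. The class names, field names, and the `post_standup_summary` example are all hypothetical illustrations, not the actual Agen-Teach or Claude Skill format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SkillParameter:
    name: str
    type: str            # e.g. "string" or "number" (illustrative only)
    description: str
    required: bool = True


@dataclass
class SkillTestCase:
    inputs: dict[str, Any]
    expected: str        # human-readable expected outcome


@dataclass
class Skill:
    name: str
    description: str
    parameters: list[SkillParameter] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    test_cases: list[SkillTestCase] = field(default_factory=list)

    def missing_inputs(self, inputs: dict[str, Any]) -> list[str]:
        """Return names of required parameters that are absent from `inputs`."""
        return [p.name for p in self.parameters
                if p.required and p.name not in inputs]


# Hypothetical skill, as it might look after a Slack demonstration.
skill = Skill(
    name="post_standup_summary",
    description="Post a daily standup summary to a Slack channel.",
    parameters=[
        SkillParameter("channel", "string", "Slack channel to post in"),
        SkillParameter("summary", "string", "Text of the summary"),
        SkillParameter("mention", "string", "User to mention", required=False),
    ],
    examples=["Post today's standup summary to #team-updates"],
    test_cases=[
        SkillTestCase({"channel": "#team-updates", "summary": "All green"},
                      "message posted"),
    ],
)

print(skill.missing_inputs({"channel": "#team-updates"}))  # ['summary']
```

Keeping test cases next to the parameters means a generated skill can be checked before anyone reuses it.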
How We Built It
Agen-Teach is powered by a multi-agent architecture running across three coordinated runtimes:
Main Application (Bun/TypeScript)
Hosts the Voice Agent (OpenAI Realtime API) and Listener Agent (action detector).
Uses a WebSocket event bus for synchronized data exchange.
Executor Agent (Python + FastAPI)
Powered by Claude Sonnet 4.5, orchestrating tool calls via MCP and Composio.
Executes up to 20 parallel actions with validation feedback.
Desktop UI (Tauri + React)
Transparent overlay to visualize captured frames, voice logs, and live skill compilation.
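The "up to 20 parallel actions with validation feedback" behaviour of the Executor Agent could be approximated with a semaphore-bounded `asyncio.gather`. This is only a sketch under assumptions: `run_action` is a hypothetical stand-in for a real tool call, the tool name is made up, and the validation rule is a toy one.

```python
import asyncio

MAX_PARALLEL_ACTIONS = 20  # cap taken from the description above


async def run_action(action: dict) -> dict:
    """Hypothetical stand-in for a real tool call (for example, through MCP)."""
    await asyncio.sleep(0.01)   # simulate I/O latency
    ok = "tool" in action       # toy validation rule: an action must name a tool
    return {"action": action, "ok": ok,
            "error": None if ok else "missing 'tool'"}


async def execute_all(actions: list[dict],
                      limit: int = MAX_PARALLEL_ACTIONS) -> list[dict]:
    """Run all actions concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(action: dict) -> dict:
        async with semaphore:
            return await run_action(action)

    # gather returns results in the same order as the input actions
    return await asyncio.gather(*(guarded(a) for a in actions))


if __name__ == "__main__":
    actions = [{"tool": "slack.send_message", "args": {"text": f"msg {i}"}}
               for i in range(30)]
    actions.append({"args": {}})  # deliberately invalid
    results = asyncio.run(execute_all(actions))
    failed = [r for r in results if not r["ok"]]
    print(len(results), len(failed))  # 31 1
```

Returning a per-action `ok`/`error` record, rather than raising on the first failure, lets the caller report validation feedback for every action in the batch.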
We used:
OpenAI Realtime API for natural voice interaction
Anthropic Claude Sonnet 4.5 for action execution and reasoning
Composio for tool integration
Bun runtime for low-latency event streaming
Tauri for a lightweight cross-platform desktop app
Architecture Overview
┌──────────────────────────┐
│ Desktop UI (Tauri)       │
│ Voice + Screen Feedback  │
└────────────┬─────────────┘
             │ WebSocket
┌────────────┴─────────────┐
│ Main App (Bun)           │
│ Voice Agent | Listener   │
│ Skill Builder | Pub/Sub  │
└────────────┬─────────────┘
             │ HTTP
┌────────────┴─────────────┐
│ Executor Agent (Python)  │
│ Claude + MCP + Composio  │
└──────────────────────────┘
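The Pub/Sub box in the Main App can be pictured as a topic-based event bus. The sketch below is an in-process illustration only: the real system uses a WebSocket event bus in Bun/TypeScript, and the `EventBus` class and the topic names `action.detected` and `voice.transcript` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process, topic-based publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> int:
        """Deliver `payload` to every subscriber of `topic`; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
skill_steps: list[str] = []  # stands in for the Skill Builder's state

# The Skill Builder listens to both the Listener and the Voice Agent.
bus.subscribe("action.detected",
              lambda e: skill_steps.append(e["description"]))
bus.subscribe("voice.transcript",
              lambda e: skill_steps.append(f"note: {e['text']}"))

bus.publish("action.detected", {"description": "clicked 'New message' in Slack"})
bus.publish("voice.transcript", {"text": "always post in #team-updates"})
print(skill_steps)
```

With this layout the agents never call each other directly, so a new consumer such as the UI overlay only needs another `subscribe` call.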
Challenges We Ran Into
Realtime synchronization between audio, video, and actions: getting frames, transcripts, and events aligned was tough.
Voice stream stability: balancing 24kHz audio with OpenAI Realtime WebSockets.
Authentication setup: juggling multiple API keys (OpenAI, Anthropic, Composio) across environments.
Cross-runtime communication: integrating Bun, FastAPI, and Tauri required deep debugging.
Screen recording permissions on macOS: lots of test cycles.
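One common way to tackle the alignment problem from the first challenge is to timestamp everything on a shared clock and match each event to the nearest captured frame. The sketch below shows that idea with a binary search; the function names, the 0.5-second skew tolerance, and the sample data are assumptions, not the project's actual code.

```python
from bisect import bisect_left


def nearest_index(timestamps: list[float], t: float) -> int:
    """Index of the entry in sorted `timestamps` that is closest to `t`."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer neighbour; ties go to the earlier one.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align(events, frame_times, max_skew=0.5):
    """Pair each (time, label) event with its nearest frame index.

    Events farther than `max_skew` seconds from any frame map to None.
    """
    pairs = []
    for t, label in events:
        i = nearest_index(frame_times, t)
        close_enough = abs(frame_times[i] - t) <= max_skew
        pairs.append((label, i if close_enough else None))
    return pairs


frame_times = [0.0, 0.5, 1.0, 1.5, 2.0]  # seconds, sorted
events = [
    (0.6, "click: New message"),
    (1.9, "transcript: post it here"),
    (5.0, "late event"),
]
print(align(events, frame_times))
# [('click: New message', 1), ('transcript: post it here', 4), ('late event', None)]
```

Mapping out-of-range events to `None` instead of forcing a match keeps a stalled stream from silently attaching events to the wrong frame.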
Accomplishments We're Proud Of
Built a working voice-and-vision multi-agent architecture from scratch in under 48 hours.
Achieved real-time teaching feedback: you can literally see the skill being built as you talk.
Seamless integration of Claude and OpenAI models for collaborative reasoning.
Designed a transparent Tauri UI overlay that feels magical: the AI "watching" your workflow.
Created reusable Skill specifications that can run on any MCP-compatible system.
What We Learned
How to orchestrate multiple large models together effectively: streaming data, event memory, and reasoning handoff.
The importance of clear event schemas: even the smallest sync bug between agents breaks the illusion of "learning".
That teaching AI by demonstration feels far more natural than prompt engineering.
Bun is ridiculously fast for real-time TypeScript systems.
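The point about clear event schemas can be illustrated with a small validated envelope that every agent message must pass through. This is a sketch under assumptions: the field names (`source`, `kind`, `timestamp`, `payload`) and the `parse_event` helper are hypothetical, and only the three agent roles come from the description above.

```python
from dataclasses import dataclass

ALLOWED_SOURCES = {"voice", "listener", "executor"}  # the three agents
REQUIRED_FIELDS = {"source", "kind", "timestamp", "payload"}


@dataclass(frozen=True)
class AgentEvent:
    source: str
    kind: str
    timestamp: float
    payload: dict


def parse_event(raw: dict) -> AgentEvent:
    """Validate a raw dict and return an AgentEvent; raise ValueError if invalid."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if raw["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {raw['source']!r}")
    ts = raw["timestamp"]
    # bool is a subclass of int, so reject it explicitly
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        raise ValueError("timestamp must be a number")
    if not isinstance(raw["payload"], dict):
        raise ValueError("payload must be a dict")
    return AgentEvent(raw["source"], raw["kind"], float(ts), raw["payload"])


event = parse_event({
    "source": "listener",
    "kind": "action.detected",
    "timestamp": 12,
    "payload": {"description": "opened Gmail"},
})
print(event.timestamp)  # 12.0

try:
    parse_event({"source": "listener", "kind": "action.detected"})
except ValueError as exc:
    print(exc)  # missing fields: ['payload', 'timestamp']
```

Rejecting malformed events at the boundary turns a subtle cross-agent sync bug into an immediate, readable error.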
What's Next for Agen-Teach
Voice cloning + multimodal memory to preserve user teaching style
Skill Library & Marketplace: share and remix learned workflows
Cloud sync + team training: let multiple users co-teach one AI
Browser extension for observing web workflows
Skill validation sandbox: auto-test before deployment
Built With
OpenAI Realtime API • Claude Sonnet 4.5 • Bun • FastAPI • Composio • Tauri • React • TypeScript • Python
