AgenTeach

Video demo (we couldn't upload it here due to Wi-Fi issues): https://docs.google.com/document/d/1jgZlpW468rwLQ5fCjtlQqrQzp8c2zg3cW8bx6mJkYeQ/edit?usp=sharing

🚀 Agen-Teach: Multi-Agent Workflow Learning System

🧠 Inspiration

We wanted to build an AI system that could learn the way humans teach each other — by showing, not scripting. Most automation tools today require rigid prompts or programming. But humans naturally teach through a mix of talking, showing, and doing. We imagined an AI that could watch your screen, listen to your explanation, and learn your workflow — turning that demonstration into a reusable automation skill.

That idea became Agen-Teach — a system that learns by observing you.

💡 What It Does

Agen-Teach learns new skills simply by watching and listening as you demonstrate a task on your computer. It combines voice interaction, screen observation, and tool experimentation to generate reusable AI Skills that can later be executed through 500+ integrations via Composio (Slack, Gmail, Notion, Discord, etc.).

Key features:

🎙️ Voice-Guided Teaching – You explain tasks naturally; the system asks clarifying questions.

👀 Real-Time Screen Observation – Watches your actions and understands workflow context.

⚙️ Parallel Multi-Agent Collaboration – Three AI agents (Voice, Listener, Executor) work together in real time.

🧩 Automatic Skill Generation – Produces Claude-compatible Skills with parameters, examples, and test cases.

🔁 Live Feedback Loop – Shows skill construction in real time as you demonstrate.

The result: a workflow you would normally teach to a person can now be taught directly to an AI.
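To make "Claude-compatible Skills with parameters, examples, and test cases" concrete, here is a minimal sketch of how a learned workflow might be serialized. Claude Skills are folders containing a `SKILL.md` file whose YAML frontmatter carries a `name` and a `description`; everything else here (the `LearnedSkill` type, its fields, and the Parameters/Steps/Examples section layout) is a hypothetical illustration, not Agen-Teach's actual output format.

```typescript
// Hypothetical shape of a skill captured from a demonstration.
interface SkillParameter {
  name: string;
  description: string;
  required: boolean;
}

interface LearnedSkill {
  title: string;
  description: string;
  parameters: SkillParameter[];
  steps: string[];
  examples: string[];
}

// Turn a human-readable title into a lowercase, hyphenated skill name.
function toSkillName(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Render SKILL.md: YAML frontmatter (name, description) followed by Markdown instructions.
function renderSkillMd(skill: LearnedSkill): string {
  const lines: string[] = [
    "---",
    `name: ${toSkillName(skill.title)}`,
    // JSON string syntax doubles as a YAML double-quoted scalar, so colons in the text are safe.
    `description: ${JSON.stringify(skill.description)}`,
    "---",
    "",
    `# ${skill.title}`,
    "",
    "## Parameters",
    ...skill.parameters.map(
      (p) => `- \`${p.name}\`${p.required ? " (required)" : ""}: ${p.description}`,
    ),
    "",
    "## Steps",
    ...skill.steps.map((s, i) => `${i + 1}. ${s}`),
    "",
    "## Examples",
    ...skill.examples.map((e) => `- ${e}`),
  ];
  return lines.join("\n") + "\n";
}
```

For example, a demonstrated "Post Standup Summary" workflow with a required `channel` parameter would render to a `SKILL.md` whose frontmatter name is `post-standup-summary`.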

🏗️ How We Built It

Agen-Teach is powered by a multi-agent architecture running across three coordinated runtimes:

Main Application (Bun/TypeScript)

Hosts the Voice Agent (OpenAI Realtime API) and Listener Agent (action detector).

Uses a WebSocket event bus for synchronized data exchange.

Executor Agent (Python + FastAPI)

Executes up to 20 parallel actions with validation feedback.

Desktop UI (Tauri + React)

Transparent overlay to visualize captured frames, voice logs, and live skill compilation.
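The Executor's "up to 20 parallel actions with validation feedback" can be pictured as a bounded worker pool. The sketch below is written in TypeScript for consistency with the other examples even though the actual Executor is Python/FastAPI; `runWithLimit`, `ActionResult`, and the `validate` callback are hypothetical names, and 20 is simply the limit stated above.

```typescript
// Outcome of one action after validation. Errors are captured instead of thrown
// so that a single failing action does not abort the whole batch.
type ActionResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Run async actions with at most `limit` in flight, keeping results in input order.
async function runWithLimit<T>(
  actions: Array<() => Promise<T>>,
  limit = 20,
  validate: (value: T) => boolean = () => true,
): Promise<ActionResult<T>[]> {
  const results: ActionResult<T>[] = new Array(actions.length);
  let next = 0; // index of the next action to start, shared by all workers

  // Each worker repeatedly claims the next unstarted action until none remain.
  async function worker(): Promise<void> {
    while (next < actions.length) {
      const i = next++;
      try {
        const value = await actions[i]();
        results[i] = validate(value)
          ? { ok: true, value }
          : { ok: false, error: "validation failed" };
      } catch (err) {
        results[i] = { ok: false, error: String(err) };
      }
    }
  }

  // Start min(limit, actions.length) workers and wait for all of them to drain the queue.
  const workers = Array.from({ length: Math.min(limit, actions.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A shared index works here because JavaScript runs each worker's synchronous section without interleaving, so `next++` never hands the same action to two workers.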

We used:

🧠 OpenAI Realtime API for natural voice interaction

🤖 Anthropic Claude Sonnet 4.5 for action execution and reasoning

🔌 Composio for tool integration

⚡ Bun runtime for low-latency event streaming

🪟 Tauri for a lightweight cross-platform desktop app
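The Realtime API's 16-bit PCM input format expects mono, little-endian samples at 24 kHz, sent base64-encoded inside `input_audio_buffer.append` client events. A minimal sketch of that conversion, assuming the microphone delivers Float32 samples in [-1, 1] that are already resampled to 24 kHz; the helper names are illustrative, not taken from the project.

```typescript
const SAMPLE_RATE = 24_000; // Hz, mono

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp so loud input cannot wrap around
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true); // true = little-endian
  }
  return new Uint8Array(view.buffer);
}

// Base64-encode raw bytes (btoa expects a "binary string" of char codes 0-255).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Wrap one audio chunk in a Realtime API client event, ready to send over the WebSocket.
function audioAppendEvent(samples: Float32Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: toBase64(floatTo16BitPCM(samples)),
  });
}

// Number of samples in a chunk of `ms` milliseconds at 24 kHz.
function samplesPerChunk(ms: number): number {
  return Math.round((SAMPLE_RATE * ms) / 1000);
}
```

For instance, a 20 ms chunk is `samplesPerChunk(20)` = 480 samples, or 960 bytes before base64 encoding.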

🧱 Architecture Overview

┌──────────────────────────┐
│    Desktop UI (Tauri)    │
│ Voice + Screen Feedback  │
└───────────┬──────────────┘
            │ WebSocket
┌───────────┴──────────────┐
│      Main App (Bun)      │
│  Voice Agent | Listener  │
│ Skill Builder | Pub/Sub  │
└───────────┬──────────────┘
            │ HTTP
┌───────────┴──────────────┐
│ Executor Agent (Python)  │
│ Claude + MCP + Composio  │
└──────────────────────────┘
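The Pub/Sub layer in the Main App can be pictured as a typed, in-process event bus that the WebSocket and HTTP bridges publish into. A minimal sketch, assuming hypothetical topic names and payloads (`transcript`, `action`, `skill.update`); the project's real bus and its transport layer are not described in this write-up.

```typescript
// Hypothetical topic -> payload map shared by the agents.
type BusEvents = {
  transcript: { text: string; ts: number };
  action: { kind: string; target: string; ts: number };
  "skill.update": { skillName: string; stepCount: number };
};

type Handler<T> = (payload: T) => void;

// Minimal typed pub/sub: topic names and payload shapes are checked at compile time.
class EventBus<E> {
  private handlers = new Map<keyof E, Set<Handler<unknown>>>();

  // Register a handler for one topic; returns a function that unsubscribes it.
  subscribe<K extends keyof E>(topic: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(topic) ?? new Set<Handler<unknown>>();
    this.handlers.set(topic, set);
    const h = handler as Handler<unknown>;
    set.add(h);
    return () => {
      set.delete(h);
    };
  }

  // Deliver a payload synchronously to every current subscriber of the topic.
  publish<K extends keyof E>(topic: K, payload: E[K]): void {
    this.handlers.get(topic)?.forEach((h) => h(payload));
  }
}
```

Because `publish` is synchronous, a slow subscriber delays the publisher; a bridge that forwards events to WebSocket clients would typically just serialize and send, keeping handlers cheap.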

😵‍💫 Challenges We Ran Into

Real-time synchronization between audio, video, and actions: keeping frames, transcripts, and events aligned was tough.

Voice stream stability: keeping 24 kHz audio streaming reliably over OpenAI Realtime WebSockets.

Authentication setup: juggling multiple API keys (OpenAI, Anthropic, Composio) across environments.

Cross-runtime communication: integrating Bun ↔ FastAPI ↔ Tauri required deep debugging.

Screen recording permissions on macOS: getting them right took many test cycles.
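One common way to keep these streams aligned is to stamp every frame, transcript segment, and action with a shared clock and then, for each transcript segment, pick the frame captured closest to it. A minimal sketch using binary search over frames sorted by timestamp; the `Frame` and `Segment` shapes and the millisecond clock are assumptions, not the project's actual schema.

```typescript
interface Frame { ts: number; id: string }       // capture time in ms
interface Segment { ts: number; text: string }   // transcript segment start in ms

// Return the frame whose timestamp is closest to `ts`. `frames` must be sorted by ts.
function nearestFrame(frames: Frame[], ts: number): Frame | undefined {
  if (frames.length === 0) return undefined;
  let lo = 0;
  let hi = frames.length - 1;
  // Binary search for the first frame with timestamp >= ts (or the last frame).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (frames[mid].ts < ts) lo = mid + 1;
    else hi = mid;
  }
  // The closest frame is either that one or the one just before it.
  const prev = lo > 0 ? frames[lo - 1] : undefined;
  const curr = frames[lo];
  if (prev !== undefined && ts - prev.ts <= Math.abs(curr.ts - ts)) return prev;
  return curr;
}

// Pair each transcript segment with the frame on screen when it was spoken.
function alignSegments(
  frames: Frame[],
  segments: Segment[],
): Array<{ text: string; frameId?: string }> {
  return segments.map((s) => ({ text: s.text, frameId: nearestFrame(frames, s.ts)?.id }));
}
```

Each lookup costs O(log n) in the number of frames, so alignment stays cheap even when frames arrive much faster than transcript segments.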

🏆 Accomplishments We’re Proud Of

Built a working voice-and-vision multi-agent architecture from scratch in under 48 hours.

Achieved real-time teaching feedback — you can literally see the skill being built as you talk.

Integrated Claude and OpenAI models seamlessly for collaborative reasoning.

Designed a transparent Tauri UI overlay that feels magical — the AI “watching” your workflow.

Created reusable Skill specifications that can run on any MCP-compatible system.

📚 What We Learned

How to orchestrate multiple large models together effectively — streaming data, event memory, and reasoning handoff.

The importance of clear event schemas — even the smallest sync bug between agents breaks the illusion of “learning”.

That teaching AI by demonstration feels far more natural than prompt engineering.

Bun is ridiculously fast for real-time TypeScript systems.
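Clear event schemas pay off most at the boundary where JSON arrives from another runtime. A minimal sketch, assuming a hypothetical two-event schema, of validating incoming WebSocket messages into a TypeScript discriminated union so that malformed or outdated messages are rejected instead of silently desynchronizing the agents.

```typescript
// Hypothetical wire schema shared by the Voice, Listener, and Executor agents.
type AgentEvent =
  | { type: "transcript"; text: string; ts: number }
  | { type: "action"; kind: string; target: string; ts: number };

// Narrow an unknown value to a plain object whose fields can be inspected.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parse and validate a raw message; returns null if it does not match the schema.
function parseAgentEvent(raw: string): AgentEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!isRecord(data)) return null;

  const { type, ts, text, kind, target } = data;
  if (typeof ts !== "number") return null;

  if (type === "transcript" && typeof text === "string") {
    return { type: "transcript", text, ts };
  }
  if (type === "action" && typeof kind === "string" && typeof target === "string") {
    return { type: "action", kind, target, ts };
  }
  return null; // unknown event type or missing fields
}
```

Downstream code can then `switch` on `event.type` and get fully typed fields, so a schema change on one side shows up as a compile error or a rejected message rather than a subtle sync bug.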

🔮 What’s Next for Agen-Teach

🎧 Voice cloning + multimodal memory to preserve user teaching style

🧠 Skill Library & Marketplace – share and remix learned workflows

🌐 Cloud sync + team training – let multiple users co-teach one AI

💻 Browser extension for observing web workflows

🛠️ Skill validation sandbox – auto-test before deployment

🧩 Built With

OpenAI Realtime API • Claude Sonnet 4.5 • Bun • FastAPI • Composio • Tauri • React • TypeScript • Python

Updates

Anand Ashar started this project — Oct 26, 2025 12:01 PM EDT
