Inspiration
Every knowledge worker lives in the same loop: open Jira, copy a title, switch to Slack, paste an update, open Gmail, draft a follow-up, open HubSpot, log the deal. Rinse and repeat, dozens of times a day. We asked: what if you could just say what you want done, and it all happens automatically?
That question became TalOS — an AI operating system that turns a single spoken sentence into a fully orchestrated, multi-step enterprise workflow across your tools.
What it does
TalOS lets you speak (or type) natural language commands like:
- "Create a P1 Jira ticket for the checkout bug and alert #incidents on Slack"
- "Add John Smith to HubSpot and send him an intro email via Gmail"
- "Summarize my open tickets and post a standup to #engineering"
TalOS decomposes each request into a dependency-aware task graph, executes independent steps in parallel across 5 enterprise platforms (Jira, Slack, Gmail, HubSpot, Notion), and returns results as both a spoken voice summary and a rich markdown report — all in seconds.
Write actions (sending emails, posting messages) pause for human approval before executing, so you stay in control.
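The approval gate can be pictured as a promise that stays pending until a human decides. The sketch below is a minimal illustration, not TalOS's actual code. `ApprovalGate`, `runAction`, and the read/write `kind` field are hypothetical names. In the real dashboard, the `decide` call would presumably be triggered from the approval UI.

```typescript
// Hypothetical action shape: only `kind` decides whether approval is needed.
type ActionKind = "read" | "write";
type Decision = "approved" | "rejected";

interface PendingAction {
  id: string;
  kind: ActionKind;
  description: string;
}

class ApprovalGate {
  // Resolvers for write actions that are waiting on a human decision.
  private pending = new Map<string, (decision: Decision) => void>();

  // Reads pass straight through. Writes return a promise that stays
  // pending until decide() is called for the same action id.
  request(action: PendingAction): Promise<Decision> {
    if (action.kind === "read") return Promise.resolve<Decision>("approved");
    return new Promise<Decision>((resolve) => {
      this.pending.set(action.id, resolve);
    });
  }

  // Called when the user approves or rejects in the UI.
  // Returns false if no action with this id is waiting.
  decide(id: string, decision: Decision): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false;
    this.pending.delete(id);
    resolve(decision);
    return true;
  }

  listPending(): string[] {
    return Array.from(this.pending.keys());
  }
}

// Runs an action only after the gate approves it.
async function runAction(
  gate: ApprovalGate,
  action: PendingAction,
  execute: () => Promise<string>,
): Promise<string> {
  const decision = await gate.request(action);
  if (decision === "rejected") return `skipped: ${action.description}`;
  return execute();
}
```

Because the pending write is just an unresolved promise, the rest of the task graph can keep running independent read steps while the user decides.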
How we built it
TalOS is a TypeScript monorepo (Turborepo) with 20+ packages, and its AI layer is built entirely on Amazon Nova:
- Nova 2 Pro powers the orchestrator — it reasons over complex multi-step requests and generates dependency-aware task graphs executed by specialist agents
- Nova 2 Lite powers the recovery agent: when a task fails, Lite performs fast structured failure diagnosis and stores the correction in semantic memory so the same failure can be avoided on later runs
- Nova 2 Sonic provides real-time voice control via HTTP/2 bidirectional streaming — the browser sends PCM audio, Sonic transcribes, reasons, invokes tools, and speaks the response back
- Nova Multimodal Embeddings power a three-layer semantic memory system using asymmetric embedding purposes (GENERIC_INDEX when storing, GENERIC_RETRIEVAL when querying) to improve recall on paraphrased searches
- Nova Act drives browser-based UI automation with natural language when API access isn't available
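The asymmetric-embedding idea behind the memory layer can be sketched as a store that embeds documents with `GENERIC_INDEX`, embeds queries with `GENERIC_RETRIEVAL`, and then ranks by cosine similarity. This is a hypothetical single-layer sketch. The `EmbedFn` signature stands in for an actual Bedrock call to Nova Multimodal Embeddings, whose request format is not modeled here. The exponential half-life weighting is one assumed way to implement the freshness decay the project describes for stored corrections.

```typescript
// Purpose values as named in the project description.
type EmbeddingPurpose = "GENERIC_INDEX" | "GENERIC_RETRIEVAL";
// Hypothetical stand-in for a Bedrock embeddings call.
type EmbedFn = (text: string, purpose: EmbeddingPurpose) => Promise<number[]>;

interface MemoryEntry {
  text: string;
  vector: number[];
  storedAt: number; // epoch milliseconds
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class SemanticMemory {
  private entries: MemoryEntry[] = [];
  private readonly embed: EmbedFn;
  private readonly halfLifeMs: number;

  constructor(embed: EmbedFn, halfLifeMs = 7 * 24 * 60 * 60 * 1000) {
    this.embed = embed;
    this.halfLifeMs = halfLifeMs;
  }

  // Stored items are embedded with the INDEX purpose.
  async add(text: string, now = Date.now()): Promise<void> {
    const vector = await this.embed(text, "GENERIC_INDEX");
    this.entries.push({ text, vector, storedAt: now });
  }

  // Queries use the RETRIEVAL purpose. Score = cosine similarity
  // multiplied by a freshness factor that halves every halfLifeMs.
  async search(query: string, k = 3, now = Date.now()): Promise<{ text: string; score: number }[]> {
    const q = await this.embed(query, "GENERIC_RETRIEVAL");
    return this.entries
      .map((e) => {
        const age = Math.max(0, now - e.storedAt);
        const freshness = Math.pow(0.5, age / this.halfLifeMs);
        return { text: e.text, score: cosine(q, e.vector) * freshness };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

The key design point is that the purpose is chosen by the call site (`add` vs `search`), so callers cannot accidentally index with a retrieval embedding.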
The architecture follows an orchestrator-subagent pattern with four stateless agents (orchestrator, research, execution, recovery): the orchestrator controls all planning decisions, and the three specialist agents handle the work. A topological task graph engine enables true parallel execution: independent steps run concurrently, and dependent steps wait for upstream results.
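A task graph engine of this kind can be sketched in a few dozen lines. First, validate the graph with Kahn's topological sort, rejecting cycles and unknown dependencies. Then start each task as soon as all of its dependencies resolve, so independent branches overlap. `TaskNode`, `topoOrder`, and `executeGraph` are hypothetical names. This is not TalOS's engine, and a production version would also need cancellation and per-task error reporting.

```typescript
interface TaskNode {
  id: string;
  dependsOn: string[];
  // Receives upstream results keyed by task id.
  run: (upstream: Record<string, unknown>) => Promise<unknown>;
}

// Kahn's algorithm: returns a topological order, or throws on cycles/unknown deps.
function topoOrder(tasks: TaskNode[]): string[] {
  const ids = new Set(tasks.map((t) => t.id));
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.id, t.dependsOn.length);
    for (const d of t.dependsOn) {
      if (!ids.has(d)) throw new Error(`unknown dependency ${d} in ${t.id}`);
      dependents.set(d, [...(dependents.get(d) ?? []), t.id]);
    }
  }
  const queue = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("task graph has a cycle");
  return order;
}

// Starts each task as soon as its dependencies resolve, so independent
// branches run concurrently.
async function executeGraph(tasks: TaskNode[]): Promise<Record<string, unknown>> {
  const order = topoOrder(tasks);
  const byId = new Map(tasks.map((t) => [t.id, t] as [string, TaskNode]));
  const started = new Map<string, Promise<unknown>>();
  // Walking in topological order guarantees every dependency's promise exists.
  for (const id of order) {
    const task = byId.get(id)!;
    const deps = task.dependsOn.map((d) => started.get(d)!);
    started.set(
      id,
      Promise.all(deps).then((values) => {
        const upstream: Record<string, unknown> = {};
        task.dependsOn.forEach((d, i) => (upstream[d] = values[i]));
        return task.run(upstream);
      }),
    );
  }
  // Awaiting all promises together also handles every rejection.
  const values = await Promise.all(order.map((id) => started.get(id)!));
  const results: Record<string, unknown> = {};
  order.forEach((id, i) => (results[id] = values[i]));
  return results;
}
```

Because connector calls are I/O-bound, promise-based concurrency on Node's single thread is enough for independent steps, such as creating a Jira ticket and looking up a HubSpot contact, to overlap.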
The dashboard is a Next.js app with real-time agent status visualization, live task progress via SSE streaming, an approval gate UI for write actions, and voice integration via WebSocket.
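Live task progress over SSE boils down to writing `text/event-stream` frames. Below is a minimal sketch of the server-side framing. The `TaskProgressEvent` shape, the status values, and the `task-progress` event name are all hypothetical.

```typescript
interface TaskProgressEvent {
  taskId: string;
  status: "pending" | "running" | "awaiting_approval" | "done" | "failed";
  detail?: string;
}

// One SSE frame: field lines terminated by a blank line.
// JSON.stringify escapes newlines inside strings, so a single data line suffices.
function toSseFrame(event: TaskProgressEvent, seq: number): string {
  return [`id: ${seq}`, "event: task-progress", `data: ${JSON.stringify(event)}`].join("\n") + "\n\n";
}

// Lines starting with ":" are SSE comments; sending one periodically keeps
// idle connections from being closed by proxies.
const HEARTBEAT = ": keep-alive\n\n";
```

In a Fastify route, frames like these could be written to `reply.raw` (the underlying Node.js response) after setting the `Content-Type: text/event-stream` header. In the browser, `new EventSource(url)` with `addEventListener("task-progress", ...)` would receive them. The `id:` field lets a reconnecting client send `Last-Event-ID` so the server can resume from the right point.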
Challenges we ran into
- Nova Sonic's bidirectional streaming protocol required careful event sequencing (sessionStart → promptStart → contentStart → audioInput → ...) — getting tool use working mid-stream was particularly tricky
- Nova Act is Python-only, so we built a JSON-over-stdin/stdout bridge to communicate between the TypeScript backend and a Python subprocess driving a real browser
- Cross-tool knowledge search needed to fan out across 5 connectors in parallel and merge results by relevance without overwhelming the model's context — we solved this with truncation and source-tagged result objects
- Asymmetric embeddings: it took careful reading of the AWS docs to discover that Nova Multimodal Embeddings use separate GENERIC_INDEX and GENERIC_RETRIEVAL purposes, and that using them correctly dramatically improves paraphrase recall
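The cross-tool fan-out search can be sketched with `Promise.allSettled`, so one failing connector does not sink the whole query. The sketch makes three assumptions. Each connector exposes a hypothetical `search` that returns scored hits. Scores are comparable across connectors; a real system would need to normalize them. Truncation is by character count rather than by tokens.

```typescript
interface SearchHit {
  source: string; // connector that produced the hit, e.g. "jira"
  title: string;
  snippet: string;
  score: number; // relevance, assumed comparable across connectors
}

interface Connector {
  name: string;
  search: (query: string) => Promise<Omit<SearchHit, "source">[]>;
}

function truncate(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars - 1) + "…";
}

// Queries every connector in parallel, tags each hit with its source,
// truncates snippets, and keeps only the top results by score.
async function searchAll(
  connectors: Connector[],
  query: string,
  maxHits = 10,
  maxSnippetChars = 280,
): Promise<{ hits: SearchHit[]; failed: string[] }> {
  const settled = await Promise.allSettled(connectors.map((c) => c.search(query)));
  const hits: SearchHit[] = [];
  const failed: string[] = [];
  settled.forEach((result, i) => {
    const source = connectors[i].name;
    if (result.status === "rejected") {
      failed.push(source);
      return;
    }
    for (const hit of result.value) {
      hits.push({ ...hit, source, snippet: truncate(hit.snippet, maxSnippetChars) });
    }
  });
  hits.sort((a, b) => b.score - a.score);
  return { hits: hits.slice(0, maxHits), failed };
}
```

Returning the `failed` list alongside the hits lets the model say which tools could not be searched instead of silently presenting partial results as complete.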
Accomplishments that we're proud of
- A single voice command can orchestrate actions across 5 platforms with dependency resolution, parallel execution, automatic recovery, and human-in-the-loop approval — all in seconds
- The recovery agent learns from failures: corrections are stored in semantic memory with freshness decay, so the system gets smarter over time
- 30+ connector actions across Jira, Slack, Gmail, HubSpot, and Notion, all with retry logic and Nova Act fallback
- Clean CI/CD pipeline — 20+ packages build, test, and lint on every push
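The retry logic with Nova Act fallback can be sketched as exponential backoff around the API call. The call is handed off to browser automation only after every attempt fails. `withRetryAndFallback` and its options are hypothetical names. In TalOS, `uiFallback` would presumably go through the Python Nova Act bridge, which is not modeled here.

```typescript
interface RetryOptions {
  attempts: number; // total API attempts before falling back
  baseDelayMs: number; // first backoff delay; doubles after each failure
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetryAndFallback<T>(
  apiCall: () => Promise<T>,
  uiFallback: () => Promise<T>,
  opts: RetryOptions = { attempts: 3, baseDelayMs: 200 },
): Promise<{ value: T; via: "api" | "ui-fallback"; apiErrors: unknown[] }> {
  const apiErrors: unknown[] = [];
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return { value: await apiCall(), via: "api", apiErrors };
    } catch (err) {
      apiErrors.push(err);
      // Back off before the next attempt, but not after the last one.
      if (attempt < opts.attempts - 1) await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
  // Every API attempt failed: hand the same intent to browser automation.
  return { value: await uiFallback(), via: "ui-fallback", apiErrors };
}
```

A real connector would likely retry only transient errors (rate limits, 5xx responses) and fail fast on validation errors, since repeating a malformed request just burns the backoff budget. The collected `apiErrors` are also what a recovery agent would diagnose.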
What we learned
- Amazon Nova's model family is remarkably well-suited for agentic systems — Pro for deep reasoning, Lite for fast structured inference, Sonic for voice, Embeddings for semantic memory, and Act for browser automation. Using the right model for each job (instead of one model for everything) dramatically improves both cost and performance.
- The orchestrator-subagent pattern is the right architecture for multi-tool automation — it separates concerns cleanly and makes each component independently testable and retryable.
- Asymmetric embedding purposes (INDEX vs RETRIEVAL) are a powerful but underutilized feature of Nova Embeddings.
What's next for TalOS
- More connectors: GitHub, Linear, Salesforce, Google Calendar, Google Drive
- Workflow learning: automatically detect repeated patterns and offer to save them as reusable workflows
- Production deployment: swap InMemory stores for DynamoDB + OpenSearch (interfaces are already identical) and deploy on ECS via the Terraform configs in /infra
- Multi-user: team workspaces with shared workflows, per-user approval policies, and audit logging
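Swapping in-memory stores for DynamoDB and OpenSearch works only if callers depend on an interface rather than a concrete class. Here is a hypothetical sketch of that pattern. `KeyValueStore`, `InMemoryStore`, and `createStore` are illustrative names, and the DynamoDB adapter is left as an injected factory rather than implemented.

```typescript
// Shared contract: callers only ever see this interface.
interface KeyValueStore<T> {
  get(key: string): Promise<T | undefined>;
  put(key: string, value: T): Promise<void>;
  delete(key: string): Promise<boolean>;
  list(prefix?: string): Promise<T[]>;
}

class InMemoryStore<T> implements KeyValueStore<T> {
  private data = new Map<string, T>();

  async get(key: string): Promise<T | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }

  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }

  async list(prefix = ""): Promise<T[]> {
    return Array.from(this.data.entries())
      .filter(([key]) => key.startsWith(prefix))
      .map(([, value]) => value);
  }
}

// Chooses the backend from config; the production adapter is injected.
function createStore<T>(
  backend: "memory" | "dynamodb",
  dynamoFactory?: () => KeyValueStore<T>,
): KeyValueStore<T> {
  if (backend === "memory") return new InMemoryStore<T>();
  if (!dynamoFactory) throw new Error("dynamodb backend requires a factory");
  return dynamoFactory();
}
```

A DynamoDB-backed class could implement the same four methods with the AWS SDK v3 document client (`GetCommand`, `PutCommand`, `DeleteCommand`, `QueryCommand` from `@aws-sdk/lib-dynamodb`). Prefix listing maps naturally to a `Query` with `begins_with` on a sort key, which constrains how the table's keys are designed.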
Built With
- amazon-bedrock
- amazon-nova-2-lite
- amazon-nova-2-pro
- amazon-nova-2-sonic
- amazon-nova-act
- amazon-nova-multimodal-embeddings
- audio
- fastify
- next.js
- node.js
- react
- turborepo
- typescript
- web