TalOS

Inspiration

Every knowledge worker lives in the same loop: open Jira, copy a title, switch to Slack, paste an update, open Gmail, draft a follow-up, open HubSpot, log the deal. Rinse and repeat, dozens of times a day. We asked: what if you could just say what you want done, and it all happens automatically?

That question became TalOS — an AI operating system that turns a single spoken sentence into a fully orchestrated, multi-step enterprise workflow across your tools.

What it does

TalOS lets you speak (or type) natural language commands like:

"Create a P1 Jira ticket for the checkout bug and alert #incidents on Slack"
"Add John Smith to HubSpot and send him an intro email via Gmail"
"Summarize my open tickets and post a standup to #engineering"

TalOS decomposes each request into a dependency-aware task graph, executes independent steps in parallel across 5 enterprise platforms (Jira, Slack, Gmail, HubSpot, Notion), and returns results as both a spoken voice summary and a rich markdown report — all in seconds.

Write actions (sending emails, posting messages) pause for human approval before executing, so you stay in control.

How we built it

TalOS is a TypeScript monorepo (Turborepo) with 20+ packages, built entirely on Amazon Nova:

Nova 2 Pro powers the orchestrator — it reasons over complex multi-step requests and generates dependency-aware task graphs executed by specialist agents
Nova 2 Lite powers the recovery agent: when a task fails, Lite performs fast structured failure diagnosis and stores the correction in semantic memory, so a failure that has already been diagnosed can be avoided on later runs
Nova 2 Sonic provides real-time voice control via HTTP/2 bidirectional streaming — the browser sends PCM audio, Sonic transcribes, reasons, invokes tools, and speaks the response back
Nova Multimodal Embeddings power a three-layer semantic memory system using asymmetric embedding (GENERIC_INDEX for storage, GENERIC_RETRIEVAL for queries) for maximum recall on paraphrase searches
Nova Act drives browser-based UI automation with natural language when API access isn't available
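To illustrate the asymmetric embedding idea from the list above, here is a minimal sketch of a memory store that embeds stored text with GENERIC_INDEX and queries with GENERIC_RETRIEVAL. The `Embedder` type, the `SemanticMemory` class, and the cosine ranking are hypothetical stand-ins for illustration, not the actual TalOS code or the Bedrock API; a real embedder would call Nova Multimodal Embeddings behind this signature.

```typescript
// The two purposes named in the write-up.
type EmbeddingPurpose = "GENERIC_INDEX" | "GENERIC_RETRIEVAL";

// Hypothetical embedder signature (an assumption): wraps whatever call produces the vector.
type Embedder = (text: string, purpose: EmbeddingPurpose) => Promise<number[]>;

interface MemoryEntry {
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class SemanticMemory {
  private entries: MemoryEntry[] = [];
  private embed: Embedder;

  constructor(embed: Embedder) {
    this.embed = embed;
  }

  // Stored text is embedded with the indexing purpose.
  async store(text: string): Promise<void> {
    const vector = await this.embed(text, "GENERIC_INDEX");
    this.entries.push({ text, vector });
  }

  // Queries are embedded with the retrieval purpose, then ranked by cosine similarity.
  async search(query: string, topK: number = 3): Promise<string[]> {
    const queryVector = await this.embed(query, "GENERIC_RETRIEVAL");
    return this.entries
      .map((e) => ({ text: e.text, score: cosine(queryVector, e.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map((e) => e.text);
  }
}
```

The point of the sketch is only that the purpose is chosen by the call site (`store` vs `search`), so callers cannot mix the two up by accident.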

The architecture follows an orchestrator-subagent pattern with four stateless agents (orchestrator, research, execution, recovery): the orchestrator makes all planning decisions, and the research, execution, and recovery specialists carry out the work. A topological task graph engine enables parallel execution: independent steps run concurrently, while dependent steps wait for upstream results.
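A dependency-aware task graph of this kind can be sketched as follows. This is a simplified illustration under our own assumptions, not the TalOS engine: the `Task` shape, `planWaves`, and `runGraph` are hypothetical names, and the sketch runs the graph in "waves" (every task in a wave starts together once all earlier waves have finished), which is a coarser schedule than starting each task the moment its own dependencies resolve.

```typescript
// Hypothetical task shape (an assumption for this sketch).
interface Task {
  id: string;
  dependsOn: string[];
  // Receives the results of its direct dependencies, keyed by task id.
  run: (upstream: Record<string, unknown>) => Promise<unknown>;
}

// Topological layering: each wave contains tasks whose dependencies
// all appear in earlier waves. Throws on unknown dependencies or cycles.
function planWaves(tasks: Task[]): string[][] {
  const ids = tasks.map((t) => t.id);
  for (const t of tasks) {
    for (const d of t.dependsOn) {
      if (ids.indexOf(d) === -1) {
        throw new Error(`Task "${t.id}" depends on unknown task "${d}"`);
      }
    }
  }
  const done: string[] = [];
  let pending = tasks.slice();
  const waves: string[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter((t) => t.dependsOn.every((d) => done.indexOf(d) !== -1));
    if (ready.length === 0) throw new Error("Cycle detected in task graph");
    const readyIds = ready.map((t) => t.id);
    waves.push(readyIds);
    done.push(...readyIds);
    pending = pending.filter((t) => readyIds.indexOf(t.id) === -1);
  }
  return waves;
}

// Runs each wave concurrently with Promise.all and passes upstream results downstream.
async function runGraph(tasks: Task[]): Promise<Record<string, unknown>> {
  const byId: Record<string, Task> = {};
  for (const t of tasks) byId[t.id] = t;
  const results: Record<string, unknown> = {};
  for (const wave of planWaves(tasks)) {
    const outputs = await Promise.all(
      wave.map((id) => {
        const upstream: Record<string, unknown> = {};
        for (const d of byId[id].dependsOn) upstream[d] = results[d];
        return byId[id].run(upstream);
      })
    );
    wave.forEach((id, i) => {
      results[id] = outputs[i];
    });
  }
  return results;
}
```

For a request like "create a Jira ticket and alert Slack", the ticket task would sit in the first wave and the Slack task, which needs the ticket key, in the second.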

The dashboard is a Next.js app with real-time agent status visualization, live task progress via SSE streaming, an approval gate UI for write actions, and voice integration via WebSocket.
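The approval gate for write actions can be pictured as a thin wrapper around execution. The sketch below is illustrative only: `Action`, `Approver`, and `runWithApproval` are hypothetical names, and in a real system the `approve` callback would resolve when the user clicks approve or reject in the dashboard.

```typescript
type ActionKind = "read" | "write";

// Hypothetical action shape (an assumption for this sketch).
interface Action {
  name: string;
  kind: ActionKind;
  execute: () => Promise<string>;
}

// Resolves to true if the human approves the action, false otherwise.
type Approver = (action: Action) => Promise<boolean>;

interface GateResult {
  status: "executed" | "rejected";
  output?: string;
}

// Read actions run immediately; write actions wait for the approver's decision first.
async function runWithApproval(action: Action, approve: Approver): Promise<GateResult> {
  if (action.kind === "write") {
    const approved = await approve(action);
    if (!approved) return { status: "rejected" };
  }
  const output = await action.execute();
  return { status: "executed", output };
}
```

Keeping the gate outside the action itself means connectors do not need to know whether a human is in the loop.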

Challenges we ran into

Nova Sonic's bidirectional streaming protocol required careful event sequencing (sessionStart → promptStart → contentStart → audioInput → ...) — getting tool use working mid-stream was particularly tricky
Nova Act is Python-only, so we built a JSON-over-stdin/stdout bridge to communicate between the TypeScript backend and a Python subprocess driving a real browser
Cross-tool knowledge search needed to fan out across 5 connectors in parallel and merge results by relevance without overwhelming the model's context — we solved this with truncation and source-tagged result objects
Asymmetric embeddings — discovering that Nova Multimodal Embeddings use separate GENERIC_INDEX and GENERIC_RETRIEVAL purposes (and that using them correctly dramatically improves paraphrase recall) took careful reading of the AWS docs
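The cross-tool search fan-out described above can be sketched roughly like this. It is a simplified illustration, not the TalOS implementation: the `Connector` and `SearchHit` shapes, the `searchAll` function, and the default limits are all hypothetical, and the sketch assumes each connector already returns a relevance score that is comparable across sources.

```typescript
// Hypothetical result and connector shapes (assumptions for this sketch).
interface SearchHit {
  title: string;
  snippet: string;
  relevance: number; // assumed comparable across connectors, higher is better
}

interface TaggedHit extends SearchHit {
  source: string; // which connector produced the hit
}

interface Connector {
  name: string;
  search: (query: string) => Promise<SearchHit[]>;
}

interface MergeOptions {
  maxResults: number;
  maxSnippetChars: number;
}

// Queries every connector in parallel, tags each hit with its source,
// truncates snippets, and keeps only the top results by relevance.
async function searchAll(
  connectors: Connector[],
  query: string,
  options: MergeOptions = { maxResults: 5, maxSnippetChars: 200 }
): Promise<TaggedHit[]> {
  const perSource = await Promise.all(
    connectors.map(async (c): Promise<TaggedHit[]> => {
      try {
        const hits = await c.search(query);
        return hits.map((h) => ({
          ...h,
          source: c.name,
          snippet: h.snippet.slice(0, options.maxSnippetChars),
        }));
      } catch (err) {
        // One failing connector should not sink the whole search.
        return [];
      }
    })
  );
  const merged = ([] as TaggedHit[]).concat(...perSource);
  return merged.sort((a, b) => b.relevance - a.relevance).slice(0, options.maxResults);
}
```

Capping both the snippet length and the number of merged hits is what keeps the combined payload small enough for the model's context.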

Accomplishments that we're proud of

A single voice command can orchestrate actions across 5 platforms with dependency resolution, parallel execution, automatic recovery, and human-in-the-loop approval — all in seconds
The recovery agent learns from failures: corrections are stored in semantic memory with freshness decay, so the system gets smarter over time
30+ connector actions across Jira, Slack, Gmail, HubSpot, and Notion, all with retry logic and Nova Act fallback
Clean CI/CD pipeline — 20+ packages build, test, and lint on every push
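As an illustration of what "freshness decay" can mean for stored corrections, the sketch below weights each correction's similarity score by an exponential decay of its age. The half-life formula, the `Correction` shape, and the function names are assumptions made for this example; the write-up does not specify the decay function TalOS actually uses.

```typescript
// Hypothetical stored-correction shape (an assumption for this sketch).
interface Correction {
  note: string;
  similarity: number; // semantic similarity to the current failure, 0..1
  storedAtMs: number; // timestamp when the correction was stored
}

// Exponential decay: 1.0 when brand new, 0.5 after one half-life, 0.25 after two.
function freshness(ageMs: number, halfLifeMs: number): number {
  return Math.pow(0.5, Math.max(0, ageMs) / halfLifeMs);
}

// Ranks corrections by similarity weighted by freshness, best first.
function rankCorrections(
  corrections: Correction[],
  nowMs: number,
  halfLifeMs: number
): Correction[] {
  const score = (c: Correction): number =>
    c.similarity * freshness(nowMs - c.storedAtMs, halfLifeMs);
  return corrections.slice().sort((a, b) => score(b) - score(a));
}
```

With a scheme like this, an older correction can still win if it matches the failure much more closely, but stale advice gradually loses out to recent fixes.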

What we learned

Amazon Nova's model family is remarkably well-suited for agentic systems — Pro for deep reasoning, Lite for fast structured inference, Sonic for voice, Embeddings for semantic memory, and Act for browser automation. Using the right model for each job (instead of one model for everything) dramatically improves both cost and performance.
The orchestrator-subagent pattern is the right architecture for multi-tool automation — it separates concerns cleanly and makes each component independently testable and retryable.
Asymmetric embedding purposes (GENERIC_INDEX vs GENERIC_RETRIEVAL) are a powerful but underutilized feature of Nova Multimodal Embeddings.

What's next for TalOS

More connectors — GitHub, Linear, Salesforce, Google Calendar, Google Drive
Workflow learning — automatically detect repeated patterns and offer to save them as reusable workflows
Production deployment — swap InMemory stores for DynamoDB + OpenSearch (interfaces are already identical), deploy on ECS via Terraform configs in /infra
Multi-user — team workspaces with shared workflows, per-user approval policies, and audit logging

Built With

amazon-bedrock
amazon-nova-2-lite
amazon-nova-2-pro
amazon-nova-2-sonic
amazon-nova-act
amazon-nova-multimodal-embeddings
audio
fastify
next.js
node.js
react
turborepo
typescript
web

Updates

Robert Georges jr started this project — Mar 15, 2026 02:44 PM EDT
