In the age of AI, users (consumers) want either a one-button app OR an app that does EVERYTHING*!

Why voice:

  • When you're driving or, say, running (run outside, people! California is beautiful :) you can't use your phone screen.
  • Voice is a faster input and causes less mental overload than typing. Show me anyone who can take written notes and actually understand the meeting at the same time.
  • Today people listen to podcasts, but podcasts are nowhere near as intense as an interaction with AI can be!

The current voice modes in ChatGPT, the Microsoft Copilot app, Claude, Perplexity, Gemini, and Grok (we evaluate them all!) try to optimize for latency, but we want to optimize for usefulness.

Thinking agents need time to think. Meanwhile, while one agent is working, the user is free to launch more agents!
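Here is a minimal sketch of that idea, not our actual implementation: launching an agent returns immediately, so the voice loop stays free while agents think in the background. The `AgentPool` class, its method names, and the `asyncio.sleep` stand-in for real model calls are all hypothetical and only for illustration.

```python
import asyncio


class AgentPool:
    """Hypothetical registry of background agents (illustrative names only)."""

    def __init__(self):
        self._tasks = {}  # agent name -> asyncio.Task

    def launch(self, name, prompt, think_seconds):
        # create_task schedules the agent and returns right away,
        # so the caller is never blocked by a thinking agent.
        self._tasks[name] = asyncio.create_task(self._run(prompt, think_seconds))

    async def _run(self, prompt, think_seconds):
        # Assumption: a sleep stands in for real model and tool calls.
        await asyncio.sleep(think_seconds)
        return f"done: {prompt}"

    def running(self):
        # Names of agents that have not finished yet.
        return [name for name, task in self._tasks.items() if not task.done()]

    async def wait_all(self):
        # Collect every agent's result, keyed by agent name.
        results = await asyncio.gather(*self._tasks.values())
        return dict(zip(self._tasks.keys(), results))


async def demo():
    pool = AgentPool()
    pool.launch("research", "compare flight prices", 0.05)
    pool.launch("music", "compose a running playlist", 0.02)
    still_running = pool.running()  # both agents are still thinking here
    results = await pool.wait_all()
    return still_running, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The design choice here is that `launch` only schedules work and never awaits it; results are collected later, when the user asks for them or when an agent finishes.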

You can think of voiceroot as a voice-controlled OS kernel agent. It allows the user to do almost anything!

Restrictions: security (via Operant), factuality (via Future AGI Evals), AgentKit framework.

A MiniMax model produces the TTS responses, and an agent can also use MiniMax to generate music. Lots of fun :)

AWS Bedrock serves the Anthropic models!

LLM-generated code runs on AWS Lambda in a V8 isolate.
