🌟 Inspiration

Have you ever been stuck at your laptop, grinding through mindless tasks you'd rather not be doing? We have. We dreamed of an AI agent that works like Siri but is far more capable and truly autonomous. Inspired by Tony Stark's JARVIS from Iron Man, we wanted to create a desktop assistant that doesn't just answer questions: it takes control and gets things done for you.

🤖 What it does

JARVIS is your "in-laptop" AI companion that transforms how you interact with your computer. Say "Jarvis" and give it a command. Want to tell Samantha you can't come to her party? Ask JARVIS to craft a polite message and send it from your laptop!

Key Capabilities:

  • 🎤 Voice-Activated Control - Wake word detection with natural speech commands
  • 🖥️ Full Desktop Automation - Takes complete control of your macOS system
  • 🧠 Smart Task Routing - Intelligently chooses between a fast custom controller and the more advanced Agent-S
  • 🔄 Seamless Handoffs - Automatically escalates complex tasks to more powerful reasoning engines

JARVIS doesn't just understand what you want; it carries it out by controlling your mouse, keyboard, and applications just like a human would.

🛠️ How we built it

Architecture Overview:

  1. Wake Word Detection - Picovoice Porcupine listens for the "Jarvis" wake word
  2. Speech Recognition - Fish Audio API converts voice to text with high accuracy
  3. Smart Decision Engine - Custom routing logic determines task complexity
  4. Dual Execution Paths:
    • Custom Step Agent - Fast controller for simple tasks (2x speed improvement)
    • Agent-S3 - Advanced reasoning for complex multi-step workflows
  5. Desktop Control - PyAutoGUI and custom controllers execute actions
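
The five stages above can be sketched as a small pipeline of injected callables. This is only an illustration of the control flow, not the project's actual code: the `Pipeline` class, its field names, and the path labels `"step_agent"` and `"agent_s3"` are all hypothetical, and the stages are stubbed so the example runs without a microphone or API keys.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Pipeline:
    """Hypothetical glue for the stages; every field is an injected callable."""
    wait_for_wake_word: Callable[[], None]      # would wrap the wake word engine
    transcribe: Callable[[], str]               # would wrap the speech-to-text call
    route: Callable[[str], str]                 # returns an execution path name
    executors: Dict[str, Callable[[str], str]]  # path name -> executor

    def run_once(self) -> Tuple[str, str]:
        """Handle a single voice command and return (path, result)."""
        self.wait_for_wake_word()
        command = self.transcribe().strip()
        if not command:
            # Nothing usable was heard, so skip routing and execution.
            return ("none", "no command heard")
        path = self.route(command)
        result = self.executors[path](command)
        return (path, result)


# Stub stages (assumptions for illustration only).
pipeline = Pipeline(
    wait_for_wake_word=lambda: None,
    transcribe=lambda: "open Safari",
    route=lambda cmd: "step_agent" if len(cmd.split()) <= 4 else "agent_s3",
    executors={
        "step_agent": lambda cmd: f"step agent ran: {cmd}",
        "agent_s3": lambda cmd: f"Agent-S3 ran: {cmd}",
    },
)
print(pipeline.run_once())  # ('step_agent', 'step agent ran: open Safari')
```

Keeping each stage behind a plain callable makes it possible to swap in the real wake word, transcription, and desktop control components without touching the flow itself.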

Tech Stack:

  • Frontend: React + TypeScript + Electron (floating UI)
  • Backend: Python + Node.js hybrid architecture
  • AI/ML: Custom Step Agent + Agent-S3 integration
  • APIs: Fish Audio (STT/TTS), Anthropic Claude, Picovoice
  • Platform: macOS (with plans for cross-platform expansion)

The key innovation is our smart model switching mechanism that automatically chooses the optimal execution path based on task complexity analysis.
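
As a rough illustration of complexity-based routing, the sketch below scores a command with a few keyword heuristics and picks a path from the score. The word lists, weights, threshold, and function names are assumptions made for this example; the actual decision engine may use entirely different signals.

```python
import re

# Hypothetical signals that a command needs multi-step reasoning.
SEQUENCING_WORDS = {"and", "then", "after", "before", "while"}
COMPLEX_VERBS = {"craft", "write", "summarize", "compare", "schedule", "find"}


def complexity_score(command: str) -> int:
    """Score a command; a higher score suggests the advanced path."""
    words = re.findall(r"[a-z']+", command.lower())
    score = 0
    score += sum(1 for w in words if w in SEQUENCING_WORDS)   # chained steps
    score += 2 * sum(1 for w in words if w in COMPLEX_VERBS)  # open-ended work
    score += len(words) // 12                                 # long requests
    return score


def choose_path(command: str, threshold: int = 2) -> str:
    """Return "agent_s3" at or above the threshold, otherwise "step_agent"."""
    if complexity_score(command) >= threshold:
        return "agent_s3"
    return "step_agent"


print(choose_path("open Safari"))                                     # step_agent
print(choose_path("craft a polite message to Samantha and send it"))  # agent_s3
```

A keyword heuristic like this is cheap enough to run on every command, which matters when the point of the fast path is low latency.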

💪 Challenges we ran into

1. Agent-S Reliability Issues - Getting Agent-S to consistently listen to prompts and execute them effectively was our biggest hurdle. We had to implement extensive error handling and fallback mechanisms.
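
One common shape for this kind of error handling is a retry-then-fallback wrapper, sketched below. The function name `run_with_fallback`, the retry count, and the choice of which executor counts as primary or fallback are assumptions for illustration, not a description of the project's implementation.

```python
from typing import Callable


def run_with_fallback(
    command: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int = 2,
) -> str:
    """Try `primary` up to `retries` times, then hand the command to `fallback`."""
    for _ in range(retries):
        try:
            return primary(command)
        except Exception:
            # Broad on purpose: any failure of the primary agent triggers a retry.
            continue
    try:
        return fallback(command)
    except Exception as exc:
        raise RuntimeError(f"both paths failed for {command!r}") from exc


def unresponsive_agent(command: str) -> str:
    """Stand-in for an agent call that keeps failing."""
    raise TimeoutError("agent did not respond")


print(run_with_fallback("open Safari", unresponsive_agent,
                        lambda c: f"fallback ran: {c}"))  # fallback ran: open Safari
```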

2. Model Switching Logic - Designing the decision engine to intelligently route between our custom controller and Agent-S required careful analysis of task complexity patterns and extensive testing.

3. Audio Processing Pipeline - Integrating wake word detection, speech transcription, and voice feedback into a seamless user experience while maintaining low latency was technically challenging.

4. Desktop Control Precision - Ensuring reliable screen capture, element detection, and action execution across different applications and UI states required robust computer vision and control algorithms.

🏆 Accomplishments that we're proud of

✨ Fully Functional Voice-to-Action Pipeline - We successfully built an end-to-end system that goes from voice command to actual desktop execution

🧠 Smart Model Switching Innovation - Our custom decision engine that chooses between fast and advanced execution paths is a unique contribution to desktop automation

🎯 Seamless User Experience - Created an unobtrusive floating UI with beautiful animations that feels natural and responsive

🎤 Natural Voice Interface - Achieved reliable wake word detection and speech recognition that works in real-world conditions

🚀 What's next for JARVIS AI

🔄 Enhanced Human-in-the-Loop Functionality

  • Interactive confirmation for sensitive actions
  • Real-time feedback and correction mechanisms
  • User preference learning and adaptation
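
A confirmation gate for sensitive actions could look something like the sketch below. It is a planning illustration only: the keyword list, `is_sensitive`, and `execute_with_confirmation` are hypothetical names, and a real version would need a better sensitivity check than whole-word matching on whitespace-split text.

```python
from typing import Callable

# Hypothetical keywords that mark an action as sensitive.
SENSITIVE_KEYWORDS = ("send", "delete", "pay", "purchase", "post")


def is_sensitive(action: str) -> bool:
    """Return True if any whitespace-separated word is a sensitive keyword."""
    words = action.lower().split()
    return any(keyword in words for keyword in SENSITIVE_KEYWORDS)


def execute_with_confirmation(
    action: str,
    execute: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `execute`, but ask `confirm` first when the action looks sensitive."""
    if is_sensitive(action) and not confirm(action):
        return "cancelled"
    return execute(action)


def run(action: str) -> str:
    """Stand-in executor for the example."""
    return f"done: {action}"


def user_declines(action: str) -> bool:
    """Stand-in for a user who says no to every prompt."""
    return False


print(execute_with_confirmation("send message to Samantha", run, user_declines))  # cancelled
print(execute_with_confirmation("open Safari", run, user_declines))               # done: open Safari
```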

🧠 Advanced Context Learning

  • Long-term memory of user patterns and preferences
  • Proactive task suggestions based on calendar and workflow analysis
  • Ability to execute recurring tasks autonomously while you're away

Built With

  • agent-s
  • fishaudio
  • sound-exchange