Jarvis

[Figure: architecture diagram]

Inspiration

The inspiration for Jarvis came from a simple desire: I wanted a truly smart companion that could help me solve real-life problems without constantly reaching for my device or typing on a keyboard. I wanted an interface that felt natural, something I could just talk to while working or moving around. And of course, as a fan of Iron Man, I felt Jarvis was the only possible name. I wanted to see if I could bring a piece of that "futuristic AI assistant" into the real world using the power of the Gemini Live API.

What it does

Jarvis is a multimodal AI agent. It turns voice and visual input into working on-screen prototypes of both software and hardware:

Multimodal Live Interaction: Full-duplex voice conversation with "barge-in" support.
Visual Perception: Explains webcam feeds or shared screens in real time (the "Visual Teacher" mode).
3D Hardware Prototyping: Build and animate 3D scenes (like cars or entire solar systems) instantly via React Three Fiber.
Software Sandboxing: Generate complete web applications (HTML/CSS/JS) and see them rendered live.
Interactive Navigation: Plot routes and explore locations on a dynamic map.
Conversational Scaffolding: Once you like a software design, Jarvis can actually scaffold the real code (Next.js, Vite, etc.) directly onto your local machine through conversation.
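
The navigation feature has to treat a single place differently from a multi-point route, and map libraries disagree on axis order: Leaflet expects [lat, lng], while GeoJSON-style sources use [lng, lat]. The following is a minimal sketch of one way to normalize both cases before drawing. The type and function names (`MapView`, `toLatLng`, `planMapView`) are hypothetical and not taken from the Jarvis source.

```typescript
// Hypothetical shapes for illustration; not taken from the Jarvis source.
type LatLng = [lat: number, lng: number]; // order Leaflet expects
type LngLat = [lng: number, lat: number]; // order GeoJSON uses

type MapView =
  | { kind: "marker"; position: LatLng }
  | { kind: "route"; path: LatLng[] };

// Swap a GeoJSON-style [lng, lat] pair into Leaflet-style [lat, lng],
// rejecting values that cannot be valid coordinates.
function toLatLng([lng, lat]: LngLat): LatLng {
  if (Math.abs(lat) > 90 || Math.abs(lng) > 180) {
    throw new RangeError(`Coordinate out of range: lat=${lat}, lng=${lng}`);
  }
  return [lat, lng];
}

// One point becomes a marker; two or more become a route.
function planMapView(points: LngLat[]): MapView {
  if (points.length === 0) {
    throw new Error("No locations provided");
  }
  const path = points.map(toLatLng);
  return path.length === 1
    ? { kind: "marker", position: path[0] }
    : { kind: "route", path };
}
```

The range check also catches some accidental swaps: a longitude such as -122.4 placed in the latitude slot fails immediately instead of silently drawing a marker in the wrong hemisphere.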

How we built it

Jarvis is built on a modern, high-performance stack:

Framework: Next.js 15 (App Router) for a lightning-fast frontend and serverless backend.
AI Core: Gemini Live API via the @google/genai SDK for real-time multimodal streaming and tool calling.
3D Engine: React Three Fiber and Three.js for hardware rendering.
Mapping: Leaflet for interactive geographical visualization.
Styling: Vanilla CSS and Tailwind for a sleek, premium dark-mode aesthetic.
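
Tool calling generally means the model emits a named function call with JSON arguments, and the app routes it to a handler and sends a result back. Below is a minimal sketch of that routing step only, with no SDK involved. The `ToolCall` and `ToolResult` shapes and the tool names `render_scene` and `show_map` are assumptions made for illustration, not the ones Jarvis registers.

```typescript
// Hypothetical tool-call shape: an id, a tool name, and JSON arguments.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  id: string;
  name: string;
  response: { ok: boolean; detail: string };
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Illustrative handlers; a real app would update the 3D scene or the map here.
const handlers = new Map<string, ToolHandler>([
  ["render_scene", (args) => `scene:${String(args["subject"] ?? "unknown")}`],
  ["show_map", (args) => `map:${String(args["place"] ?? "unknown")}`],
]);

// Route a call to its handler and always return a result, even on failure,
// so the model is told what happened instead of waiting indefinitely.
function dispatchToolCall(call: ToolCall): ToolResult {
  const handler = handlers.get(call.name);
  if (!handler) {
    return {
      id: call.id,
      name: call.name,
      response: { ok: false, detail: `Unknown tool: ${call.name}` },
    };
  }
  try {
    return { id: call.id, name: call.name, response: { ok: true, detail: handler(call.args) } };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { id: call.id, name: call.name, response: { ok: false, detail } };
  }
}
```

Returning a structured failure rather than throwing keeps the conversation going: the model can apologize or retry with different arguments.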

Challenges we ran into

Building Jarvis was a journey through extreme technical and personal obstacles.

The "Model Loop": It was incredibly difficult to get Jarvis to be perfectly interruptible; naturally, an AI loves to finish its thought, but for a true companion, it needs to stop the moment I speak.
Prototype Logic: Mapping conversational intent to correct 3D geometries and working code generators took hours of trial and error.
Navigation Integration: Handling coordinate mismatches and single-location lookups vs. multi-point routes was a persistent "pain in the ass."
Resource Constraints: Working on a free-tier plan meant constantly hitting quota limits. Beyond the code, I faced real-world challenges: an unstable power supply and the high cost of data subscriptions. I had to delete the entire repository and rebuild from the ground up four times before finally arriving at this stable, working version.
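
For the interruption problem, one common approach is to queue the model's audio chunks on the client and drop everything still pending the moment an interruption is signaled. The sketch below assumes each streamed message may carry either an audio chunk or an `interrupted` flag. The `ServerEvent` type is a simplified, hypothetical stand-in for the real message format, and `PlaybackQueue` is not the class Jarvis uses.

```typescript
// Simplified, hypothetical stand-in for a streamed server message.
interface ServerEvent {
  audioChunk?: Float32Array;
  interrupted?: boolean;
}

// Holds audio that has arrived but has not been played yet.
class PlaybackQueue {
  private chunks: Float32Array[] = [];

  enqueue(chunk: Float32Array): void {
    this.chunks.push(chunk);
  }

  // Take the next chunk to play, or undefined when nothing is queued.
  next(): Float32Array | undefined {
    return this.chunks.shift();
  }

  // Drop all queued audio and report how many chunks were discarded.
  flush(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  get pending(): number {
    return this.chunks.length;
  }
}

// On interruption, discard pending audio; otherwise queue the new chunk.
function handleServerEvent(event: ServerEvent, queue: PlaybackQueue): void {
  if (event.interrupted) {
    queue.flush();
    return;
  }
  if (event.audioChunk) {
    queue.enqueue(event.audioChunk);
  }
}
```

Flushing the queue only stops audio that has not started yet. In a browser, the chunk that is currently playing would also need to be stopped, for example by telling the audio worklet to clear its buffer.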

Accomplishments that we're proud of

Planetary Motion: Seeing Jarvis successfully build the Earth and other planets accurately orbiting the sun purely from a voice command.
Seamless Navigation: The moment it could take me to San Francisco on the map without crashing.
Fluid Conversation: Achieving a state where the AI is truly interruptible, making it feel like a human companion rather than a script.
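
The planetary motion can be approximated with very little math. The sketch below assumes circular, coplanar orbits (real orbits are elliptical) and uses an illustrative scale in which distances are in astronomical units and one Earth year lasts 10 seconds. `OrbitSpec` and `orbitPosition` are hypothetical names, not taken from the Jarvis source.

```typescript
interface OrbitSpec {
  radius: number;        // distance from the sun, in scene units
  periodSeconds: number; // time for one full orbit
}

// Illustrative scale: radius in AU, 1 Earth year = 10 seconds.
const planets: Record<string, OrbitSpec> = {
  earth: { radius: 1.0, periodSeconds: 10 },
  mars: { radius: 1.52, periodSeconds: 18.8 },
};

// Position on a circular orbit in the XZ plane at time t (seconds).
function orbitPosition(spec: OrbitSpec, t: number): [x: number, y: number, z: number] {
  if (spec.periodSeconds <= 0) {
    throw new RangeError("periodSeconds must be positive");
  }
  const angle = (2 * Math.PI * t) / spec.periodSeconds;
  return [spec.radius * Math.cos(angle), 0, spec.radius * Math.sin(angle)];
}
```

In a React Three Fiber scene, a function like this could be called from a `useFrame` callback with the elapsed clock time, and the result copied to each planet mesh's position.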

What we learned

The biggest lesson was persistence. Failure is just a data point. I learned that no matter how many times you fail, the next attempt might just be the one where it all clicks. Luck, it turns out, favors the persistent.

What's next for Jarvis

The current version of Jarvis is a powerful prototype builder. The next step? Turning Jarvis into a fully functional operating system where every interaction, from file management to system settings, is handled through this multimodal, conversational interface.

Built With

audioworklet
geminiliveapi
google-cloud
google/genaisdk
leaflet.js
lucidereact
next.js
node.js
promisifiedchildprocess
react
reactleaflet
reactthreedrei
reactthreefiber
tailwindcss
three.js
typescript
vanillacss
webaudioapi
websockets

Updates

Ayobami Okediya started this project — Mar 14, 2026 02:09 PM EDT
