Inspiration
Jarvis started from a simple frustration: desktop work is fragmented across too many apps. Rewriting text, summarizing webpages, editing images, and checking the weather or your calendar usually mean constant context switching. We wanted one assistant that can be triggered globally, understands what you are working on, and helps instantly.
What it does
Jarvis is an AI desktop assistant. Its core features:
- Real-time Gradium speech-to-text
- Clipboard + active app/window context routing
- Persistent user memory notes
- Task routing to text_task, image_edit, image_generate, image_explain, weather_query, webpage_read, background_remove, and calendar_list
- Text workflows: rewrite, explain, direct Q&A, context-aware drafting
- Image workflows: generate, edit, explain, and remove background (Remove.bg)
- Webpage summarization, weather queries, and Google Calendar read-only event listing
- Optional spoken responses through Gradium or ElevenLabs TTS
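To make the routing step concrete, here is a minimal TypeScript sketch. It assumes the classifier (for example, a Gemini call) is asked to reply with JSON such as {"task": "weather_query", "args": {"city": "Paris"}}. That JSON shape and the names parseRoute, dispatch, and handlers are hypothetical illustrations, not Jarvis's actual code. Only the eight task labels come from the list above.

```typescript
// The eight task labels listed above; everything else here is a hypothetical sketch.
const TASKS = [
  "text_task",
  "image_edit",
  "image_generate",
  "image_explain",
  "weather_query",
  "webpage_read",
  "background_remove",
  "calendar_list",
] as const;

type TaskType = (typeof TASKS)[number];

interface RoutedTask {
  task: TaskType;
  // String arguments the classifier extracted (e.g. a city or a URL).
  args: Record<string, string>;
}

function isTaskType(value: unknown): value is TaskType {
  return typeof value === "string" && (TASKS as readonly string[]).includes(value);
}

// Parse the classifier's raw reply (assumed JSON). Anything malformed or
// unknown falls back to a plain text task so the user still gets an answer.
function parseRoute(raw: string): RoutedTask {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { task, args } = parsed as { task?: unknown; args?: unknown };
      if (isTaskType(task)) {
        const safeArgs: Record<string, string> = {};
        if (typeof args === "object" && args !== null) {
          // Keep only string-valued arguments; drop anything else.
          for (const [key, value] of Object.entries(args)) {
            if (typeof value === "string") safeArgs[key] = value;
          }
        }
        return { task, args: safeArgs };
      }
    }
  } catch {
    // Invalid JSON: fall through to the default route.
  }
  return { task: "text_task", args: {} };
}

type Handler = (args: Record<string, string>) => string;

// Typing this as Record<TaskType, Handler> makes the compiler reject a
// missing workflow. The handlers are placeholders for the real services.
const handlers: Record<TaskType, Handler> = {
  text_task: () => "text workflow",
  image_edit: () => "image edit workflow",
  image_generate: () => "image generation workflow",
  image_explain: () => "image explanation workflow",
  weather_query: (args) => `weather for ${args.city ?? "current location"}`,
  webpage_read: (args) => `summarize ${args.url ?? "active page"}`,
  background_remove: () => "background removal workflow",
  calendar_list: () => "calendar listing workflow",
};

function dispatch(raw: string): string {
  const { task, args } = parseRoute(raw);
  return handlers[task](args);
}
```

Falling back to text_task rather than throwing is a design choice: a misclassified request degrades to a plain answer instead of an error overlay.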
How we built it
We built Jarvis as an Electron app with a React renderer and a service-based main process.
- Core orchestration in Electron main process
- Focused services for Gemini routing, Gemini image tasks, Gradium STT, ElevenLabs TTS, weather, background removal, calendar auth/service, memory, context capture, and app-tone adaptation
- A task router classifies intent and dispatches the right workflow
- Clipboard-first context engine combines clipboard content with active app/window metadata
- Multi-window UX with settings, floating push-to-talk pill, and response overlay
- OAuth flow for Google Calendar read-only access
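A clipboard-first context engine with app-tone adaptation could assemble its prompt roughly as follows. This is a minimal sketch assuming the main process has already captured the clipboard text plus the active app name and window title. The CapturedContext shape, the app-to-tone table, the 4000-character limit, and buildPrompt are all hypothetical, not Jarvis's actual implementation.

```typescript
// Hypothetical shape of what the main process captures at trigger time.
interface ActiveWindow {
  appName: string;
  title: string;
}

interface CapturedContext {
  clipboardText: string;
  window: ActiveWindow;
}

// Hypothetical app-to-tone table; unknown apps get a neutral tone.
const TONE_BY_APP = new Map<string, string>([
  ["Slack", "casual"],
  ["Mail", "formal"],
  ["Visual Studio Code", "technical"],
]);

// Hypothetical cap so a huge clipboard does not blow up the prompt.
const MAX_CLIPBOARD_CHARS = 4000;

function buildPrompt(userUtterance: string, ctx: CapturedContext): string {
  const tone = TONE_BY_APP.get(ctx.window.appName) ?? "neutral";
  const clip = ctx.clipboardText.trim();
  const clipped =
    clip.length > MAX_CLIPBOARD_CHARS ? clip.slice(0, MAX_CLIPBOARD_CHARS) + "\n[truncated]" : clip;

  const lines = [
    `Active app: ${ctx.window.appName}`,
    `Window title: ${ctx.window.title}`,
    `Preferred tone: ${tone}`,
  ];
  // Clipboard content is included only when present; otherwise the spoken request stands alone.
  if (clipped.length > 0) {
    lines.push("Clipboard content:", clipped);
  }
  lines.push(`User request: ${userUtterance}`);
  return lines.join("\n");
}
```

For example, saying "rewrite this politely" while a Slack message is on the clipboard yields a prompt that carries the clipboard text, the app name, and a casual tone hint, so the user does not have to restate any of it.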
Challenges we ran into
The largest challenge was speed. Many of the features we experimented with or added were too slow, so we either scrapped them or found ways to speed them up. Another challenge was integrating Gradium, which had no TypeScript SDK. We also implemented WebSockets for the first time; it was challenging at first, but we made it work in the end.
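Without an SDK, microphone audio has to be converted and framed by hand before it goes over the WebSocket. The sketch below shows one common way to do that, assuming mono audio, a 24 kHz sample rate, 80 ms frames, and raw binary frames. Gradium's actual wire format (sample rate, encoding, message framing) is not described here and may differ, and toPcm16, splitFrames, streamAudio, and FrameSink are hypothetical names.

```typescript
// Anything with a send() that accepts binary data, e.g. a browser/Electron WebSocket.
interface FrameSink {
  send(data: Uint8Array): void;
}

// Convert Web Audio float samples in [-1, 1] to 16-bit little-endian PCM bytes.
function toPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // 16-bit two's complement spans -32768..32767, so scale each side separately.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return bytes;
}

// Split PCM bytes into fixed-duration frames (mono, 2 bytes per sample).
// The last frame may be shorter than the others.
function splitFrames(pcm: Uint8Array, sampleRate: number, frameMs: number): Uint8Array[] {
  const bytesPerFrame = Math.round((sampleRate * frameMs) / 1000) * 2;
  const out: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += bytesPerFrame) {
    out.push(pcm.subarray(offset, offset + bytesPerFrame));
  }
  return out;
}

// Send one captured buffer as a sequence of binary frames; returns the number of frames sent.
function streamAudio(sink: FrameSink, samples: Float32Array, sampleRate = 24000, frameMs = 80): number {
  const frames = splitFrames(toPcm16(samples), sampleRate, frameMs);
  for (const frame of frames) sink.send(frame);
  return frames.length;
}
```

Small, fixed-size frames let the server start transcribing while the user is still speaking, which matters for the latency problems described above.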
Accomplishments that we're proud of
- A global desktop voice workflow that works across apps
- A practical clipboard-first context engine that powers multi-domain task routing
- Unified text, image, web, weather, and calendar features in one assistant
- Working Google Calendar OAuth integration with read-only event retrieval and formatting
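Formatting events for display or speech might look like the following sketch. The CalendarEvent interface is a small subset of the Google Calendar API v3 Event resource, where timed events carry start.dateTime and all-day events carry start.date. The formatEvents helper, its messages, and the output layout are hypothetical, not Jarvis's actual formatter.

```typescript
// Subset of the Google Calendar API v3 Event resource used by this sketch.
interface CalendarEvent {
  summary?: string;
  start?: { dateTime?: string; date?: string };
}

// Hypothetical formatter: one line per event, rendered in the user's time zone.
function formatEvents(events: CalendarEvent[], timeZone: string, locale = "en-US"): string {
  if (events.length === 0) return "You have no upcoming events.";

  const timeFmt = new Intl.DateTimeFormat(locale, {
    weekday: "short",
    hour: "numeric",
    minute: "2-digit",
    timeZone,
  });

  return events
    .map((event) => {
      const title = event.summary?.trim() || "(untitled event)";
      // Timed events: format the instant in the requested time zone.
      if (event.start?.dateTime) return `${timeFmt.format(new Date(event.start.dateTime))}: ${title}`;
      // All-day events: the API gives a plain YYYY-MM-DD date.
      if (event.start?.date) return `${event.start.date} (all day): ${title}`;
      return title;
    })
    .join("\n");
}
```

In an Electron app, the user's zone can be read with Intl.DateTimeFormat().resolvedOptions().timeZone and passed in as timeZone.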
What we learned
- Clipboard context is a strong signal for user intent and reduces prompt friction
- How to implement real-time audio processing with WebSockets
- Prompt and routing design are central when one assistant handles many task types
What's next for Jarvis
- Improve routing precision and personalization with stronger memory usage
- Expand dictation and writing workflows with deeper app-aware behavior
- Improve response controls for long-form and technical outputs
- Add more integrations while keeping the global, low-friction workflow
- Continue reliability and onboarding improvements for first-time setup
Built With
- electron
- elevenlabs
- gemini
- gradium
- react
- typescript