Jarvis: Just Ask Reactive Visual Intelligence System
What We Built
Jarvis is a voice-controlled desktop that turns plain speech into complex commands, putting you in control of your workspace. Through continuous conversation, users can use just their voice to open and populate new windows, search the internet for information, organize the layout of their workspace, or take detailed notes.
Technical Architecture
Pipeline: Voice Input → AI Processing → Tool Action → UI Result
- Speech Processing: Continuous Web Speech API with intelligent buffering
- AI Understanding: Qwen models served through the Cerebras API convert dialogue into actionable commands
- Tool Assortment: Executable actions to create/delete windows, search the internet, and organize the workspace
- Reactive UI: All React components communicate through a central event bus
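As a rough illustration of the event-bus idea, here is a minimal typed publish/subscribe sketch. The event names (`window:open`, `window:close`) and payload shapes are assumptions made for this example; the write-up does not list the project's actual events.

```typescript
// Hypothetical event map: names and payloads are illustrative assumptions.
type BusEvents = {
  "window:open": { id: string; title: string };
  "window:close": { id: string };
};

type Handler<T> = (payload: T) => void;

class EventBus<E extends Record<string, unknown>> {
  private handlers = new Map<keyof E, Set<Handler<any>>>();

  // Subscribe to one event type; the returned function unsubscribes.
  on<K extends keyof E>(type: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler<any>>();
    this.handlers.set(type, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver the payload to every handler currently subscribed to this type.
  emit<K extends keyof E>(type: K, payload: E[K]): void {
    this.handlers.get(type)?.forEach((handler) => handler(payload));
  }
}
```

A component would subscribe when it mounts and call the returned function when it unmounts, so components never need direct references to one another.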
Key Obstacles Overcome
Continuous Speech
Built a buffering layer on top of continuous recognition to cope with the flow of natural conversation
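The write-up does not describe the buffering strategy itself, so the following is only a sketch of one plausible approach: collect recognized fragments and release them as a single utterance once the speaker has paused. The class name `UtteranceBuffer` and the 1200 ms pause threshold are assumptions. Timestamps are passed in explicitly so the logic can be exercised without a browser.

```typescript
// Sketch only: pause-based buffering is an assumed strategy, not the project's documented one.
class UtteranceBuffer {
  private parts: string[] = [];
  private lastPushMs = 0;
  private silenceMs: number;

  constructor(silenceMs = 1200) {
    this.silenceMs = silenceMs;
  }

  // Record a finalized transcript fragment and the time it arrived.
  push(fragment: string, nowMs: number): void {
    const text = fragment.trim();
    if (text.length === 0) return;
    this.parts.push(text);
    this.lastPushMs = nowMs;
  }

  // Return the joined utterance once the pause is long enough, otherwise null.
  flushIfIdle(nowMs: number): string | null {
    if (this.parts.length === 0) return null;
    if (nowMs - this.lastPushMs < this.silenceMs) return null;
    const utterance = this.parts.join(" ");
    this.parts = [];
    return utterance;
  }
}
```

In a browser, `push` would typically be called from the `result` event of a `SpeechRecognition` instance with `continuous` set to `true`, for results whose `isFinal` flag is set, while a timer calls `flushIfIdle(Date.now())` periodically.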
Natural Language Interpretation
A dual approach: AI is the primary interpreter, with a fallback to a set of predetermined rules for reliability
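One way to picture the dual approach is sketched below: try the AI interpreter first, and fall back to pattern rules if it fails or returns nothing. The `Command` shapes, the `Interpreter` signature, and the specific regular expressions are assumptions for illustration; the project's actual rules are not given.

```typescript
// Hypothetical command shapes, loosely based on the tool names in this write-up.
type Command =
  | { tool: "open_window"; title: string }
  | { tool: "close_window"; title: string }
  | { tool: "search"; query: string }
  | { tool: "organize_windows" };

// Stand-in for the AI call; returns null when it cannot produce a command.
type Interpreter = (utterance: string) => Promise<Command | null>;

// Return the first capture group of the pattern, or null if there is no match.
function capture(text: string, pattern: RegExp): string | null {
  const match = pattern.exec(text);
  return match?.[1] ?? null;
}

// Rule-based fallback: a few fixed phrasings mapped directly to commands.
function parseWithRules(utterance: string): Command | null {
  const text = utterance.trim().toLowerCase();
  const query = capture(text, /^(?:search|look up)(?: for)? (.+)$/);
  if (query !== null) return { tool: "search", query };
  const opened = capture(text, /^open (?:a )?(?:new )?(.+?)(?: window)?$/);
  if (opened !== null) return { tool: "open_window", title: opened };
  const closed = capture(text, /^close (?:the )?(.+?)(?: window)?$/);
  if (closed !== null) return { tool: "close_window", title: closed };
  if (/^(?:organize|tidy|arrange)\b/.test(text)) return { tool: "organize_windows" };
  return null;
}

// AI first; rules only when the AI throws or yields no command.
async function interpret(utterance: string, ai: Interpreter): Promise<Command | null> {
  try {
    const fromAi = await ai(utterance);
    if (fromAi !== null) return fromAi;
  } catch {
    // The AI call failed (for example a network error); fall through to the rules.
  }
  return parseWithRules(utterance);
}
```

The rules cover only a handful of phrasings on purpose: their job is to keep common commands working when the model is unavailable, not to replace it.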
Window Organization
A space-optimization algorithm that packs many windows efficiently onto the screen
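The project's packing algorithm is not described in detail. As a stand-in, the sketch below uses a plain uniform grid, which is a simpler technique than a true space-optimizing packer: it picks a near-square number of columns and gives every window an equal cell. The function name `tileWindows` and the `gap` parameter are assumptions.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Uniform grid tiling (an assumed, simplified layout; not the authors' algorithm).
function tileWindows(count: number, screenWidth: number, screenHeight: number, gap = 8): Rect[] {
  if (count <= 0) return [];
  // A near-square grid keeps cells close to the screen's aspect ratio.
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  // Reserve a gap on every outer edge and between neighbouring cells.
  const cellWidth = (screenWidth - gap * (cols + 1)) / cols;
  const cellHeight = (screenHeight - gap * (rows + 1)) / rows;
  const rects: Rect[] = [];
  for (let i = 0; i < count; i++) {
    const col = i % cols;
    const row = Math.floor(i / cols);
    rects.push({
      x: gap + col * (cellWidth + gap),
      y: gap + row * (cellHeight + gap),
      width: cellWidth,
      height: cellHeight,
    });
  }
  return rects;
}
```

For four windows on a 1000 by 800 screen with no gap, this yields a 2 by 2 grid of 500 by 400 cells. A real packer would also account for each window's preferred size and fill the empty cells left in the last row.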
Performance
A Web Worker keeps AI processing from slowing down the main thread
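Talking to a worker is asynchronous, so the main thread needs a way to match each reply to the request that caused it. The sketch below shows one common pattern, correlation ids with pending promises. The message shapes (`ParseRequest`, `ParseResponse`) and the `ParserClient` class are hypothetical; the project's real worker protocol is not documented here.

```typescript
// Hypothetical message shapes exchanged with the worker.
interface ParseRequest {
  id: number;
  utterance: string;
}

interface ParseResponse {
  id: number;
  command: string;
}

// Main-thread side: sends requests and resolves promises as replies arrive.
class ParserClient {
  private nextId = 1;
  private pending = new Map<number, (command: string) => void>();
  private post: (request: ParseRequest) => void;

  constructor(post: (request: ParseRequest) => void) {
    this.post = post;
  }

  get pendingCount(): number {
    return this.pending.size;
  }

  // Queue a request; the promise settles when the matching response arrives.
  parse(utterance: string): Promise<string> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.post({ id, utterance });
    });
  }

  // Call this with each message received from the worker.
  handleResponse(response: ParseResponse): void {
    const resolve = this.pending.get(response.id);
    if (!resolve) return; // Unknown or already-settled id: ignore it.
    this.pending.delete(response.id);
    resolve(response.command);
  }
}
```

In the browser, `post` would wrap `worker.postMessage(request)` and the worker's `message` event would forward `event.data` to `handleResponse`. Because the UI thread only awaits promises, slow model calls do not block rendering or speech capture.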
Code Architecture
// Clean separation of concerns
Voice Input → TaskParser (Cerebras) → ToolExecutor (Local) → Event Bus (Events) → WindowManager
Available tools: open_window, close_window, search, organize_windows, edit_window
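To make the ToolExecutor stage concrete, here is a minimal dispatch sketch built around the five tool names listed above. The `ToolCall` shape, the string-keyed `args`, and the handler signature are assumptions for illustration and are not taken from the project's code.

```typescript
// Tool names from the write-up; everything else in this block is assumed.
const TOOL_NAMES = ["open_window", "close_window", "search", "organize_windows", "edit_window"] as const;
type ToolName = (typeof TOOL_NAMES)[number];

interface ToolCall {
  tool: string; // Comes from the parser, so it is not trusted to be a valid name.
  args: Record<string, string>;
}

type ToolHandler = (args: Record<string, string>) => string;

// Type guard that narrows an arbitrary string to a known tool name.
function isToolName(name: string): name is ToolName {
  return (TOOL_NAMES as readonly string[]).includes(name);
}

class ToolExecutor {
  private handlers: Partial<Record<ToolName, ToolHandler>> = {};

  register(name: ToolName, handler: ToolHandler): void {
    this.handlers[name] = handler;
  }

  // Validate the tool name, then run its handler and return the handler's result.
  execute(call: ToolCall): string {
    if (!isToolName(call.tool)) {
      throw new Error(`Unknown tool: ${call.tool}`);
    }
    const handler = this.handlers[call.tool];
    if (!handler) {
      throw new Error(`No handler registered for ${call.tool}`);
    }
    return handler(call.args);
  }
}
```

Validating the name before dispatch matters because the call originates from model output: an unexpected tool name fails loudly instead of silently doing nothing.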
Technical Stack
- Frontend: React 19, TypeScript, Tailwind CSS
- AI: Cerebras API (Qwen-3), Gemini (search)
- Architecture: Event-Driven, Web Workers, Tools
- APIs: Web Speech, Canvas, Drag & Drop
The system we built shows how AI-based interfaces can improve efficiency when they are designed to turn complexity into simplicity.
Built With
- cerebras
- gemini
- typescript
