Jarvis: Just Ask Reactive Visual Intelligence System

What We Built

Jarvis is a voice-controlled desktop that turns simple human speech into complex commands and puts you in control of your workspace. Through continuous conversation, users can use just their voice to populate new windows, search the internet for information, organize the layout of their workspace, or take detailed notes.

Technical Architecture

Pipeline: Voice Input → AI Processing → Tool Action → UI Result

  • Speech Processing: Continuous Web Speech API with intelligent buffering
  • AI Understanding: Cerebras/Qwen models convert dialogue into actionable commands
  • Tool Assortment: Executable actions to create/delete windows, search the internet, and organize the workspace
  • Reactive UI: All of the React components communicate via a central event bus
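
As a rough illustration of the central event bus idea, here is a minimal typed publish/subscribe sketch. The event names (window:open, window:close) and payload shapes are assumptions made for the example; the project's actual events are not listed in this write-up.

```typescript
// Hypothetical event map: names and payloads are illustrative only.
type BusEvents = {
  "window:open": { id: string; title: string };
  "window:close": { id: string };
};

type Handler<T> = (payload: T) => void;

class EventBus<E extends Record<string, unknown>> {
  private handlers = new Map<keyof E, Set<Handler<any>>>();

  // Subscribe to an event; the returned function removes the subscription.
  on<K extends keyof E>(event: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    this.handlers.set(event, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver a payload to every current subscriber of the event.
  emit<K extends keyof E>(event: K, payload: E[K]): void {
    this.handlers.get(event)?.forEach((handler) => handler(payload));
  }
}

const bus = new EventBus<BusEvents>();
const opened: string[] = [];

const unsubscribe = bus.on("window:open", (w) => {
  opened.push(w.title);
});
bus.emit("window:open", { id: "w1", title: "Notes" });
unsubscribe();
bus.emit("window:open", { id: "w2", title: "Search" }); // no longer received
```

In a React component, the subscription would typically be created in an effect and the returned unsubscribe function used as the cleanup.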

Key Obstacles Overcome

Continuous Speech

Built a buffering layer to handle the flow of natural conversation
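
One way such buffering could work is to collect finalized transcript fragments and release them as a single utterance once the speaker pauses. The sketch below assumes a 1200 ms silence threshold and a hypothetical UtteranceBuffer class; it is not the project's actual buffering logic. Timestamps are passed in explicitly so the behavior is easy to test.

```typescript
// Assumed pause length that marks the end of an utterance.
const SILENCE_MS = 1200;

class UtteranceBuffer {
  private fragments: string[] = [];
  private lastFragmentAt = 0;

  // Add a finalized transcript fragment along with the time it arrived.
  push(fragment: string, now: number): void {
    const text = fragment.trim();
    if (text.length === 0) return;
    this.fragments.push(text);
    this.lastFragmentAt = now;
  }

  // Return the joined utterance once the pause is long enough, else null.
  flushIfIdle(now: number): string | null {
    if (this.fragments.length === 0) return null;
    if (now - this.lastFragmentAt < SILENCE_MS) return null;
    const utterance = this.fragments.join(" ");
    this.fragments = [];
    return utterance;
  }
}

const buffer = new UtteranceBuffer();
buffer.push("open a new window", 0);
buffer.push("and search for weather", 500);

const tooEarly = buffer.flushIfIdle(1000); // only 500 ms since the last fragment
const utterance = buffer.flushIfIdle(2000); // 1500 ms of silence, so it flushes
```

In the browser, the fragments would come from the Web Speech API's SpeechRecognition result events (with continuous set to true), using only results marked isFinal, and flushIfIdle would be polled on a timer.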

Natural Language Interpretation

A dual approach: AI serves as the primary interpreter, with fallbacks to a set of predetermined rules for reliability
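
The AI-first, rules-second idea can be sketched as follows. The regular expressions, the ParsedCommand shape, and the aiParse parameter are all hypothetical; only the tool names are taken from the tool list in this write-up.

```typescript
type ParsedCommand = { tool: string; args: Record<string, string> };

type Rule = { pattern: RegExp; build: (m: RegExpMatchArray) => ParsedCommand };

// Hypothetical fallback rules; a real rule set would be larger.
const RULES: Rule[] = [
  {
    pattern: /^(?:search|look up)(?: for)? (.+)$/i,
    build: (m) => ({ tool: "search", args: { query: m[1] ?? "" } }),
  },
  {
    pattern: /^open (?:a |an )?(.+?)(?: window)?$/i,
    build: (m) => ({ tool: "open_window", args: { kind: m[1] ?? "" } }),
  },
  {
    pattern: /^(?:organize|tidy)\b/i,
    build: () => ({ tool: "organize_windows", args: {} }),
  },
];

// Rule-based fallback: the first matching pattern wins; null if nothing matches.
function parseWithRules(text: string): ParsedCommand | null {
  const input = text.trim();
  for (const rule of RULES) {
    const match = input.match(rule.pattern);
    if (match) return rule.build(match);
  }
  return null;
}

// aiParse stands in for the model call; if it throws, the rules take over.
async function parseCommand(
  text: string,
  aiParse: (t: string) => Promise<ParsedCommand>,
): Promise<ParsedCommand | null> {
  try {
    return await aiParse(text);
  } catch {
    return parseWithRules(text);
  }
}

const searchCmd = parseWithRules("search for weather in Boston");
const openCmd = parseWithRules("open a notes window");
const unknownCmd = parseWithRules("sing me a song");
```

Keeping the rule-based parser as a plain synchronous function means it can also be used on its own when the model is unreachable.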

Window Organization

A space-optimization algorithm that packs many windows efficiently onto the screen
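
The write-up does not describe the packing algorithm itself, so the sketch below uses a simple near-square grid tiling as a stand-in to show the general shape of the problem. The function name tileWindows, the Rect type, and the 8-pixel gap are assumptions.

```typescript
type Rect = { x: number; y: number; width: number; height: number };

// Tile `count` windows into a near-square grid that fills the screen.
// This is a simple stand-in, not the project's actual algorithm.
function tileWindows(
  count: number,
  screenWidth: number,
  screenHeight: number,
  gap = 8,
): Rect[] {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  // Subtract the gaps (one more than the number of cells) before dividing.
  const cellW = (screenWidth - gap * (cols + 1)) / cols;
  const cellH = (screenHeight - gap * (rows + 1)) / rows;

  const rects: Rect[] = [];
  for (let i = 0; i < count; i++) {
    const col = i % cols;
    const row = Math.floor(i / cols);
    rects.push({
      x: gap + col * (cellW + gap),
      y: gap + row * (cellH + gap),
      width: cellW,
      height: cellH,
    });
  }
  return rects;
}

// Five windows on a 1920x1080 screen become a 3-column, 2-row grid.
const layout = tileWindows(5, 1920, 1080);
```

A grid like this leaves empty cells when the count is not a perfect fit (one empty cell in the example), which is the kind of waste a more careful packing algorithm would try to reclaim.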

Performance

A Web Worker keeps AI processing off the main thread so it does not slow down the UI

Code Architecture

// Clean separation of concerns
Voice Input → TaskParser → ToolExecutor → Event Bus → WindowManager
             (Cerebras)    (Local)       (Events)

Available tools: open_window, close_window, search, organize_windows, edit_window
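
To make the ToolExecutor-to-Event-Bus step concrete, here is a small dispatch sketch built around the five tool names above. The handler bodies, event type strings, and the executeTool signature are hypothetical placeholders; in particular, a real search tool would call an external API asynchronously.

```typescript
type ToolName =
  | "open_window"
  | "close_window"
  | "search"
  | "organize_windows"
  | "edit_window";

type ToolArgs = Record<string, string>;
type ToolEvent = { type: string; payload: ToolArgs };

// Hypothetical handlers: each one maps a tool call to an event for the UI.
const toolHandlers: Record<ToolName, (args: ToolArgs) => ToolEvent> = {
  open_window: (args) => ({ type: "window:open", payload: args }),
  close_window: (args) => ({ type: "window:close", payload: args }),
  search: (args) => ({ type: "search:request", payload: args }),
  organize_windows: (args) => ({ type: "window:organize", payload: args }),
  edit_window: (args) => ({ type: "window:edit", payload: args }),
};

// Type guard so that unknown tool names coming from the model are rejected.
function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(toolHandlers, name);
}

// Run a tool by name and publish the resulting event; false if the tool is unknown.
function executeTool(
  name: string,
  args: ToolArgs,
  publish: (event: ToolEvent) => void,
): boolean {
  if (!isToolName(name)) return false;
  publish(toolHandlers[name](args));
  return true;
}

const published: ToolEvent[] = [];
const publish = (event: ToolEvent): void => {
  published.push(event);
};

const handled = executeTool("open_window", { title: "Notes" }, publish);
const rejected = executeTool("format_disk", {}, publish);
```

Validating the tool name before dispatch matters here because the name originates from model output, which may not always match the available tools.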

Technical Stack

  • Frontend: React 19, TypeScript, Tailwind CSS
  • AI: Cerebras API (Qwen-3), Gemini (search)
  • Architecture: Event-Driven, Web Workers, Tools
  • APIs: Web Speech, Canvas, Drag & Drop

The system we built shows how AI-based interfaces can improve efficiency when they are designed to turn complexity into simplicity.
