Inspiration

Most AI assistants today are reactive. You open an app, type a question, get an answer, and leave. I wanted to build something different: an AI that behaves more like a personal operating system than a chatbot.

The idea behind JARVIS was inspired by the vision of a persistent assistant that can see, hear, remember, and act. Instead of simply responding to prompts, I wanted an AI capable of understanding context, controlling a computer, automating tasks, monitoring information, and proactively helping its user throughout the day.

What it does

JARVIS is an autonomous AI operating system designed for personal productivity and computer automation.

It can:

Listen for voice commands using wake-word detection
Understand speech and hold natural conversations
Control a Windows PC through mouse and keyboard automation
Analyze screenshots and understand what is happening on the screen
Remember user preferences, goals, and project history
Manage tasks, reminders, workflows, and objectives
Connect with services such as Telegram, Gmail, GitHub, Spotify, and Google Calendar
Monitor financial markets and MetaTrader 5 trading activity
Generate, edit, review, and improve code using AI models
Run on local AI models, with cloud AI models as a fallback

Unlike traditional assistants, JARVIS is designed to remain active in the background and assist proactively rather than waiting for every instruction.

How I built it

JARVIS is written primarily in JavaScript, with Node.js as the core runtime.

The system combines multiple technologies:

Node.js for orchestration and automation
Python for speech recognition and MetaTrader integration
PowerShell for Windows automation
Electron for desktop integration
Ollama for local AI models
Faster-Whisper for speech-to-text
LLaVA for computer vision
Qwen2.5-Coder for coding assistance
Telegram Bot API for remote notifications
Gmail, GitHub, Spotify, and Google Calendar integrations
REST APIs and WebSockets for dashboard communication

The architecture is modular, allowing new commands, integrations, and AI capabilities to be added easily.
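One common way to get this kind of modularity is a command registry, where each command is a small self-describing object that the core looks up by name. The sketch below is illustrative only: `CommandRegistry`, the command names, and the `{ name, description, run }` shape are assumptions, not the actual JARVIS code.

```javascript
// Hypothetical command registry; all names here are illustrative assumptions.
class CommandRegistry {
  constructor() {
    this.commands = new Map();
  }

  // Register a command object of the form { name, description, run(args) }.
  register(command) {
    if (this.commands.has(command.name)) {
      throw new Error(`Command already registered: ${command.name}`);
    }
    this.commands.set(command.name, command);
  }

  // Look up a command by name and run it with the given arguments.
  execute(name, args = {}) {
    const command = this.commands.get(name);
    if (!command) {
      throw new Error(`Unknown command: ${name}`);
    }
    return command.run(args);
  }

  // Return all registered command names in alphabetical order.
  list() {
    return [...this.commands.keys()].sort();
  }
}

const registry = new CommandRegistry();

registry.register({
  name: "system.echo",
  description: "Return the text unchanged (illustrative only)",
  run: ({ text }) => text,
});

registry.register({
  name: "reminder.add",
  description: "Pretend to store a reminder (illustrative only)",
  run: ({ text }) => `Reminder saved: ${text}`,
});

console.log(registry.list());
console.log(registry.execute("reminder.add", { text: "stand-up at 9" }));
```

With this shape, adding a capability means writing one new object and calling `register`; the dispatch code does not change.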

Challenges I ran into

Building a voice-first AI system presented several challenges.

One of the biggest difficulties was creating a reliable audio pipeline. The wake-word detector, speech recognition system, and text-to-speech engine all needed to work together without microphone conflicts or delays.
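One way to avoid microphone conflicts is to treat the pipeline as a small state machine in which at most one component holds the microphone at a time. The following is a minimal sketch under that assumption; the stage names, the allowed transitions, and the `AudioPipeline` class are hypothetical and may not match how JARVIS actually coordinates its audio components.

```javascript
// Hypothetical stages and transitions; the real pipeline may differ.
const TRANSITIONS = {
  wake_word: ["recording"],                 // wake word heard, start capturing
  recording: ["transcribing", "wake_word"], // utterance finished, or timed out
  transcribing: ["speaking", "wake_word"],  // reply ready, or nothing understood
  speaking: ["wake_word"],                  // reply finished, resume listening
};

// Which component may hold the microphone in each stage (null = released).
const MIC_OWNER = {
  wake_word: "wake-word detector",
  recording: "speech recorder",
  transcribing: null,
  speaking: null,
};

class AudioPipeline {
  constructor() {
    this.stage = "wake_word";
  }

  // The single component currently allowed to read from the microphone.
  get micOwner() {
    return MIC_OWNER[this.stage];
  }

  // Move to the next stage, rejecting transitions the table does not allow.
  advance(next) {
    if (!TRANSITIONS[this.stage].includes(next)) {
      throw new Error(`Illegal transition: ${this.stage} -> ${next}`);
    }
    this.stage = next;
    return this.micOwner;
  }
}

const pipeline = new AudioPipeline();
for (const stage of ["recording", "transcribing", "speaking", "wake_word"]) {
  const owner = pipeline.advance(stage);
  console.log(`${stage}: microphone held by ${owner ?? "nobody"}`);
}
```

Releasing the microphone while speaking is a design choice in this sketch: it keeps the assistant's own voice from being picked up as a new command.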

Another challenge was managing multiple AI models. Different tasks require different capabilities, so I built a routing system that automatically chooses the best model for coding, conversation, reasoning, or vision tasks.
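A simple form of such routing classifies each request into a task type and looks the model up in a table. The sketch below assumes keyword-based classification, which is only one possible approach and not necessarily the one JARVIS uses. `llava` and `qwen2.5-coder` correspond to the models named in this writeup; the chat and reasoning entries are placeholders, since those models are not specified here.

```javascript
// Task type -> model name. The last two entries are placeholders (assumptions).
const MODEL_BY_TASK = {
  vision: "llava",
  coding: "qwen2.5-coder",
  reasoning: "placeholder-reasoning-model",
  conversation: "placeholder-chat-model",
};

// Very rough keyword heuristics, for illustration only.
const CODING_HINTS = /\b(code|function|bug|refactor|compile|stack trace)\b/;
const REASONING_HINTS = /\b(why|plan|compare|analyze|step by step)\b/;

// Decide which kind of task a request is.
function classifyTask({ text = "", hasImage = false }) {
  if (hasImage) return "vision"; // any attached image goes to the vision model
  const lower = text.toLowerCase();
  if (CODING_HINTS.test(lower)) return "coding";
  if (REASONING_HINTS.test(lower)) return "reasoning";
  return "conversation"; // default when nothing else matches
}

// Return both the detected task and the model chosen for it.
function routeModel(request) {
  const task = classifyTask(request);
  return { task, model: MODEL_BY_TASK[task] };
}

console.log(routeModel({ text: "Fix the bug in this function" }));
console.log(routeModel({ text: "What is on my screen?", hasImage: true }));
console.log(routeModel({ text: "Good morning" }));
```

Keeping the table separate from the classifier means a model can be swapped by editing one entry, without touching the classification logic.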

Creating safe automation was also difficult. Because JARVIS can control a computer and execute actions, I had to introduce permission systems and authority levels to prevent unintended behavior.
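Authority levels of this kind can be modeled as an ordered scale, with each action declaring the minimum level it needs. The level names, action names, and `authorize` function below are hypothetical, chosen only to illustrate the idea of denying by default and comparing levels before acting.

```javascript
// Hypothetical authority scale, from least to most privileged.
const AUTHORITY = Object.freeze({ OBSERVE: 0, SUGGEST: 1, ACT: 2, ADMIN: 3 });

// Minimum level each action requires (illustrative action names).
const REQUIRED_LEVEL = {
  "screen.capture": AUTHORITY.OBSERVE,
  "task.propose": AUTHORITY.SUGGEST,
  "mouse.click": AUTHORITY.ACT,
  "shell.run": AUTHORITY.ADMIN,
};

// Decide whether an action may run at the currently granted level.
function authorize(action, grantedLevel) {
  const required = REQUIRED_LEVEL[action];
  if (required === undefined) {
    // Deny by default: unlisted actions are never executed.
    return { allowed: false, reason: `unknown action: ${action}` };
  }
  if (grantedLevel < required) {
    return {
      allowed: false,
      reason: `requires level ${required}, granted ${grantedLevel}`,
    };
  }
  return { allowed: true, reason: "ok" };
}

console.log(authorize("mouse.click", AUTHORITY.SUGGEST));
console.log(authorize("mouse.click", AUTHORITY.ACT));
console.log(authorize("disk.format", AUTHORITY.ADMIN));
```

Denying unknown actions by default means a newly added automation cannot run until someone explicitly assigns it a level.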

Performance optimization was another major challenge, especially when running local AI models on consumer hardware.

What I learned

This project taught me a great deal about:

AI agent architecture
Speech processing systems
Large language model integration
Computer vision workflows
Automation and workflow engines
API integration
Desktop software development
System design and modular architecture

Most importantly, I learned how multiple AI technologies can be combined into a single coherent platform capable of performing real-world tasks.

Future plans

The long-term goal is to evolve JARVIS into a true AI operating system that works across devices and platforms.

Future improvements include:

Cross-platform support
Enhanced security and authentication
Better long-term memory systems
Multi-device synchronization
Plugin marketplace
Mobile companion application
Smarter autonomous agents
Enterprise-grade deployment options

JARVIS represents my vision of a future where AI is not just a chatbot, but an intelligent digital partner capable of helping people achieve more every day.