Inspiration

As a child, I was inspired by the movie Big Hero 6, which catapulted me into engineering and computer science. In my junior year, I turned one of my stuffed plushies into an intelligent voice assistant that ran commands on my laptop: "Hey buddy, open my engineering setup." Today, I'm building on that childhood dream with Sage, a production-ready AI assistant that combines the emotional connection of Baymax with the technical sophistication of modern AI.

What it does

Sage handles complex workflows across 14+ capability modules (vision, file system, memory, system monitoring, and more). Using Perplexity Sonar's chain-of-thought reasoning, it breaks complex voice commands into logical step sequences, reasons through dependencies between capabilities (vision → mouse coordination, file operations → memory storage), and resolves ambiguous requests by working through the possibilities step by step.
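
To make the step-sequencing idea concrete, here is a minimal sketch of ordering a decomposed command so that each capability runs only after the steps it depends on. It uses Kahn's topological sort; the `PlanStep` shape and the `orderSteps` helper are hypothetical illustrations, not Sage's actual plan format.

```typescript
// Hypothetical plan step: one capability call plus the ids of the steps it depends on.
interface PlanStep {
  id: string;
  capability: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns the steps in an order where every dependency runs first.
// Throws if a step references an unknown dependency or the plan contains a cycle.
function orderSteps(steps: PlanStep[]): PlanStep[] {
  const byId = new Map<string, PlanStep>();
  for (const s of steps) byId.set(s.id, s);

  const remaining = new Map<string, number>(); // id -> number of unmet dependencies
  const dependents = new Map<string, string[]>(); // id -> steps waiting on it
  for (const s of steps) {
    remaining.set(s.id, s.dependsOn.length);
    for (const dep of s.dependsOn) {
      if (!byId.has(dep)) throw new Error(`Step ${s.id} depends on unknown step ${dep}`);
      const waiting = dependents.get(dep) ?? [];
      waiting.push(s.id);
      dependents.set(dep, waiting);
    }
  }

  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const ordered: PlanStep[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    ordered.push(byId.get(id)!);
    for (const next of dependents.get(id) ?? []) {
      const left = remaining.get(next)! - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  if (ordered.length !== steps.length) throw new Error("Plan contains a dependency cycle");
  return ordered;
}

// "Find the Save button, click it, then remember where it was."
const plan: PlanStep[] = [
  { id: "remember", capability: "memory", dependsOn: ["click"] },
  { id: "click", capability: "mouse", dependsOn: ["locate"] },
  { id: "locate", capability: "vision", dependsOn: [] },
];
console.log(orderSteps(plan).map((s) => s.capability).join(" → ")); // vision → mouse → memory
```

Detecting cycles up front means a malformed plan from the model is rejected before any mouse or file action runs.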

Key capabilities include:

  • AI-powered screen vision with Nebius Qwen2-VL-72B-Instruct for intelligent UI automation
  • Real-time translation supporting 12+ languages with OCR + AI processing
  • Comprehensive file-system automation: discover, analyze, and scaffold complete projects
  • Multi-tier memory management with semantic search and cross-session learning
  • Enterprise-grade system monitoring with automated alerts and performance tracking
  • Cross-modal intelligence, where vision informs file operations and memory enhances automation

How we built it

Architecture: Built on a hybrid Electron + Next.js stack, enabling both native desktop capabilities and flexible web deployment.

AI Planning Core: Integrated Perplexity Sonar-Reasoning-Pro as our primary planning engine, leveraging its #1 ranking in Search Arena and the 1,200 tokens/second speed Perplexity reports for Sonar. Sonar's chain-of-thought reasoning enables transparent, step-by-step workflow planning across our 14+ capability modules.
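
A minimal sketch of how a planning request and its reasoning output might be handled, assuming Perplexity's OpenAI-compatible chat completions endpoint (`https://api.perplexity.ai/chat/completions`) and the `sonar-reasoning-pro` model, whose replies are expected to begin with a `<think>` block before the final answer. The prompt wording and the helpers `buildPlanningRequest` and `splitReasoning` are hypothetical; the HTTP call itself is omitted so the example runs offline.

```typescript
// Minimal OpenAI-style chat message, the format Perplexity's chat completions API accepts.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: the JSON body to POST to https://api.perplexity.ai/chat/completions.
// The prompt wording is illustrative, not Sage's real planning prompt.
function buildPlanningRequest(command: string, capabilities: string[]) {
  const messages: ChatMessage[] = [
    {
      role: "system",
      content:
        `You plan desktop automation. Available capabilities: ${capabilities.join(", ")}. ` +
        "Reply with numbered steps, one capability per step.",
    },
    { role: "user", content: command },
  ];
  return { model: "sonar-reasoning-pro", messages };
}

// Split a reply into its <think>...</think> reasoning (if present) and the final answer.
function splitReasoning(content: string): { reasoning: string; answer: string } {
  const match = /<think>([\s\S]*?)<\/think>/.exec(content);
  if (!match) return { reasoning: "", answer: content.trim() };
  const answer = content.slice(0, match.index) + content.slice(match.index + match[0].length);
  return { reasoning: (match[1] ?? "").trim(), answer: answer.trim() };
}

const split = splitReasoning("<think>Vision must run first.</think>\n1. vision: locate editor");
console.log(split.answer); // 1. vision: locate editor
```

Keeping the two parts separate lets the reasoning be shown to the user while only the final answer is parsed into executable steps.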

Vision System: Implemented Nebius AI vision capabilities with intelligent fallbacks to Tesseract OCR, enabling pixel-perfect UI element detection and screen analysis.
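
The fallback behavior can be sketched as an ordered chain of analyzers, where an error or an empty result falls through to the next one. This is a hypothetical sketch: `Analyzer`, `UiElement`, and `analyzeWithFallback` are illustrative names, and the real Nebius and Tesseract calls would be wrapped inside `Analyzer` functions rather than shown here.

```typescript
// A detected UI element in screenshot pixel coordinates.
interface UiElement {
  label: string;
  x: number;
  y: number;
}

// Any analyzer (a hosted vision model, local OCR, ...) that finds elements matching a query.
type Analyzer = (screenshot: Uint8Array, query: string) => Promise<UiElement[]>;

// Try each analyzer in order; an error or an empty result falls through to the next one.
// Resolves with the first non-empty result and the name of the analyzer that produced it.
async function analyzeWithFallback(
  analyzers: Array<{ name: string; run: Analyzer }>,
  screenshot: Uint8Array,
  query: string,
): Promise<{ source: string; elements: UiElement[] }> {
  const failures: string[] = [];
  for (const { name, run } of analyzers) {
    try {
      const elements = await run(screenshot, query);
      if (elements.length > 0) return { source: name, elements };
      failures.push(`${name}: no matches`);
    } catch (err) {
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All analyzers failed (${failures.join("; ")})`);
}
```

For example, passing a vision-model analyzer first and an OCR analyzer second tries the model and returns the OCR result only if the model throws or finds nothing; recording which analyzer answered also makes fallbacks visible in logs.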

Memory Architecture: Designed a sophisticated 3-tier memory system:

  • Ephemeral working memory for real-time reasoning
  • Session memory for context continuity
  • Persistent episodic memory using Supabase with pgvector for semantic search
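
A toy, in-process sketch of the three tiers, with the persistent tier reduced to an array ranked by cosine similarity. In the setup described above, that tier would instead be a Supabase table with a pgvector `vector` column queried using the cosine-distance operator `<=>`. The `TieredMemory` class, its promotion rule, the 20-entry session cap, and the 3-dimensional embeddings are assumptions for illustration.

```typescript
// One remembered episode and its embedding vector.
interface Episode {
  text: string;
  embedding: number[];
}

// Cosine similarity; returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical three-tier memory: working (per request), session (per app run), persistent (episodic).
class TieredMemory {
  private working: string[] = [];
  private session: string[] = [];
  private episodes: Episode[] = []; // stand-in for a pgvector-backed table
  private readonly sessionLimit: number;

  constructor(sessionLimit = 20) {
    this.sessionLimit = sessionLimit;
  }

  // Working memory: scratch facts gathered while reasoning about the current request.
  note(fact: string): void {
    this.working.push(fact);
  }

  // When a request finishes, promote working memory into the bounded session log.
  endRequest(): void {
    this.session.push(...this.working);
    this.working = [];
    if (this.session.length > this.sessionLimit) {
      this.session = this.session.slice(-this.sessionLimit);
    }
  }

  sessionContext(): string[] {
    return [...this.session];
  }

  // Persistent memory: store an episode with its embedding for later semantic search.
  persist(text: string, embedding: number[]): void {
    this.episodes.push({ text, embedding });
  }

  // Semantic recall: the k stored episodes most similar to the query embedding.
  recall(query: number[], k = 3): string[] {
    return this.episodes
      .map((e) => ({ text: e.text, score: cosine(query, e.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((e) => e.text);
  }
}

const memory = new TieredMemory();
memory.persist("Projects live in ~/code", [1, 0, 0]);
memory.persist("User prefers dark mode", [0, 1, 0]);
console.log(memory.recall([0.9, 0.1, 0], 1)[0]); // Projects live in ~/code
```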

Execution Context: Created an execution context system that passes data between capabilities, enabling coordination such as vision → mouse actions, file operations → memory storage, and translation → contextual enhancement.
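
A minimal sketch of such a shared context, assuming each capability publishes its outputs under a string key that later steps read. `ExecutionContext`, the `"vision.target"` key, and the two step functions are hypothetical.

```typescript
// Hypothetical shared execution context: steps publish named outputs that later steps consume.
class ExecutionContext {
  private values = new Map<string, unknown>();

  set(key: string, value: unknown): void {
    this.values.set(key, value);
  }

  // Read an earlier step's output. The cast is unchecked, so producer and consumer must agree on the type.
  get<T>(key: string): T {
    if (!this.values.has(key)) throw new Error(`No value for "${key}" in execution context`);
    return this.values.get(key) as T;
  }
}

interface ScreenPoint {
  x: number;
  y: number;
}

// The vision step publishes where it found the target...
function visionStep(ctx: ExecutionContext): void {
  const found: ScreenPoint = { x: 640, y: 360 }; // placeholder for a real detection result
  ctx.set("vision.target", found);
}

// ...and the mouse step consumes that location instead of detecting it again.
function mouseStep(ctx: ExecutionContext): string {
  const target = ctx.get<ScreenPoint>("vision.target");
  return `click at (${target.x}, ${target.y})`;
}

const execCtx = new ExecutionContext();
visionStep(execCtx);
console.log(mouseStep(execCtx)); // click at (640, 360)
```

Failing loudly on a missing key surfaces a broken data flow (for example, a mouse step scheduled before vision ran) instead of clicking at a stale or default position.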

Capability Modules: Developed 14+ modular capabilities with standardized interfaces:

  • Mouse/keyboard automation, vision analysis, file-system operations
  • Memory management, browser control, media control, maps/navigation
  • Code generation, dependency management, system monitoring
  • Database operations, UI feedback, translation services
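
One way to express a standardized interface is a shared contract plus a registry that can describe every module to the planner and run any of them by name. The `Capability` and `CapabilityRegistry` shapes below are a hypothetical sketch, not Sage's actual interface.

```typescript
// Hypothetical standardized contract that every capability module implements.
interface CapabilityResult {
  ok: boolean;
  output?: unknown;
  error?: string;
}

interface Capability {
  name: string; // e.g. "vision", "file-system"
  description: string; // shown to the planner so it knows what the module can do
  execute(args: Record<string, unknown>): Promise<CapabilityResult>;
}

class CapabilityRegistry {
  private modules = new Map<string, Capability>();

  register(cap: Capability): void {
    if (this.modules.has(cap.name)) throw new Error(`Duplicate capability: ${cap.name}`);
    this.modules.set(cap.name, cap);
  }

  // One line per module, suitable for injecting into the planning prompt.
  describe(): string {
    return Array.from(this.modules.values())
      .map((c) => `- ${c.name}: ${c.description}`)
      .join("\n");
  }

  // Run a module by name; unknown names and thrown errors become { ok: false } results.
  async run(name: string, args: Record<string, unknown>): Promise<CapabilityResult> {
    const cap = this.modules.get(name);
    if (!cap) return { ok: false, error: `Unknown capability: ${name}` };
    try {
      return await cap.execute(args);
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

Because `run` converts thrown errors into `{ ok: false }` results, one misbehaving module reports a failure the planner can reason about instead of crashing the whole workflow.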

Challenges we ran into

Cross-Platform Complexity: Routing requests and handling behavior across the browser and Electron environments, along with managing complex package dependencies and configuration files, was the most technically challenging aspect.
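
Part of the routing problem is knowing which environment the code is running in. A minimal sketch, relying on Electron appending `Electron/<version>` to the renderer's user-agent string; `detectRuntime`, `routeFor`, and the `ipc:` / `/api/` / `local:` route formats are hypothetical.

```typescript
type Runtime = "electron" | "browser" | "server";

// Electron renderers include "Electron/<version>" in the user-agent string;
// server-side code (e.g. Next.js rendering) has no user agent at all.
function detectRuntime(userAgent: string | undefined): Runtime {
  if (userAgent === undefined) return "server";
  return /\bElectron\/\d/.test(userAgent) ? "electron" : "browser";
}

// Hypothetical routing: IPC inside Electron, an API route on the web, in-process on the server.
const routePrefix: Record<Runtime, string> = {
  electron: "ipc:",
  browser: "/api/",
  server: "local:",
};

function routeFor(action: string, runtime: Runtime): string {
  return routePrefix[runtime] + action;
}

const electronUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Electron/28.0.0 Safari/537.36";
console.log(routeFor("mouse.click", detectRuntime(electronUa))); // ipc:mouse.click
```

In client code the argument would come from `typeof navigator === "undefined" ? undefined : navigator.userAgent`. If the app overrides its user agent, a flag exposed from a preload script is a more robust signal.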

Coordinate Precision: Achieving pixel-perfect mouse automation required mapping the coordinates returned by AI vision detection onto the physical display, accounting for display scaling at different DPI settings.
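
The mapping can be sketched in two steps: undo any downscaling applied before the screenshot was sent to the vision model, then convert physical pixels to logical points with the display's scale factor and add the display's origin. This assumes the automation library expects logical coordinates, which is platform- and library-dependent; in Electron, the scale factor and origin could come from the `scaleFactor` and `bounds` fields of a `Display` returned by the `screen` module. The type and function names are hypothetical.

```typescript
interface DisplayInfo {
  originX: number; // display's top-left corner in global logical coordinates
  originY: number;
  scaleFactor: number; // physical pixels per logical point (e.g. 2 on a Retina display)
}

interface ImageSpace {
  capturedWidth: number; // screenshot size in physical pixels
  capturedHeight: number;
  sentWidth: number; // size of the (possibly downscaled) image sent to the model
  sentHeight: number;
}

// Map a point reported by the vision model (in sent-image pixels) to logical screen coordinates.
function toScreenPoint(px: number, py: number, img: ImageSpace, display: DisplayInfo) {
  // 1. Undo the downscale: sent-image pixels -> physical pixels.
  const physX = px * (img.capturedWidth / img.sentWidth);
  const physY = py * (img.capturedHeight / img.sentHeight);
  // 2. Physical pixels -> logical points, offset by the display's origin.
  return {
    x: Math.round(display.originX + physX / display.scaleFactor),
    y: Math.round(display.originY + physY / display.scaleFactor),
  };
}

// Retina display (scale 2) captured at 2880x1800, sent to the model at 1280x800.
const p = toScreenPoint(
  700,
  450,
  { capturedWidth: 2880, capturedHeight: 1800, sentWidth: 1280, sentHeight: 800 },
  { originX: 0, originY: 0, scaleFactor: 2 },
);
console.log(p.x, p.y); // 788 506
```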

Memory Optimization: Designing efficient context injection for Perplexity Sonar that stays within token limits while preserving the historical context most relevant to planning.
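
A minimal sketch of budgeted context injection: rank retrieved memories by relevance and greedily add them until the token budget is spent. The four-characters-per-token estimate is a rough heuristic standing in for a real tokenizer, and `packContext` and `MemoryItem` are hypothetical names.

```typescript
interface MemoryItem {
  text: string;
  relevance: number; // e.g. similarity score from semantic search; higher is better
}

// Rough heuristic (~4 characters per token for English text); swap in a real tokenizer for accuracy.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Greedily pack the most relevant memories into a fixed token budget.
function packContext(items: MemoryItem[], budgetTokens: number): string[] {
  const chosen: string[] = [];
  let used = 0;
  for (const item of [...items].sort((a, b) => b.relevance - a.relevance)) {
    const cost = estimateTokens(item.text);
    if (used + cost > budgetTokens) continue; // too big; a shorter, less relevant item may still fit
    chosen.push(item.text);
    used += cost;
  }
  return chosen;
}
```

Skipping an item that does not fit, rather than stopping at it, lets shorter memories use whatever budget remains.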

Real-time Performance: Balancing comprehensive AI analysis with sub-second response times for a natural voice interaction experience.

Accomplishments that we're proud of

Modular Intelligence: The system knows which capabilities it has and uses Perplexity Sonar's reasoning to work through unfamiliar scenarios.

Production-Ready Architecture: Built enterprise-grade error handling, visual verification, and self-repair capabilities that enable reliable automation in real-world scenarios.
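
Visual verification and self-repair can be sketched as an act, check, retry loop. In this hypothetical sketch, `act` would perform the automation step and `verify` would re-inspect the screen (for example, through the vision system) to confirm the expected change; the names and the default of three attempts are assumptions.

```typescript
// Hypothetical self-repair loop: perform an action, check that it had the intended effect,
// and retry (passing the attempt number so the action can adjust) until it verifies or attempts run out.
async function actAndVerify(
  act: (attempt: number) => Promise<void>,
  verify: () => Promise<boolean>,
  maxAttempts = 3,
): Promise<{ ok: boolean; attempts: number }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await act(attempt);
    if (await verify()) return { ok: true, attempts: attempt };
  }
  return { ok: false, attempts: maxAttempts };
}

// Simulated click that only opens the dialog on the second try.
let dialogOpen = false;
actAndVerify(
  async (attempt) => {
    if (attempt === 2) dialogOpen = true;
  },
  async () => dialogOpen,
).then((r) => console.log(r.ok, r.attempts)); // true 2
```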

Cross-Modal Innovation: Achieved seamless integration between vision, voice, memory, and automation, creating workflows where each capability enhances the others.

Advanced AI Integration: Successfully integrated cutting-edge AI models (Perplexity Sonar, Nebius Vision) into a cohesive system that outperforms traditional voice assistants.

Developer Experience: Created comprehensive debugging tools, coordinate flow visualization, and extensive testing capabilities that make the system maintainable and extensible.

What we learned

AI Orchestration: Learned that the key to advanced AI systems isn't just having powerful models, but orchestrating them intelligently. Perplexity Sonar's chain reasoning proved essential for coordinating complex multi-step workflows.

Context is King: Discovered that careful memory management and context injection dramatically improve AI planning accuracy; our semantic memory search increased success rates by 40%.

Vision-First Automation: Realized that combining AI vision with traditional automation creates far more powerful workflows than either approach alone.

User Trust Through Transparency: Found that showing the AI's chain-of-thought reasoning builds user confidence and enables better human-AI collaboration.

What's next for Sage

Enhanced Reasoning: Integrate Perplexity Sonar's latest reasoning models for even more sophisticated multi-step planning and self-correction capabilities.

Expanded Vision Intelligence: Add support for video analysis, live screen monitoring, and predictive UI interaction patterns.

Collaborative AI: Develop multi-agent workflows where specialized AI assistants collaborate on complex projects using Sage as the coordination layer.

Enterprise Deployment: Build team collaboration features, enterprise security controls, and organization-wide knowledge sharing through our memory system.

Mobile Integration: Extend Sage's capabilities to mobile devices with cross-platform synchronization and cloud-based reasoning.

Industry Specialization: Create domain-specific capability modules for healthcare, finance, education, and creative industries with specialized AI models and workflows.

Open Ecosystem: Develop a plugin architecture enabling third-party developers to extend Sage's capabilities while maintaining our intelligent coordination core powered by Perplexity Sonar.

The vision: Transform Sage from a personal AI assistant into an intelligent automation platform that democratizes advanced AI capabilities for individuals, teams, and enterprises worldwide.
