Bapto AI – Screen-Sharing ChatGPT Assistant

Inspiration

As someone who enjoys video editing and graphic design, I often had trouble navigating the complex interfaces of tools like Adobe Premiere Pro and After Effects.
Simple tasks, such as finding the right keyframe option or effect panel, were time-consuming.

My workaround was to take screenshots and ask ChatGPT for help, but that process felt repetitive and disconnected from the work.
This sparked the idea of building an AI that could observe the screen in real time and assist in context, without switching tabs or manually taking and uploading screenshots.

That’s how Bapto AI was born: a floating, screen-aware assistant that combines screen sharing with AI assistance through both chat and voice.


What I Learned

  • How to build a floating chatbot interface that doesn’t block user workflows
  • Screen capture techniques and performance optimization
  • Integration of multiple LLM APIs (ChatGPT, Gemini, OpenRouter)
  • Real-time screen tracking and mouse interaction
  • Using Roboflow for on-screen object detection
  • Speech synthesis for voice responses
  • Handling asynchronous API responses in parallel
  • Building a pricing page UI for future monetization
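
To illustrate the point about handling asynchronous API responses in parallel, here is a minimal sketch rather than the project's actual code. The names `AskFn`, `ProviderResult`, `withTimeout`, and `askAll` are hypothetical, and the 10-second default timeout is an assumption. The idea is to send one prompt to every provider at once and collect each outcome, so a slow or failing provider does not block the others.

```typescript
// Hypothetical signature for a provider call; not taken from the project's source.
type AskFn = (prompt: string) => Promise<string>;

interface ProviderResult {
  provider: string;
  ok: boolean;
  text?: string;
  error?: string;
}

// Rejects if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Sends the same prompt to every provider concurrently.
// Each call is wrapped so that a failure becomes a result entry instead of a rejection.
async function askAll(
  providers: Record<string, AskFn>,
  prompt: string,
  timeoutMs: number = 10000,
): Promise<ProviderResult[]> {
  return Promise.all(
    Object.entries(providers).map(async ([provider, ask]): Promise<ProviderResult> => {
      try {
        const text = await withTimeout(ask(prompt), timeoutMs);
        return { provider, ok: true, text };
      } catch (err) {
        return { provider, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

A caller could then show the first entry with `ok: true` and log the rest. Note that the timeout only stops waiting; it does not cancel the underlying request.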

How I Built It

Frontend:

  • React (with TypeScript)
  • TailwindCSS for styling
  • Vite for fast builds and development

AI Services:

  • chatgpt.ts, openai.ts, gemini.ts, and openrouter.ts handle different model integrations
  • roboflow.ts for visual recognition and screen understanding
  • mouseTracking.ts and ScreenInteractionManager.tsx manage real-time interaction detection
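
The write-up does not show how the per-model modules are wired together, so the following is only a sketch of one possible arrangement. It assumes each integration can be wrapped in a shared `ChatProvider` interface (a hypothetical name, as is `askWithFallback`) and tries the providers in order until one answers.

```typescript
// Hypothetical common interface that each model integration could implement.
interface ChatProvider {
  name: string;
  ask(prompt: string): Promise<string>;
}

// Tries each provider in the given order and returns the first successful reply.
// If every provider fails, the thrown error lists each failure.
async function askWithFallback(
  providers: ChatProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.ask(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`all providers failed (${failures.join("; ")})`);
}
```

Keeping the rest of the app dependent only on the interface means a model can be added or swapped without touching the chat UI.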

Voice and Chat:

  • Speech synthesis modules for verbal responses
  • Text-based chatbot with a floating UI component
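
As a rough sketch of how verbal responses could be produced in the browser, the snippet below uses the standard Web Speech API (`speechSynthesis` and `SpeechSynthesisUtterance`). The helper names `splitIntoSentences` and `speak` are hypothetical, and whether the project's speech modules work this way is an assumption. Splitting the reply into sentences lets playback start on the first sentence instead of waiting on one long utterance.

```typescript
// Splits a reply into sentence-sized chunks on '.', '!' and '?'.
// Limitation: this simple pattern also splits decimals such as "3.5".
function splitIntoSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Queues each sentence for speech. Returns false when the Web Speech API
// is unavailable (for example, outside a browser).
function speak(text: string): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;
  for (const sentence of splitIntoSentences(text)) {
    // speechSynthesis.speak() appends the utterance to the browser's queue.
    g.speechSynthesis.speak(new g.SpeechSynthesisUtterance(sentence));
  }
  return true;
}
```

For example, `speak("Open the Effects panel. Then press U.")` would queue two utterances in a browser that supports speech synthesis.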

Other Features:

  • Pricing page (PricingPage.tsx) for subscription integration
  • Secure environment configuration via .env
  • Groundwork for future authentication and API key management

Challenges I Faced

  • Building a smooth overlay chatbot without interrupting screen interactions
  • Capturing and processing screen content at a usable speed
  • Managing memory and performance with frequent screenshot polling
  • Handling multiple API services without conflict
  • Creating a voice and chat system that responds naturally and quickly
  • Designing for cross-platform compatibility and expansion (e.g., desktop version)
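
One common way to ease the memory and performance cost of frequent screenshot polling is to skip frames that have not changed and to keep only the most recent few. The sketch below shows that idea under stated assumptions: `sampleHash` and `FrameBuffer` are hypothetical names, frames are assumed to arrive as raw `Uint8Array` pixel data, and this is not necessarily how Bapto AI handles it.

```typescript
// Cheap 32-bit FNV-1a hash over every `stride`-th byte of the frame.
// Limitation: changes that fall only between sampled bytes go unnoticed.
function sampleHash(pixels: Uint8Array, stride: number = 64): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < pixels.length; i += stride) {
    hash ^= pixels[i];
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Keeps at most `capacity` frames and ignores a frame whose sampled hash
// matches the previously pushed one.
class FrameBuffer {
  private frames: Uint8Array[] = [];
  private lastHash: number | null = null;
  private readonly capacity: number;
  private readonly stride: number;

  constructor(capacity: number, stride: number = 64) {
    this.capacity = capacity;
    this.stride = stride;
  }

  // Returns true when the frame was kept, false when it looked unchanged.
  push(frame: Uint8Array): boolean {
    const hash = sampleHash(frame, this.stride);
    if (hash === this.lastHash) return false;
    this.lastHash = hash;
    this.frames.push(frame);
    if (this.frames.length > this.capacity) this.frames.shift(); // drop the oldest
    return true;
  }

  get size(): number {
    return this.frames.length;
  }

  latest(): Uint8Array | undefined {
    return this.frames[this.frames.length - 1];
  }
}
```

With this in place, only frames for which `push` returns true would need to be sent on for detection or to an LLM, which reduces both API calls and retained memory when the screen is idle.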
