Inspiration

We've all been there, staring at a screen, trying to figure out how to do something in software that should be simple.

  • Pivot tables in Google Sheets
  • Automations in Trello
  • Background removal in Photoshop
  • Editing a video in Premiere Pro
  • Basically any app, software, or website task!

The knowledge exists somewhere on YouTube or in a help doc, but the gap between reading instructions and actually doing the thing is where most people give up. We built Roger because we think software should be able to teach you how to use it, in real time, right where you're working.

What it does

Roger AI is an intelligent vision overlay that provides real-time, on-screen guidance for software applications. Instead of just telling users what to do, it shows them. By resting directly over the application, it highlights exactly where to click, explains required inputs, and maps out the next steps to complete complex workflows, making any platform instantly intuitive.

How we built it

Roger is a native macOS application that captures what's on screen and feeds it to vision models to understand UI state in real time. The overlay system renders guidance elements (highlight boxes, arrows, and tooltips) directly on top of the user's active app. We use a combination of screen capture APIs, vision model inference, and a custom rendering layer to keep the experience fast and non-intrusive.
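
To make the capture-to-overlay step concrete, here is a minimal Python sketch of one piece of it: turning a box reported by a vision model into a highlight rectangle on screen. It is an illustration, not Roger's actual code. It assumes the model returns boxes normalized to the captured window (0 to 1 on each axis), and the names Rect, to_overlay_rect, and padding are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def to_overlay_rect(norm_box: Rect, window: Rect, padding: float = 4.0) -> Rect:
    """Map a box normalized to the captured window into screen points.

    Assumption: norm_box uses 0..1 coordinates relative to the window,
    and window is expressed in the same screen points the overlay draws in.
    """
    # Scale the normalized box to the window, then grow it by the padding
    # so the highlight does not sit on top of the element's own border.
    x = window.x + norm_box.x * window.width - padding
    y = window.y + norm_box.y * window.height - padding
    w = norm_box.width * window.width + 2 * padding
    h = norm_box.height * window.height + 2 * padding

    # Clamp to the window so a highlight never spills onto other apps.
    left = max(window.x, x)
    top = max(window.y, y)
    right = min(window.x + window.width, x + w)
    bottom = min(window.y + window.height, y + h)
    return Rect(left, top, max(0.0, right - left), max(0.0, bottom - top))


if __name__ == "__main__":
    app_window = Rect(100, 50, 800, 600)
    button = Rect(0.25, 0.5, 0.125, 0.0625)
    print(to_overlay_rect(button, app_window))
```

Keeping the model output normalized means the same box stays valid when the window moves or resizes; only the window rectangle has to be refreshed before drawing.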

Challenges we ran into

Getting the overlay to feel native and not janky was the hardest part. Timing was another: if guidance lags behind the user's actions even slightly, the whole experience falls apart. We also had to handle edge cases where apps change layouts, resize windows, or render elements unpredictably. Making it work across any app without integrations means we can't rely on accessibility trees or the DOM. Roger is pure vision, which is powerful but complex.
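
One way to think about the timing problem is as a staleness check: guidance computed for an old frame should never be drawn on a newer one. The Python sketch below shows that idea under simplifying assumptions. It is not how Roger is implemented, the names frame_fingerprint, Guidance, and GuidanceGate are hypothetical, and an exact hash of the frame is stricter than a real app would want (a blinking cursor would invalidate everything), so treat the fingerprint as a stand-in for a more tolerant change detector.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def frame_fingerprint(pixels: bytes) -> str:
    """Cheap identity for a captured frame (exact match only, by assumption)."""
    return hashlib.blake2b(pixels, digest_size=8).hexdigest()


@dataclass(frozen=True)
class Guidance:
    """A model suggestion tagged with the frame it was computed from."""
    fingerprint: str
    text: str


class GuidanceGate:
    """Drops guidance whose source frame is no longer what is on screen."""

    def __init__(self) -> None:
        self._current: Optional[str] = None

    def observe_frame(self, pixels: bytes) -> str:
        # Called on every capture; remembers what the screen looks like now.
        self._current = frame_fingerprint(pixels)
        return self._current

    def accept(self, guidance: Guidance) -> Optional[Guidance]:
        # Inference is slow relative to the user, so the screen may have
        # changed while the model was thinking. Stale results are discarded.
        if guidance.fingerprint == self._current:
            return guidance
        return None


if __name__ == "__main__":
    gate = GuidanceGate()
    fp = gate.observe_frame(b"frame-a")
    hint = Guidance(fp, "Click Insert")
    print(gate.accept(hint) is not None)  # screen unchanged: hint is shown
    gate.observe_frame(b"frame-b")
    print(gate.accept(hint) is not None)  # screen changed: hint is dropped
```

Showing nothing for a moment is usually less confusing than pointing at a button that has already moved.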

Accomplishments that we're proud of

We are incredibly proud of the seamless core overlay experience. Successfully mapping AI-driven visual cues accurately onto a live, shifting screen without breaking the native app's usability was a major technical win. Seeing a user complete a complex, multi-step workflow on a new platform purely by following Roger AI's cues validated the entire concept.

What we learned

We learned that placement matters more than intelligence. A perfectly accurate suggestion that appears in the wrong spot on screen is worse than no suggestion at all. People don't read overlay text; they look where the highlight is. And maybe the most surprising thing was that most people don't actually want AI to take over their computer. They want to feel like they figured it out. Roger works because it makes you competent instead of replacing you.

What's next for Roger AI

Windows support is planned for April, followed in May by a B2B SDK so any software company can embed Roger-style guidance for their own users. After that come MCP integrations to connect guided workflows with agentic tools. We also plan to introduce voice-prompted guidance, so users can simply ask the app, "How do I do X?" and have Roger AI immediately draw the path on their screen. An analytics dashboard is planned as well, so product teams can see exactly where their users get stuck the most. Our ultimate north star is to become the default UX component used by absolutely everyone.

Updates

We have made Roger AI available for everyone to try. Simply download and install the .dmg file; once you start Roger, you will see a small icon in the top panel. It takes only a few seconds! Give it a shot and let us know your feedback :)