Inspiration
Let's be real: Siri doesn't get it. I hate that my "assistant" is blind. It can't see my screen, and it definitely can't click buttons for me. I wanted to fix that. I built Lavis because I wanted an AI that doesn't just chat, but actually drives the computer—seeing what I see, clicking what I click.
What it does
Lavis is a digital human living on your Mac. It breaks out of the chatbox. Powered by Gemini 2.0, it watches your screen in real time and uses your mouse and keyboard to get things done. No APIs, no special integrations. You tell it: "Send a WhatsApp message to Mom" or "Play some jazz on Spotify," and it just moves the mouse and does it. It interacts with pixels, not code.
How we built it
We built this beast entirely in Java 21 and Spring Boot. Native screen capture grabs pixels in real time, and Gemini 2.0 Flash analyzes the UI and plans the steps.
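The capture-and-send loop can be sketched with the JDK's built-in `java.awt.Robot` for grabbing pixels and a Base64-encoded PNG for the vision model's request body. The class and method names below are our own illustration, not Lavis's actual code:

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Illustrative sketch: grab the screen and package it as a Base64 PNG,
// the shape most vision APIs (including Gemini's) accept for image input.
public class ScreenFrame {

    // Capture the full screen as a BufferedImage using java.awt.Robot.
    static BufferedImage capture() throws AWTException {
        Rectangle bounds = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        return new Robot().createScreenCapture(bounds);
    }

    // Encode a frame as a Base64 PNG string for the model request body.
    static String toBase64Png(BufferedImage frame) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(frame, "png", out);
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        // Fall back to a synthetic frame when no display is attached.
        BufferedImage frame = GraphicsEnvironment.isHeadless()
                ? new BufferedImage(320, 200, BufferedImage.TYPE_INT_RGB)
                : capture();
        String payload = toBase64Png(frame);
        System.out.println("frame bytes (base64): " + payload.length());
    }
}
```

In practice the model's reply would be parsed into concrete mouse and keyboard actions, but that plumbing is omitted here.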
Accomplishments that we're proud of
It feels alive. Watching the mouse move in a smooth, human curve instead of a robotic jump is satisfying. We proved you don't need complex APIs to control a computer; you just need a smart vision model and a good pair of virtual hands.
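That human-feel motion is typically achieved by interpolating along an eased curve instead of teleporting the cursor. A minimal sketch (our own illustration, assuming smoothstep easing over `java.awt.Robot.mouseMove`, not necessarily how Lavis does it):

```java
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Robot;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: move the cursor along a smooth, eased path so it
// reads as a human gesture instead of a robotic jump.
public class SmoothMouse {

    // Generate intermediate points from start to end using smoothstep
    // easing: slow start, fast middle, slow finish.
    static List<Point> path(Point start, Point end, int steps) {
        List<Point> points = new ArrayList<>();
        for (int i = 1; i <= steps; i++) {
            double t = (double) i / steps;
            double eased = t * t * (3 - 2 * t); // smoothstep
            int x = (int) Math.round(start.x + (end.x - start.x) * eased);
            int y = (int) Math.round(start.y + (end.y - start.y) * eased);
            points.add(new Point(x, y));
        }
        return points;
    }

    public static void main(String[] args) throws Exception {
        List<Point> points = path(new Point(0, 0), new Point(300, 200), 40);
        if (!GraphicsEnvironment.isHeadless()) {
            Robot robot = new Robot();
            for (Point p : points) {
                robot.mouseMove(p.x, p.y);
                Thread.sleep(8); // a few ms per step keeps the motion fluid
            }
        }
        Point last = points.get(points.size() - 1);
        System.out.println("end: " + last.x + "," + last.y);
    }
}
```

Because the path generation is pure math, it can be tuned (or swapped for a Bezier curve) without touching the code that actually drives the cursor.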
What's next for Lavis
Speed and memory. We're making reaction times faster. Soon, you'll be able to talk to it in real time while it works, just like a pair programmer.