Just Do It
Inspiration
We wanted to create a personal AI assistant that doesn’t just talk to you—it actually does things for you. Imagine an AI that can automate your daily tasks, like creating a reminder on Google Calendar or writing a quick to-do list. That’s what we aimed for—an AI that takes productivity to the next level.
What it does
Our AI assistant is designed to do more than just hold a conversation. It can perform tasks on your behalf by interacting with your operating system. Need to submit an assignment automatically on D2L? Our assistant has you covered. It can identify and interact with elements on your screen and perform actions just like a human would—without you having to lift a finger.
How we built it
We combined a couple of key tools to bring this idea to life. First, we used pywinauto, which gathers information about the elements in a window or on the screen (kind of like web scraping, but at the OS level). The library wraps Microsoft's UI Automation framework, which was originally built for accessibility. Then we used pyautogui to control the mouse and keyboard, typing text and clicking buttons. By combining element detection with screen capture, we gave our AI enough context to predict the next best action toward completing a task. Whether it's identifying a button or submitting a file, the assistant works from what it sees on the screen and interacts accordingly.
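To make the perceive-decide-act loop above concrete, here is a minimal, runnable sketch. It uses plain dicts to stand in for pywinauto element wrappers and a keyword match to stand in for the LLM call, so it runs anywhere; the comments show where the real pywinauto and pyautogui calls would go on Windows. All names here (describe_elements, choose_action, click_at) are illustrative, not our actual module API.

```python
# Sketch of the loop: enumerate elements -> serialize as text context ->
# ask the model for the next action -> perform it.
# On Windows, `elements` would come from pywinauto, roughly:
#   from pywinauto import Desktop
#   win = Desktop(backend="uia").window(title_re=".*D2L.*")
#   children = win.descendants()
# and click_at() would call pyautogui.click(x, y).

def describe_elements(elements):
    """Serialize on-screen elements into a text context for the model."""
    lines = []
    for i, el in enumerate(elements):
        x, y = el["center"]
        lines.append(f"[{i}] {el['type']} '{el['name']}' at ({x}, {y})")
    return "\n".join(lines)

def choose_action(context, goal):
    """Stand-in for the LLM call: pick the element whose name matches the goal."""
    for line in context.splitlines():
        if goal.lower() in line.lower():
            idx = int(line[1:line.index("]")])
            return {"action": "click", "target": idx}
    return {"action": "none"}

def click_at(element):
    # Real version: pyautogui.click(*element["center"])
    return f"clicked '{element['name']}'"

elements = [
    {"name": "Upload File", "type": "Button", "center": (410, 520)},
    {"name": "Submit", "type": "Button", "center": (612, 688)},
]
context = describe_elements(elements)
action = choose_action(context, "submit")
if action["action"] == "click":
    print(click_at(elements[action["target"]]))
```

The key design point is that the model never sees pixels: it sees a short, numbered text inventory of interactable elements and answers with an index, which keeps prompts small and actions unambiguous.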
Challenges we ran into
• Dealing with token limits in the OpenAI API
• Figuring out how to accurately identify and interact with screen elements
• Ensuring compatibility, since the system currently only works on Windows
• Handling inconsistencies in network speed, which affected performance
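The token-limit challenge came down to keeping the element context small enough to fit in a prompt. A hedged sketch of one workaround, assuming a rough characters-per-token heuristic rather than an exact tokenizer (the function name and budget values are illustrative):

```python
# Trim the serialized element list to a token budget before prompting.
# The 4-chars-per-token ratio is a rough heuristic, not a real tokenizer.

def trim_context(lines, max_tokens=500, chars_per_token=4):
    """Greedily keep element descriptions until the budget is spent."""
    budget = max_tokens * chars_per_token
    kept = []
    for line in lines:
        line = line[:120]  # cap very long accessibility names
        if budget - len(line) < 0:
            break
        budget -= len(line)
        kept.append(line)
    return "\n".join(kept)

# A busy page can expose hundreds of elements; keep only what fits.
lines = [f"[{i}] Button 'Item {i}'" for i in range(1000)]
prompt = trim_context(lines, max_tokens=100)
```

Dropping low-priority elements first (decorative text, off-screen items) would be a natural refinement over this simple head-of-list truncation.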
Accomplishments we’re proud of
We’re particularly proud of finding a clever alternative to image recognition. By relying on a mix of element detection and screen context, we avoided some of the headaches of traditional visual recognition methods. Also, we managed to get a working version running smoothly from the command line.
What we learned
We learned a lot about Windows automation and gained insight into how accessibility tools can be leveraged for everyday tasks. We also picked up valuable experience working through OS-level automation challenges and found creative workarounds for common limitations.
What’s next for WIP
The project is still in progress, but we're excited to keep exploring the idea. Next steps include refining cross-platform compatibility and expanding the range of tasks it can handle.