Promptly

Inspiration

We set out to build the ultimate productivity companion. Just as a dog is man's best friend, Promptly aims to be a user's best friend.

What it does

Promptly is the first AI assistant that can view and control your screen, automating tasks from start to finish. Powered by zero-shot object detection models, OCR, and Mastra, Promptly is built for automating mundane tasks like writing emails, ordering from Amazon, or even playing Wordle!

How we built it

Promptly pairs an Electron frontend with a Flask backend. The Flask backend connects to PyAutoGUI, Hugging Face Transformers models, image and OCR libraries like pytesseract, and OpenAI/Gemini.

We had to carefully prompt-engineer OpenAI/Gemini to respond in parsable text that our PyAutoGUI layer could act on. We also had to get OmniParser, a GUI/icon parsing library, running and connected to OpenAI. For performance, we found that splitting the screen into quadrants narrowed down both the icon search with OmniParser and the text search with pytesseract, which reduced inference time.
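
The quadrant idea can be sketched as follows. This is a minimal illustration, not Promptly's actual code: the Box type and the function names are hypothetical, and it only shows how a screen might be split into four regions and how a detection found inside one region could be mapped back to a full-screen click point.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Rectangle in pixels (hypothetical helper type for this sketch)."""
    left: int
    top: int
    width: int
    height: int


def split_into_quadrants(screen_w: int, screen_h: int) -> dict:
    """Return the four quadrant regions of a screen_w x screen_h screenshot."""
    half_w, half_h = screen_w // 2, screen_h // 2
    return {
        "top_left": Box(0, 0, half_w, half_h),
        "top_right": Box(half_w, 0, screen_w - half_w, half_h),
        "bottom_left": Box(0, half_h, half_w, screen_h - half_h),
        "bottom_right": Box(half_w, half_h, screen_w - half_w, screen_h - half_h),
    }


def to_screen_center(quadrant: Box, local: Box) -> tuple:
    """Map a box detected inside a quadrant crop to its center on the full screen."""
    x = quadrant.left + local.left + local.width // 2
    y = quadrant.top + local.top + local.height // 2
    return x, y
```

Each quadrant could then be cropped and passed to the detector or OCR step on its own, so a search only has to cover a quarter of the pixels once the likely region is known.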

After all commands are parsed, we use WebSockets to communicate back to Electron and give users feedback.
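
The parse-then-report flow might look roughly like this. Everything specific here is an assumption made for illustration: the line-based CLICK/TYPE/PRESS grammar, the function names, and the JSON message shape are hypothetical and are not Promptly's real format.

```python
import json


def parse_commands(reply: str) -> list:
    """Parse a hypothetical line-based command format from a model reply."""
    commands = []
    for raw in reply.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines in the reply
        verb, _, rest = line.partition(" ")
        verb = verb.upper()
        if verb == "CLICK":
            # Expect exactly two integers; anything else raises ValueError.
            x, y = (int(v) for v in rest.split())
            commands.append({"action": "click", "x": x, "y": y})
        elif verb == "TYPE":
            commands.append({"action": "type", "text": rest})
        elif verb == "PRESS":
            commands.append({"action": "press", "key": rest.strip().lower()})
        else:
            raise ValueError(f"unrecognized command: {line!r}")
    return commands


def status_message(commands: list) -> str:
    """Serialize a feedback payload that a WebSocket layer could send to the UI."""
    return json.dumps({"status": "parsed", "steps": len(commands), "commands": commands})
```

Under this assumed format, each parsed dict could be dispatched to the matching PyAutoGUI call (pyautogui.click, pyautogui.write, or pyautogui.press), and the string from status_message could be sent to Electron over the socket. Rejecting unrecognized lines early keeps malformed model output from reaching the mouse and keyboard.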

Challenges we ran into

We ran into a ton of challenges, including finding the right Hugging Face model for extracting GUI elements, integrating an abort feature with our WebSockets, and even a bunch of CSS styling issues.

Of these, the hardest were:

Detecting small icons, such as the Control Center icon or arbitrary GUI icons found across the web.
When using pytesseract to detect text tokens, we hit a roadblock where the returned coordinates were often on inconsistent scales relative to the screen height and width.
Setting up OmniParser took forever, as it was an outdated Hugging Face model and required a lot of debugging and fixing.
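
For the coordinate-scale problem, one plausible cause (an assumption here, since the cause is not stated above) is that the screenshot's pixel size differs from the logical screen size the mouse API uses, as on HiDPI displays. A minimal sketch of rescaling an OCR box to a click point, with a hypothetical function name and parameters:

```python
def image_to_screen(box, image_size, screen_size):
    """Convert an OCR box (left, top, width, height) in screenshot pixels
    to the click point at its center in screen coordinates."""
    left, top, width, height = box
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    # Scale factors between the screenshot and the logical screen.
    scale_x = scr_w / img_w
    scale_y = scr_h / img_h
    cx = (left + width / 2) * scale_x
    cy = (top + height / 2) * scale_y
    return round(cx), round(cy)
```

In practice, image_size could come from the captured screenshot and screen_size from pyautogui.size(). If the OCR ran on a cropped region, the crop's offset would also need to be added before scaling.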

Accomplishments that we're proud of

We're proud to have trained and implemented a YOLO model to detect small icons and infer their functions. We're also proud of our positive feedback loop that encourages Promptly to complete successive steps to accomplish larger tasks.

What we learned

We learned how to use the Mastra framework to automate tasks like sending emails and scheduling reminders from user instructions.

What's next for Promptly

Future steps include optimizing the models behind Promptly for more efficient execution and making it adapt smoothly across platforms and operating systems.