Inspiration
We set out to build the ultimate productivity companion. Just as a dog is man's best friend, Promptly is a user's best friend.
What it does
Promptly is the first AI assistant that can view and control your screen, automating tasks seamlessly. Powered by zero-shot object detection models, OCR, and Mastra, Promptly takes over mundane tasks like writing emails, ordering from Amazon, or even playing Wordle!
How we built it
Promptly uses an Electron frontend with a Flask backend. Flask connects to PyAutoGUI, Hugging Face Transformers models, OCR libraries like pytesseract, and OpenAI/Gemini.
We had to carefully prompt-engineer OpenAI/Gemini to respond in parsable text that our PyAutoGUI layer could read. We also had to get OmniParser, a GUI/icon parsing library, running and connected to OpenAI. For performance, we found that the best strategy was to split the screen into quadrants, which narrows down the icon search with OmniParser and the text search with pytesseract and reduces inference time.
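A minimal sketch of the quadrant idea, assuming a screenshot of known pixel size. The function names and the `(left, top, right, bottom)` box layout are illustrative choices for this example, not Promptly's actual code.

```python
from typing import Dict, Tuple

# (left, top, right, bottom) in screenshot pixels; an assumed layout for this sketch.
Box = Tuple[int, int, int, int]


def split_into_quadrants(width: int, height: int) -> Dict[str, Box]:
    """Return four boxes that together cover a width x height screenshot."""
    mid_x, mid_y = width // 2, height // 2
    return {
        "top_left": (0, 0, mid_x, mid_y),
        "top_right": (mid_x, 0, width, mid_y),
        "bottom_left": (0, mid_y, mid_x, height),
        "bottom_right": (mid_x, mid_y, width, height),
    }


def to_screen_coords(box: Box, local_x: int, local_y: int) -> Tuple[int, int]:
    """Map a point found inside a cropped quadrant back to full-screenshot coordinates."""
    left, top, _, _ = box
    return left + local_x, top + local_y
```

Each box can be passed to an image crop before running detection or OCR on the smaller region. Any hit found in the crop then needs `to_screen_coords` before it is used as a click target.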
After all commands are parsed, we use websockets to report back to Electron, giving users feedback.
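To illustrate what "parsable text" can mean, here is a hedged sketch of a parser for a hypothetical one-command-per-line format (`CLICK x y`, `TYPE text`, `PRESS key`). The format and the `parse_commands` name are assumptions made for this example; the actual prompt format Promptly uses is not described here.

```python
from typing import Dict, List


def parse_commands(response: str) -> List[Dict[str, object]]:
    """Turn a model response in the assumed line format into action dicts."""
    commands: List[Dict[str, object]] = []
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between commands
        verb, _, rest = line.partition(" ")
        verb = verb.upper()
        if verb == "CLICK":
            parts = rest.split()
            if len(parts) != 2:
                raise ValueError(f"CLICK needs two coordinates: {line!r}")
            x, y = (int(p) for p in parts)
            commands.append({"action": "click", "x": x, "y": y})
        elif verb == "TYPE":
            commands.append({"action": "type", "text": rest})
        elif verb == "PRESS":
            commands.append({"action": "press", "key": rest.strip().lower()})
        else:
            # Fail loudly on anything unexpected instead of guessing.
            raise ValueError(f"Unrecognized command: {line!r}")
    return commands
```

Each resulting dict could then be dispatched to the matching PyAutoGUI call, for example `pyautogui.click(x, y)`, `pyautogui.write(text)`, or `pyautogui.press(key)`. Rejecting unknown lines keeps a malformed model response from turning into stray clicks.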
Challenges we ran into
We ran into a ton of challenges, including finding the right Hugging Face model for extracting GUI elements, integrating an abort feature with our websockets, and even a bunch of CSS styling issues.
The hardest of these were:
- The detection of small icons, such as the control center icon or just random GUI icons found across the web.
- When using pytesseract to detect text tokens, we hit a roadblock: the returned coordinates were often on a scale that did not match the screen's height and width.
- Compiling OmniParser took forever, as it was an outdated Hugging Face model that required a lot of debugging and fixing.
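One common cause of mismatched OCR coordinates is that the screenshot's pixel size differs from the screen size the automation library reports (for example on high-DPI displays), or that OCR ran on a crop. The sketch below shows one way to convert an OCR box into a click point under those assumptions. The function name and parameters are hypothetical, and this may not be the exact issue or fix Promptly had.

```python
from typing import Tuple


def ocr_box_to_screen_point(
    box: Tuple[int, int, int, int],
    image_size: Tuple[int, int],
    screen_size: Tuple[int, int],
    crop_origin: Tuple[int, int] = (0, 0),
) -> Tuple[int, int]:
    """Convert an OCR box to the screen point at its center.

    box         -- (left, top, width, height) in pixels of the image OCR ran on
    image_size  -- (width, height) of the full screenshot in pixels
    screen_size -- (width, height) the automation library reports for the screen
    crop_origin -- top-left corner of the OCR'd region within the full screenshot
    """
    left, top, width, height = box
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    # Center of the box, shifted back into full-screenshot coordinates.
    center_x = crop_origin[0] + left + width / 2
    center_y = crop_origin[1] + top + height / 2
    # Rescale from screenshot pixels to the coordinate space used for clicking.
    return round(center_x * scr_w / img_w), round(center_y * scr_h / img_h)
```

With a 2880x1800 screenshot on a screen reported as 1440x900, every coordinate is halved. `pytesseract.image_to_data` reports `left`, `top`, `width`, and `height` per token, which map directly onto the `box` argument.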
Accomplishments that we're proud of
We're proud to have trained and implemented a YOLO model to detect small icons and infer their functions. We're also proud of our positive feedback loop that encourages Promptly to complete successive steps to accomplish larger tasks.
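A loop of this kind can be sketched as observe, plan, act, repeat. The version below is only an illustration with hypothetical callables (`observe`, `plan_next_step`, `execute`) passed in by the caller. It does not reflect how Promptly wires its models together.

```python
from typing import Callable, List, Optional


def run_task(
    goal: str,
    observe: Callable[[], str],
    plan_next_step: Callable[[str, str, List[str]], Optional[str]],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Repeat observe -> plan -> execute until the planner returns None."""
    history: List[str] = []
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        screen_state = observe()
        step = plan_next_step(goal, screen_state, history)
        if step is None:  # planner signals that the goal is complete
            break
        execute(step)
        history.append(step)  # completed steps feed into the next planning call
    return history
```

Passing the history of completed steps back into the planner is what lets each step build on the last, and the `max_steps` cap pairs naturally with an abort feature.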
What we learned
We learned how to use the Mastra framework to automate tasks like sending emails and scheduling reminders from user instructions.
What's next for Promptly
Future steps include optimizing the models behind Promptly for more efficient execution, and ensuring seamless adaptability across platforms and operating systems.
Built With
- bash
- electron
- flask
- gemini
- git
- github
- javascript
- mastra
- multithreading
- numpy
- omniparser
- openai
- python
- pytorch
- react
- requests
- transformers
- typescript
- websockets
- yolo
