Inspiration
Oftentimes, we find ourselves rubbing our eyes while working late into the night. We are tired, and our eyes are too. But the work still needs to be done. Billions of people interact with technology every day, and the accessibility tool we offer, a voice-controlled computer, addresses that strain. Not to mention, over 300 million people live with moderate to severe visual impairments, almost 50 million of whom are blind. For us, a voice-controlled computer may be a matter of convenience, but for them, it unlocks a whole new world of possibilities. A monumental upgrade from screen readers and braille devices, MiaAI interacts with users conversationally to execute tasks, enabled by our cutting-edge technology.
What it does
MiaAI allows users to control their computer entirely by voice. By simply conversing with the computer in natural language, anyone can perform complex actions in just a few sentences, thanks to MiaAI's deep understanding of language and its broad ability to interface with the software on their machine.
How we built it
MiaAI was built as a native macOS app using Swift and SwiftUI. We utilize several state-of-the-art AI systems: OpenAI's Whisper for real-time transcription, GPT-4 with OpenAI's Assistants API for deep language understanding and general knowledge, macOS's Shortcuts and App Intents APIs for interfacing with the system, and OpenAI's neural text-to-speech for responding in clear, intelligible natural language.
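To make the pipeline concrete, here is a minimal sketch of the core loop under a few assumptions: the function names are hypothetical, the OpenAI key lives in the environment, and the Whisper transcription and text-to-speech calls are omitted for brevity. GPT-4 names a macOS Shortcut to run, and the `shortcuts` CLI (which ships with macOS 12+) executes it:

```swift
import Foundation

// Hypothetical sketch of MiaAI's core loop, not the app's actual code:
// transcribed speech goes to GPT-4, which names a macOS Shortcut to run.
// Assumes OPENAI_API_KEY is set in the environment.

let apiKey = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? ""

/// Ask GPT-4 to map a transcribed utterance to the name of a Shortcut.
func planAction(for utterance: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-4",
        "messages": [
            ["role": "system",
             "content": "Reply with only the name of the macOS Shortcut that fulfills the request."],
            ["role": "user", "content": utterance]
        ]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let message = (json?["choices"] as? [[String: Any]])?.first?["message"] as? [String: Any]
    return (message?["content"] as? String) ?? ""
}

/// Execute the chosen Shortcut through the `shortcuts` CLI that ships with macOS.
func runShortcut(named name: String) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/shortcuts")
    process.arguments = ["run", name]
    try process.run()
    process.waitUntilExit()
}
```

In the full app, microphone audio is first transcribed through OpenAI's `/v1/audio/transcriptions` endpoint, and the assistant's reply is played back through the text-to-speech endpoint, closing the conversational loop.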
Challenges we ran into
Many of our challenges arose in the final sprint of the hackathon, when we combined all of our individual features: speech-to-text, interfacing with OpenAI's API, mapping LLM output to automated tasks, and executing those tasks with specific commands on the user's machine. Some of our software, which worked perfectly fine on one system, suddenly failed on another.
Accomplishments that we're proud of
We are extremely proud of having built everything from scratch in such a short time span. Connecting many different functionalities and APIs into one coherent, exciting product felt like nothing short of a miracle by the time we finished, just a few hours before the deadline.
What we learned
We learned how LLMs can significantly boost productivity, and how much more powerful they become when integrated into existing applications for specific workflows.
What's next for MiaAI - Your Personal AI Voice Assistant
We hope to fully integrate MiaAI into the macOS system as well as the user's installed third-party apps. Because we rely on the controls that system and third-party applications already expose through Siri Shortcuts and App Intents, this seamless integration requires no direct intervention from Apple or from third-party developers (see the sketches below). The result is a powerful assistant, essentially a digital secretary working across all your favorite apps and your computer, that helps you accomplish complex tasks with simple, natural, conversational voice commands. And with our already half-developed element-OCR and input-control systems, the assistant can directly understand and interact with the computer using a mouse and keyboard the same way a human would, opening up endless possibilities for accessibility and automation.
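To illustrate why no extra work is needed from app developers, here is a minimal, hypothetical App Intent of the kind a third-party notes app might already ship. Any intent like this is discoverable through the system Shortcuts infrastructure, so an assistant can invoke it without app-specific glue code (the intent name and parameter are our own illustration, not a real app's API):

```swift
import AppIntents

// Hypothetical App Intent a third-party notes app might expose.
// The system surfaces it through Shortcuts, where an assistant can invoke it.
struct CreateNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Create Note"

    @Parameter(title: "Note Text")
    var text: String

    func perform() async throws -> some IntentResult {
        // The app's own logic would persist the note here.
        return .result()
    }
}
```

And for the element-OCR and input-control work, a sketch of the underlying primitives using Apple's Vision and CoreGraphics frameworks, assuming the app has been granted Screen Recording and Accessibility permissions:

```swift
import Vision
import CoreGraphics

/// Find on-screen text elements in a screenshot. Boxes come back in
/// Vision's normalized coordinates and must be scaled to screen pixels.
func recognizeElements(in screenshot: CGImage) throws -> [(text: String, box: CGRect)] {
    let request = VNRecognizeTextRequest()
    let handler = VNImageRequestHandler(cgImage: screenshot)
    try handler.perform([request])
    return (request.results ?? []).compactMap { observation in
        guard let best = observation.topCandidates(1).first else { return nil }
        return (best.string, observation.boundingBox)
    }
}

/// Click a screen coordinate by posting synthetic mouse events,
/// reaching applications the same way a physical click would.
func click(at point: CGPoint) {
    for type in [CGEventType.leftMouseDown, .leftMouseUp] {
        CGEvent(mouseEventSource: nil, mouseType: type,
                mouseCursorPosition: point, mouseButton: .left)?
            .post(tap: .cghidEventTap)
    }
}
```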
Built With
- gpt-4
- openai
- swift
- whisper