Our grandparents often ask us technology questions that we find trivial, so we set out to automate answering them.
What it does
Altruist can record your device's screen, microphone, and camera. Our custom voice activity detection (VAD) notices when you start speaking and, once you finish, sends a recording from whichever capture device you choose to the model along with your voice input. It then returns an audio response to whatever you asked or said!
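The end-of-speech detection can be sketched as a simple energy-threshold VAD. This is an illustrative simplification, not our actual implementation, and the threshold and frame counts below are made-up values:

```dart
// Simplified energy-threshold VAD sketch (illustrative values only).
// Speech is considered finished once a run of consecutive low-energy
// frames reaches `hangoverFrames`.
import 'dart:math';

class SimpleVad {
  SimpleVad({this.energyThreshold = 0.02, this.hangoverFrames = 20});

  final double energyThreshold; // RMS level treated as "speaking"
  final int hangoverFrames;     // silent frames before end-of-speech

  bool _speaking = false;
  int _silentRun = 0;

  /// Feed one frame of PCM samples in the range -1.0..1.0. Returns true
  /// exactly once, at the moment the end of an utterance is detected.
  bool processFrame(List<double> samples) {
    final rms = sqrt(
        samples.fold<double>(0, (sum, s) => sum + s * s) / samples.length);
    if (rms >= energyThreshold) {
      _speaking = true;
      _silentRun = 0;
      return false;
    }
    if (!_speaking) return false;
    _silentRun++;
    if (_silentRun >= hangoverFrames) {
      _speaking = false; // utterance finished -> trigger capture + upload
      _silentRun = 0;
      return true;
    }
    return false;
  }
}
```

A real VAD would also smooth over brief pauses and adapt the threshold to background noise, but the trigger-on-silence structure is the same.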
How we built it
We started the app with the very_good CLI and built everything on the foundations provided by its base project.
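For reference, scaffolding a project this way looks roughly like the following (the app name is a placeholder; check the very_good_cli docs for the exact subcommands of your installed version):

```shell
# Install the CLI once, then generate a Flutter starter project.
dart pub global activate very_good_cli
very_good create flutter_app my_app   # "my_app" is a placeholder name
```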
Challenges we ran into
Originally, we considered using the Gemini Live API, which communicates over a WebSocket connection. However, there wasn't enough time to learn how WebSockets work in Flutter, so we had to scrap that idea.
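In hindsight, the raw WebSocket plumbing is fairly small in Dart. Here's a minimal round-trip sketch using `dart:io` (this illustrates only generic WebSocket usage, not the Gemini Live API's actual protocol):

```dart
// Minimal WebSocket round trip with dart:io: connect, send one frame,
// await the first reply, then close. A Live-API client would keep the
// socket open and stream audio frames instead.
import 'dart:io';

Future<String> echoOnce(Uri uri, String message) async {
  final socket = await WebSocket.connect(uri.toString());
  socket.add(message);              // send one text frame
  final reply = await socket.first; // wait for the first incoming frame
  await socket.close();
  return reply as String;
}
```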
Accomplishments that we're proud of
Eventually, we built our stack on Deepgram, Gemini, and ElevenLabs. Deepgram served as our speech-to-text layer so the model could interpret what we were saying. Gemini did the actual AI processing of our video and message. Finally, ElevenLabs generated the response audio from the text produced by Gemini.
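The three stages compose as a straight pipeline. The sketch below shows the wiring with each stage injected as a function; the function types are our own illustration, standing in for HTTP calls to the Deepgram, Gemini, and ElevenLabs APIs (keys and endpoints omitted):

```dart
// Abstract sketch of the Deepgram -> Gemini -> ElevenLabs pipeline.
// Each stage is injected so the data flow is clear; the real stages
// are HTTP calls to the respective APIs.
import 'dart:async';
import 'dart:typed_data';

typedef SpeechToText = Future<String> Function(Uint8List audio);
typedef GenerateReply = Future<String> Function(
    String prompt, Uint8List video);
typedef TextToSpeech = Future<Uint8List> Function(String text);

Future<Uint8List> respond(
  Uint8List voiceClip,
  Uint8List screenClip, {
  required SpeechToText stt,   // Deepgram: audio -> transcript
  required GenerateReply llm,  // Gemini: transcript + video -> answer
  required TextToSpeech tts,   // ElevenLabs: answer -> audio
}) async {
  final transcript = await stt(voiceClip);
  final answer = await llm(transcript, screenClip);
  return tts(answer);
}
```

Injecting the stages also made each one easy to swap or stub out while testing the rest of the app.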
What we learned
We learned how to use Flutter and Dart for cross-platform app development.
What's next for Altruist
Next, we plan to fully implement the Gemini Live API for lower-latency communication.