Inspiration
Devin is really annoying to use. But imagine you could deploy full-stack apps with just your voice - pretty much J.A.R.V.I.S. for software engineering.
What it does
Generates apps from voice input and even uses computer-use agents to mimic UI designs.
How we built it
We used Scrapybara for computer use to screenshot similar designs, Gemini's VLM to describe each screenshot in natural language, ElevenLabs / Whisper for voice communication, and o1 for all of our software agents.
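Here is a rough sketch of how those pieces could chain together. It is not our actual code: the `Pipeline` class, its stage names, and `stub_pipeline` are all made up for illustration, and every stage is a plain callable with a fake stand-in rather than a real Scrapybara, Gemini, ElevenLabs, Whisper, or o1 client.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Pipeline:
    # Hypothetical stage names; each is a plain callable so a real API
    # client could be swapped in for the stub later.
    transcribe: Callable[[bytes], str]        # speech -> text (Whisper's role)
    find_reference: Callable[[str], bytes]    # request -> screenshot (computer-use role)
    describe_design: Callable[[bytes], str]   # screenshot -> description (VLM role)
    generate_app: Callable[[str, str], str]   # request + description -> code (o1's role)
    speak: Callable[[str], bytes]             # text -> speech (ElevenLabs' role)

    def run(self, audio: bytes) -> Tuple[str, bytes]:
        """Run one voice request through every stage, in order."""
        request = self.transcribe(audio)
        screenshot = self.find_reference(request)
        design = self.describe_design(screenshot)
        code = self.generate_app(request, design)
        reply = self.speak(f"Generated an app for: {request}")
        return code, reply


def stub_pipeline() -> Pipeline:
    """Build a pipeline out of fake stages that make no network calls."""
    return Pipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        find_reference=lambda request: f"screenshot for {request}".encode("utf-8"),
        describe_design=lambda shot: "description of " + shot.decode("utf-8"),
        generate_app=lambda request, design: f"# app: {request}\n# design: {design}\n",
        speak=lambda text: text.encode("utf-8"),
    )


if __name__ == "__main__":
    code, reply = stub_pipeline().run(b"a todo app")
    print(code)
    print(reply)
```

Because every stage runs one after another, each API call adds its full round trip to the total wait.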
Challenges we ran into
Figuring out how the APIs worked and stringing everything together. We also hit a lot of latency issues.
Accomplishments that we're proud of
We somehow managed to get something working in this short timeframe.
What we learned
You just gotta have fun with it tbh.
What's next for D.E.V.I.S
A lot more integrations and getting a fully working web app. This could be a decent startup idea too, especially if it controls an army of software agents.
Built With
- computer-use
- elevenlabs
- gemini
- o1
- scrapybara
- vlms