Inspiration

Devin is really annoying to use, but imagine if you could deploy full-stack apps with just your voice: pretty much J.A.R.V.I.S. for software engineering.

What it does

D.E.V.I.S generates apps from voice input, and even uses computer-use agents to find similar designs and mimic their UI.

How we built it

We used Scrapybara for computer use (screenshotting similar designs), Gemini's VLM to turn each screenshot into a natural-language description, ElevenLabs and Whisper for voice communication, and o1 to power all of our software agents.
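Roughly, the stages chain together like this. The sketch below is a minimal illustration, assuming each service is wrapped behind a plain Python function; the `Pipeline` class, its field names, and the prompt format are hypothetical, and no real Scrapybara, Gemini, ElevenLabs, Whisper, or o1 SDK calls are shown. Stubs stand in for the actual services.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. Real implementations would wrap the
# speech-to-text, computer-use, VLM and code-generation services.
Transcribe = Callable[[bytes], str]   # voice audio -> text request
Screenshot = Callable[[str], bytes]   # text request -> screenshot of a similar design
Describe = Callable[[bytes], str]     # screenshot -> natural-language UI description
Generate = Callable[[str], str]       # combined prompt -> generated app source


@dataclass
class Pipeline:
    transcribe: Transcribe  # assumption: e.g. Whisper speech-to-text
    screenshot: Screenshot  # assumption: e.g. a Scrapybara computer-use agent
    describe: Describe      # assumption: e.g. Gemini's VLM
    generate: Generate      # assumption: e.g. an o1-backed coding agent

    def run(self, audio: bytes) -> str:
        # 1. Turn the spoken request into text.
        request = self.transcribe(audio)
        # 2. Grab a similar design and describe it in words.
        design = self.describe(self.screenshot(request))
        # 3. Hand the request plus the design description to the coding agent.
        prompt = f"Build this app: {request}\nMatch this UI: {design}"
        return self.generate(prompt)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any external services.
    demo = Pipeline(
        transcribe=lambda audio: audio.decode(),
        screenshot=lambda query: f"<screenshot of {query}>".encode(),
        describe=lambda img: f"dark navbar, card grid ({len(img)} bytes)",
        generate=lambda prompt: f"// generated from:\n{prompt}",
    )
    print(demo.run(b"a todo app"))
```

Keeping each stage as an injected function makes it easy to swap providers or stub out a slow service while debugging the rest of the chain.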

Challenges we ran into

Figuring out how each API worked and stringing them all together into one pipeline. We also ran into a lot of latency issues.

Accomplishments that we're proud of

We somehow managed to get something working in this short timeframe.

What we learned

You just gotta have fun with it tbh.

What's next for D.E.V.I.S

A lot more integrations, and turning the prototype into a fully working web app. This could be a decent startup idea too, especially if D.E.V.I.S could control an army of software agents.

Built With

  • computer-use
  • elevenlabs
  • gemini
  • o1
  • scrapybara
  • vlms