Inspiration
I presented a side project at last week's SF AI Tinkerer's event, and saw a couple of days ago that my OpenAI bill was over $400 for the month, all because of usage this past week. It got me thinking: how can I offload a lot of these API calls to processes that happen locally? I wanted to see how viable it is to run a voice-to-text inference layer in the browser, and, if there was enough time, maybe also use GPT-2 locally so the app makes no API calls at all, providing offline functionality.
What it does
Transcribe speech to text in the browser, then use that text as context for a conversation with an LLM-based agent.
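At a high level the glue is just two steps, sketched below. The `transcribe` call is a placeholder for the in-browser whisper.cpp layer described under "How we built it", and the chat request assumes OpenAI's standard chat completions endpoint (swapping in a local model later would only change this one function):

```typescript
// Sketch of the two-step flow: speech -> text -> LLM reply.
// `transcribe` and OPENAI_API_KEY are placeholders, not part of any real library.
declare function transcribe(audio: Float32Array): Promise<string>;
declare const OPENAI_API_KEY: string;

type Message = { role: string; content: string };

async function voiceTurn(audio: Float32Array, history: Message[]): Promise<string> {
  // Step 1: speech to text, done locally in the browser.
  const userText = await transcribe(audio);
  history.push({ role: "user", content: userText });

  // Step 2: use the transcript as the next turn in an LLM conversation.
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages: history }),
  });
  const data = await res.json();
  const reply: string = data.choices[0].message.content;
  history.push({ role: "assistant", content: reply });
  return reply;
}
```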
How we built it
The entire thing can be done on the frontend with React, though I do think a backend server could be practical for various use cases.
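For anyone curious what the frontend piece looks like, here's a rough sketch of capturing microphone audio and converting it to the 16 kHz mono Float32Array that whisper.cpp expects. It uses only standard browser APIs; the React wiring around it is omitted:

```typescript
// Sketch: record mic audio for `ms` milliseconds and resample to 16 kHz mono,
// the format whisper.cpp's WASM port expects. Standard browser APIs only.
async function recordAudio(ms: number): Promise<Float32Array> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  const stopped = new Promise((resolve) => (recorder.onstop = resolve));

  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, ms));
  recorder.stop();
  await stopped;
  stream.getTracks().forEach((t) => t.stop());

  // Decode at 16 kHz; decodeAudioData resamples to the context's sample rate.
  const data = await new Blob(chunks).arrayBuffer();
  const ctx = new AudioContext({ sampleRate: 16000 });
  const decoded = await ctx.decodeAudioData(data);
  return decoded.getChannelData(0); // mono: take the first channel
}
```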
Challenges we ran into
Learning how to work with whisper.cpp's WebAssembly port.
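For reference, driving the port ends up looking roughly like the snippet below. This is a sketch from memory of the Module interface in whisper.cpp's whisper.wasm example (an Emscripten filesystem plus init/full_default); the exact names and signatures may differ between versions, so treat it as illustrative rather than a stable API:

```typescript
// Rough sketch modeled on whisper.cpp's whisper.wasm example. The Module
// shape declared here is an assumption about that example, not a contract.
declare const Module: {
  FS_createDataFile: (parent: string, name: string, data: Uint8Array,
                      canRead: boolean, canWrite: boolean) => void;
  init: (modelPath: string) => number;
  full_default: (instance: number, audio: Float32Array,
                 language: string, nthreads: number, translate: boolean) => number;
};

function runWhisper(modelBytes: Uint8Array, audio: Float32Array): void {
  // Write the ggml model into the module's virtual filesystem, then load it.
  Module.FS_createDataFile("/", "whisper.bin", modelBytes, true, true);
  const instance = Module.init("whisper.bin");

  // Kick off transcription; in the example, results arrive via print callbacks.
  Module.full_default(instance, audio, "en", 4, false);
}
```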
Accomplishments that we're proud of
I feel like I know a lot more about related topics like the IndexedDB API, which is how you'd download and store a (non-LLM) model locally in the browser.
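As an example, a minimal caching helper looks something like this; the database and store names are arbitrary choices for this sketch, and error handling is trimmed:

```typescript
// Sketch: fetch a model file once, then serve it from IndexedDB on later visits.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("models", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("files");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function getModel(url: string): Promise<ArrayBuffer> {
  const db = await openDb();

  // Check the cache first, keyed by the model's URL.
  const cached = await new Promise<ArrayBuffer | undefined>((resolve, reject) => {
    const req = db.transaction("files").objectStore("files").get(url);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  if (cached) return cached; // already downloaded on a previous visit

  // Cache miss: download once, persist, and return the bytes.
  const bytes = await (await fetch(url)).arrayBuffer();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction("files", "readwrite");
    tx.objectStore("files").put(bytes, url);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
  return bytes;
}
```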
What we learned
See above: working with whisper.cpp's WebAssembly port, and how to persist models in the browser with the IndexedDB API.
What's next for Locally Run Agent
I want to integrate this kind of functionality/tech stack into the side project that gave me the $400 bill this month, so I pay less next month and can hopefully sustain user growth!