Inspiration

I presented a side project at last week's SF AI Tinkerers event, and a couple of days ago I saw that my OpenAI bill was over $400 for the month, all from usage this past week. It got me thinking: how many of these API calls could be offloaded to processes that run locally? I wanted to see how viable a voice-to-text inference layer in the browser would be, and, if there was enough time, maybe also use GPT-2 instead of making any API calls, providing offline functionality.

What it does

Transcribe speech to text in the browser, then use that text as context for a conversation with an LLM-based agent.

How we built it

The entire thing can be done on the frontend with React, though I do think a backend server could be practical for various use cases.
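A minimal sketch of what that frontend pipeline looks like: record microphone audio, transcribe it locally, then send the transcript to a chat endpoint. The `transcribe` step is handled elsewhere by the whisper.cpp WASM port; the endpoint and model name here are illustrative assumptions, not the project's actual configuration.

```javascript
// Record a few seconds of microphone audio as a Blob.
// (Browser-only: uses getUserMedia and MediaRecorder.)
function recordAudio(ms = 5000) {
  return navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    return new Promise((resolve) => {
      const recorder = new MediaRecorder(stream);
      const chunks = [];
      recorder.ondataavailable = (e) => chunks.push(e.data);
      recorder.onstop = () => {
        stream.getTracks().forEach((t) => t.stop()); // release the mic
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.start();
      setTimeout(() => recorder.stop(), ms);
    });
  });
}

// Use the locally produced transcript as context for the agent.
// Endpoint/model are placeholders for whatever LLM backend is in use.
async function askAgent(transcript, apiKey) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: transcript }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The key cost saving is that only the short text transcript ever hits a paid API; the audio never leaves the browser.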

Challenges we ran into

Learning how to work with whisper.cpp's WebAssembly port.
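One concrete part of that challenge: whisper.cpp expects 16 kHz mono float PCM, so audio recorded in the browser has to be decoded and resampled before it can be handed to the WASM module. A sketch of that preprocessing using the Web Audio API (function name is mine, not whisper.cpp's):

```javascript
// Convert a recorded audio Blob into the 16 kHz mono Float32Array
// that whisper.cpp's WASM port expects. Browser-only (Web Audio API).
async function toWhisperPCM(blob) {
  const arrayBuffer = await blob.arrayBuffer();

  // Decode the compressed recording at its native sample rate...
  const ctx = new AudioContext();
  const decoded = await ctx.decodeAudioData(arrayBuffer);
  await ctx.close();

  // ...then render it through an OfflineAudioContext configured for
  // 1 channel at 16000 Hz, which resamples and downmixes in one pass.
  const frames = Math.ceil(decoded.duration * 16000);
  const offline = new OfflineAudioContext(1, frames, 16000);
  const src = offline.createBufferSource();
  src.buffer = decoded;
  src.connect(offline.destination);
  src.start();
  const rendered = await offline.startRendering();

  return rendered.getChannelData(0); // Float32Array of mono samples
}
```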

Accomplishments that we're proud of

I feel like I know a lot more about related topics, like the IndexedDB API, which is how you'd download and store a (non-LLM) model locally in the browser.
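The caching pattern looks roughly like this: check IndexedDB for the model bytes, and only hit the network on a miss. Database, store, and function names here are illustrative, not the project's actual identifiers.

```javascript
// Illustrative names for this sketch, not the project's real ones.
const DB_NAME = 'model-cache';
const STORE = 'models';

// Open (and on first run, create) the cache database.
function openModelDB() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => req.result.createObjectStore(STORE);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Return the model's bytes, downloading and caching them on a miss,
// so repeat visits skip the (large) network download entirely.
async function getModel(url) {
  const db = await openModelDB();

  const cached = await new Promise((resolve, reject) => {
    const req = db.transaction(STORE).objectStore(STORE).get(url);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  if (cached) return cached;

  const bytes = await (await fetch(url)).arrayBuffer();
  await new Promise((resolve, reject) => {
    const tx = db.transaction(STORE, 'readwrite');
    tx.objectStore(STORE).put(bytes, url); // keyed by download URL
    tx.oncomplete = resolve;
    tx.onerror = () => reject(tx.error);
  });
  return bytes;
}
```

IndexedDB is the right fit here because, unlike localStorage, it can hold ArrayBuffers of tens or hundreds of megabytes, which is the scale of a Whisper model file.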

What we learned

See above.

What's next for Locally Run Agent

I want to integrate this type of functionality and tech stack into the side project that gave me the $400 bill this month, so I pay less next month and can hopefully sustain user growth!
