Inspiration

LLMs are very good at understanding and explaining code. In some cases it is more comfortable to receive the explanation as natural human speech. So the idea is to have Gemini read the code and generate an explanation, then use ElevenLabs to read that explanation aloud in a natural human voice.

What it does

The project provides a simple UI where we can paste code we want to understand in depth. The application first calls Gemini to craft a good prompt, then uses that prompt to ask Gemini for a concise, easy-to-understand explanation. Finally, it calls the ElevenLabs textToSpeech API to convert the explanation into an audio stream and plays it to the user.
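The flow described above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: the names `buildMetaPrompt`, `buildExplainPrompt`, `narrateCode`, and the `NarrationDeps` interface are hypothetical, and the prompt wording is an assumption. The Gemini and ElevenLabs calls are passed in as functions so the sequencing can be read and tested without network access.

```typescript
// Hypothetical dependency types: a real app would back these with
// Gemini (text generation) and ElevenLabs (text to speech) calls.
interface NarrationDeps {
  generateText: (prompt: string) => Promise<string>;
  synthesizeSpeech: (text: string) => Promise<ArrayBuffer>;
}

interface Narration {
  explanation: string;
  audio: ArrayBuffer;
}

// Step 1 input: ask the model to write a prompt tailored to the pasted code.
// The wording here is an assumption, not the project's real prompt.
function buildMetaPrompt(code: string): string {
  return [
    "Write a prompt that asks an assistant to explain the code below.",
    "The explanation must be concise, easy to understand, and suitable for reading aloud.",
    "Return only the prompt text.",
    "",
    "Code:",
    code.trim(),
  ].join("\n");
}

// Step 2 input: pair the generated prompt with the code it refers to.
function buildExplainPrompt(generatedPrompt: string, code: string): string {
  return `${generatedPrompt.trim()}\n\nCode:\n${code.trim()}`;
}

// Runs the three steps in order: prompt generation, explanation, speech.
async function narrateCode(code: string, deps: NarrationDeps): Promise<Narration> {
  if (code.trim() === "") {
    throw new Error("No code to explain");
  }
  const generatedPrompt = await deps.generateText(buildMetaPrompt(code));
  const explanation = await deps.generateText(buildExplainPrompt(generatedPrompt, code));
  const audio = await deps.synthesizeSpeech(explanation);
  return { explanation, audio };
}
```

In the browser, the returned `audio` bytes could be wrapped in a `Blob`, turned into an object URL with `URL.createObjectURL`, and played through an `Audio` element.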

How we built it

For now, the frontend does all the work. We built a React application and integrated it with Gemini and ElevenLabs.
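One way a frontend can talk to both services directly is through their REST endpoints. The sketch below only builds the request descriptions and parses a Gemini response. The helper names are hypothetical, and the model names (`gemini-1.5-flash`, `eleven_multilingual_v2`) and the voice ID are assumptions that would need to match the project's real configuration. The project may instead use the official SDKs.

```typescript
interface HttpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assumed default model; replace with whichever Gemini model the app uses.
const GEMINI_MODEL = "gemini-1.5-flash";

// Describes a generateContent call to the Gemini REST API.
function buildGeminiRequest(prompt: string, apiKey: string, model: string = GEMINI_MODEL): HttpRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Describes a text-to-speech call to the ElevenLabs REST API.
// voiceId is a placeholder supplied by the caller.
function buildSpeechRequest(text: string, apiKey: string, voiceId: string): HttpRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "xi-api-key": apiKey },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    },
  };
}

// Joins the text parts of the first candidate in a generateContent response.
function extractGeminiText(response: unknown): string {
  type Shape = { candidates?: { content?: { parts?: { text?: string }[] } }[] };
  const parts = (response as Shape | null)?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("");
}
```

Each descriptor can be passed to `fetch(request.url, request.init)`. The Gemini response is read with `response.json()` and handed to `extractGeminiText`, while the ElevenLabs response body is the audio itself and can be read with `response.arrayBuffer()`. Because a frontend-only setup ships the API keys to the browser, moving these calls behind a backend is a natural follow-up.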

Challenges we ran into

The initial version works well, but the application needs enhancements to become really attractive. For example, we could add an interactive mode that lets the user discuss the code with the LLM over multiple rounds. We also noticed that the ElevenLabs Agent has intelligence of its own, so the best way to integrate Gemini and ElevenLabs is a good topic for further study.

Accomplishments that we're proud of

Gemini and ElevenLabs work well together in the current release. Gemini generates clear, easy-to-understand code explanations, and ElevenLabs speaks them in a natural voice. This application should be valuable for some users.

What we learned

We learned how to integrate React, Gemini, and ElevenLabs.

What's next for CodeNarator

There are many enhancements we can make to this application:

  • Extend the architecture with a backend.
  • Add an interactive mode that lets the user discuss code with the AI over multiple rounds of conversation.
  • Study the best way to integrate Gemini and ElevenLabs.
  • ...
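
For the multi-round discussion idea, one possible starting point is to keep the conversation as a list of turns and convert it into the `contents` array that the Gemini API accepts, where each entry has a `role` of `user` or `model`. The helper names and the opening prompt wording below are hypothetical. This is a sketch of the data shape, not a planned design.

```typescript
type Role = "user" | "model";

interface Turn {
  role: Role;
  text: string;
}

// Opens a conversation with the pasted code as the first user turn.
function startConversation(code: string): Turn[] {
  return [{ role: "user", text: `Explain this code concisely:\n${code.trim()}` }];
}

// Returns a new history with the turn appended; the input is not mutated,
// which suits React state updates.
function addTurn(history: readonly Turn[], role: Role, text: string): Turn[] {
  return [...history, { role, text }];
}

// Converts the history into the `contents` array of a Gemini request body.
function toGeminiContents(history: readonly Turn[]): { role: Role; parts: { text: string }[] }[] {
  return history.map((turn) => ({ role: turn.role, parts: [{ text: turn.text }] }));
}
```

Each new user question is appended with `addTurn`, the full history is sent, and the model's reply is appended as a `model` turn before the next round.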
