Inspiration

I had a bunch of these documents sitting on my computer that I never had time to review. If I could instead listen to that information while walking or driving, that would be awesome!

What it does

The project takes the provided documents, uses an LLM to generate a podcast script, and then turns that script into audio with a TTS model.
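
The flow can be sketched in Python; the helper names below (`build_prompt`, `split_for_tts`) are illustrative, not the project's actual API. The chunking step matters because TTS models such as Bark work best on short segments:

```python
import re

def build_prompt(doc_text: str) -> str:
    """Wrap the document text in a podcast-script instruction for the LLM."""
    return (
        "Turn the following document into a short, engaging podcast script. "
        "Use plain spoken language.\n\n" + doc_text
    )

def split_for_tts(script: str, max_chars: int = 220) -> list[str]:
    """Split the generated script into sentence-aligned chunks no longer
    than max_chars, so each chunk fits in one TTS generation call."""
    # Split on sentence-ending punctuation while keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", script.replace("\n", " ").strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = (current + " " + sentence).strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be fed to the TTS model and the resulting audio segments concatenated into the final podcast file.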

How we built it

I built it with AI Workbench. It relies on an LLM of your choice (such as phi3 or llama3), while the TTS model is hard-coded to suno-ai/bark.

Challenges we ran into

I didn't have a GPU on my computer, so I tried using the Jetson platform instead. Unfortunately, arm64-based systems are not currently supported, so I had to do a lot of orchestration manually.

Accomplishments that we're proud of

I've learned a lot and was able to get it working end-to-end.

What we learned

AI Workbench, Jetson, TTS, LangChain.

What's next for Cast AI Workbench project

Add support for longer podcasts and dialogues.


Updates

I have updated the project based on the feedback provided. The three apps (Gradio, LLM-server, and TTS-server) are now part of the same project, which greatly simplifies building and running them. I've also rebased onto the official PyTorch image and added an Ollama server to it. Finally, an init.sh script was added to pre-fetch the models. Follow the updated README to test it out. Thanks!
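
As a rough illustration of how the app might talk to the bundled Ollama server, here is a minimal sketch against Ollama's `/api/generate` endpoint. The function names are assumptions, and it requires a running Ollama server with the model already pulled (e.g. `ollama pull phi3`, the kind of step an init.sh pre-fetch script would run):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def make_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.
    stream=False requests a single JSON response instead of chunks."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate_script(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(make_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swapping phi3 for llama3 is then just a matter of changing the model string passed in.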
