Inspiration
We chose this project because many audiobooks use a single voice for every character, and productions that do use distinct voices have to hire voice actors, which can become expensive. Seeing Cartesia's work with voices inspired us to tackle this problem by giving each character who speaks in a story its own voice.
What it does
Story Sage takes a PDF as input and uses the Gemini API to parse the text and select a distinct voice for each character based on what that character is like. The Cartesia API then generates audio clips in each assigned voice, which we splice together. At the end of the process, we present the finished audio file to the user to listen to.
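The hand-off between Gemini's output and the text-to-speech step could look roughly like the sketch below. It assumes Gemini is prompted to return JSON with a character-to-voice mapping and an ordered list of speaker-tagged segments; that JSON shape, the `TTSJob` class, the `build_tts_jobs` function, and the voice IDs are all hypothetical placeholders, not Gemini's or Cartesia's actual formats.

```python
import json
from dataclasses import dataclass


@dataclass
class TTSJob:
    """One request to send to the text-to-speech service (hypothetical)."""
    voice_id: str
    text: str


def build_tts_jobs(gemini_output: str) -> list[TTSJob]:
    """Turn Gemini's (assumed) JSON output into an ordered list of TTS jobs.

    Assumed shape, chosen for illustration only:
    {
      "voices": {"Narrator": "voice-a", "Alice": "voice-b"},
      "segments": [{"speaker": "Narrator", "text": "..."}, ...]
    }
    """
    data = json.loads(gemini_output)
    voices: dict[str, str] = data["voices"]
    segments = data["segments"]

    # Each character should get a distinct voice so listeners can tell them apart.
    if len(set(voices.values())) != len(voices):
        raise ValueError("two characters were assigned the same voice")

    jobs: list[TTSJob] = []
    for seg in segments:
        speaker = seg["speaker"]
        if speaker not in voices:
            raise ValueError(f"no voice assigned for speaker {speaker!r}")
        voice_id = voices[speaker]
        # Voices are unique per character, so a matching voice_id means the same
        # speaker: merge consecutive lines into one request (fewer clips to splice).
        if jobs and jobs[-1].voice_id == voice_id:
            jobs[-1].text += " " + seg["text"]
        else:
            jobs.append(TTSJob(voice_id, seg["text"]))
    return jobs
```

Merging consecutive segments from the same speaker is a design choice in this sketch: it reduces the number of API calls and the number of joins in the final audio.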
How we built it
We did extensive research into how the Cartesia API works and how to incorporate Gemini, the generative AI model we wanted to use, into the project. We built the backend in Python using WebSockets, and the frontend with React and TypeScript on the Next.js framework.
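A WebSocket backend like the one described could stream progress to the frontend along these lines. This is a minimal sketch, assuming a protocol in which the client sends a JSON request naming a PDF and the server replies with JSON progress events; the message fields, stage names, and the `run_pipeline` stub are hypothetical stand-ins for the real Gemini and Cartesia calls. The handler relies only on the connection supporting `async for` over incoming messages and an awaitable `send()`, which is the interface a `websockets` server connection provides.

```python
import asyncio
import json
from collections.abc import AsyncIterator


async def run_pipeline(pdf_name: str) -> AsyncIterator[dict]:
    """Hypothetical stand-in for the PDF -> Gemini -> Cartesia -> splice pipeline.

    Yields progress events; a real version would await API calls here.
    """
    for stage in ("parsing", "assigning_voices", "generating_audio", "splicing"):
        await asyncio.sleep(0)  # placeholder for real async work
        yield {"type": "progress", "stage": stage, "pdf": pdf_name}
    yield {"type": "done", "pdf": pdf_name}


async def handler(websocket) -> None:
    """Serve one client: read JSON requests, stream JSON events back."""
    async for message in websocket:
        try:
            request = json.loads(message)
            pdf_name = request["pdf_name"]
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed request: report it and keep the connection open.
            await websocket.send(json.dumps({"type": "error", "detail": "bad request"}))
            continue
        async for event in run_pipeline(pdf_name):
            await websocket.send(json.dumps(event))
```

In recent versions of the `websockets` package, this handler could be served with `websockets.serve(handler, "localhost", 8765)` inside an `async with` block. Streaming progress events keeps the frontend informed during a pipeline that makes several slow API calls.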
Challenges we ran into
We ran into quite a few problems with the Cartesia API, since it was unfamiliar technology and none of us had worked with AI voices before. We also had trouble splicing the audio clips together, because it was difficult to keep the quality consistent across them.
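One common cause of quality problems when splicing is joining clips with mismatched audio formats. A minimal sketch using Python's standard-library `wave` module, assuming each generated clip is saved as an uncompressed WAV file (the `splice_wavs` name and file paths are hypothetical): it refuses to join clips whose channel count, sample width, or sample rate differ, instead of silently producing distorted output.

```python
import wave
from pathlib import Path


def splice_wavs(clip_paths: list[Path], out_path: Path) -> None:
    """Concatenate WAV clips end to end into a single WAV file.

    All clips must share channel count, sample width, and sample rate;
    mixing formats would play back at the wrong speed or pitch, or as noise.
    """
    if not clip_paths:
        raise ValueError("no clips to splice")

    expected = None
    frames: list[bytes] = []
    for path in clip_paths:
        # wave.open expects a str path or file object, so convert the Path.
        with wave.open(str(path), "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if expected is None:
                expected = fmt
            elif fmt != expected:
                raise ValueError(f"{path} has format {fmt}, expected {expected}")
            frames.append(clip.readframes(clip.getnframes()))

    nchannels, sampwidth, framerate = expected
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        # The wave module fills in the header's frame count for us.
        out.writeframes(b"".join(frames))
```

Where the text-to-speech service lets you choose the output format, requesting the same format for every clip avoids the mismatch in the first place; resampling mismatched clips would need an additional audio library.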
Accomplishments that we're proud of
We are proud of learning a new technology in such a short time, since none of us were familiar with how it worked. We are also proud of rising to the challenge of finding libraries to splice the audio together, and of using Gemini effectively to match the right voices to specific characters.
What's next for Story Sage
Our main short-term goal is to deploy Story Sage as a web application. Our long-term goals are to produce smoother audio, support more input file types, and release Story Sage on mobile app stores to broaden access.
Built With
- cartesia
- css
- fastapi
- gemini
- genai
- python
- typescript
- websockets