Inspiration

Many people learn more effectively by listening and find revising from a textbook less efficient. However, most online audio generators cost money, which makes it difficult to listen to your notes or textbook. One member of our team had this exact issue at A-level: she used to drive to school, and found that the hour in the car was wasted time for revision. No podcasts exist for revising specific textbooks, and the text-to-speech websites we found all had a 10-minute time limit, so this project was designed to solve that problem. Now students can download their audio files, or listen on the website during their walk to school, bus ride or car journey. This lets them make the most of their time when they can't sit down and read a textbook.

What it does

A student simply uploads a PDF of their textbook, selects a person to narrate it, and the website does the rest! The PDF is converted into text, passed through a C++ program for cleaning, and then handed over to Gemini's API. With a strict rule set (it started getting far too creative), Gemini produces a natural podcast-style narrative, which is then passed to ElevenLabs to create an MP3 file. While this happens, the user is entertained by the speaker (in the form of art). The file is then returned to the user, who can listen on the spot or download their podcast! We also added a bit of fun: you choose who narrates your audio based on photos and names, and they will personally record it and return the file. The website also lists all your old audio files, so you can revise those too!
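As a rough illustration of what the cleaning step between PDF extraction and Gemini could look like, here is a minimal C++ sketch. The function name `clean_text` and the two rules it applies (joining words hyphenated across line breaks, collapsing runs of whitespace) are assumptions made for this example, not a description of the project's actual cleaning code.

```cpp
#include <cctype>
#include <string>

// Hypothetical cleaning pass (not the project's real code):
//  - joins words that were hyphenated across a line break ("revi-\nsion")
//  - collapses any run of spaces, tabs and newlines into a single space
//  - trims leading and trailing whitespace
std::string clean_text(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    bool pending_space = false;  // a space is owed before the next character

    for (std::size_t i = 0; i < raw.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(raw[i]);

        // A hyphen that directly follows a letter and directly precedes
        // a newline is treated as a line-break hyphen and dropped.
        const bool line_break_hyphen =
            c == '-' && i + 1 < raw.size() && raw[i + 1] == '\n' &&
            !pending_space && !out.empty() &&
            std::isalpha(static_cast<unsigned char>(out.back()));
        if (line_break_hyphen) {
            ++i;  // also skip the newline that follows the hyphen
            continue;
        }

        if (std::isspace(c)) {
            pending_space = !out.empty();  // never emit leading whitespace
            continue;
        }

        if (pending_space) {
            out.push_back(' ');
            pending_space = false;
        }
        out.push_back(raw[i]);
    }
    return out;
}
```

Calling `clean_text("revi-\nsion  is\n\nfun ")` returns `"revision is fun"`. Hyphens inside a line, as in "well-known", are left alone, but a genuine hyphenated compound that happens to break across two lines would also be joined, so this rule is a trade-off rather than a complete fix.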

How we built it

First we delegated tasks: one of us took the client side, one API investigation, and one C++ integration. The latter two then took on the server-side coding. A lot of sequence paths were drawn on paper and a lot of to-do lists were made. We started with the basic website structure:

  • submit a PDF
  • output an audio file

At this point you couldn't select a voice or download your podcast yet. It was also just narrating the PDF word for word, so we decided to introduce Gemini. We then implemented some more features:

  • The download button worked
  • The submit button was reconfigured to actually submit the file
  • The Gemini API was enlisted to turn the submitted PDF into a more natural conversation, under strict supervision
  • The option to choose your speaker was implemented
  • A member of our team drew their impression of each voice, so you can put a face to the audio
  • These pictures were added to the website so the user can choose a speaker from them
  • We added functionality for viewing old audios

Challenges we ran into

  • A very creative Gemini API that decided adding music would be fun
  • GitHub decided to remove random lines when we pushed code
  • One of our team has a laptop that flat out refuses to host websites
  • 402 errors (who knew they existed - it's a paywall apparently)

Accomplishments that we're proud of

  • Taking on API challenges despite none of us having ever used them before
  • Managing to successfully use some C++ alongside a website
  • Overcoming the 402 errors by sorting out the free credits on ElevenLabs and updating the API key to reflect this

What we learned

I think the biggest thing we learned was how to integrate APIs into a website. None of us had any experience with APIs, so having the chance to use both ElevenLabs and Gemini was very exciting. Using GitHub as a group was also a challenge, but we got through it with a surprising lack of clashes, and I think we have finally learned what branches are.

What's next for the Podcast Website

Next is some more C++ cleaning, which would involve learning to use RegEx in C++ to remove things like page numbers, which is very exciting. We would also like to support different file types. We would really love it to be able to take a whole textbook chapter as input and make longer audio files efficiently. It could also be nice to make the Gemini integration optional for those who simply want to listen to their full textbook as an audiobook. Finally, an option to sample the different voices before creating an audio file may be nice for those who don't want to choose based on the vibes of the images.
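To give an idea of what the RegEx-based page number removal might look like, here is a small sketch using the standard `<regex>` header. The function name `strip_page_numbers` and the pattern are assumptions for illustration: the pattern only catches lines that consist of a bare number, optionally written like "Page 12" or "- 12 -".

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: drops every line that looks like a lone page number,
// e.g. "42", "Page 42" or "- 42 -". All other lines are kept unchanged.
std::string strip_page_numbers(const std::string& text) {
    static const std::regex page_line(
        R"(^\s*(?:-\s*)?(?:page\s+)?\d{1,4}(?:\s*-)?\s*$)",
        std::regex::ECMAScript | std::regex::icase);

    std::istringstream in(text);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (std::regex_match(line, page_line)) {
            continue;  // skip lines that are only a page number
        }
        out += line;
        out += '\n';  // every kept line is written back with a newline
    }
    return out;
}
```

Numbers inside a sentence, such as a date, are untouched because the pattern has to match the whole line. The flip side is that a real line of content that is just a number (for example a one-line answer in a worked exercise) would be removed too, so the pattern would need tuning against actual textbook output.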
