Inspiration

Advertising is crucial for growth, but it can be expensive and inaccessible for smaller businesses, non-profits, and community organizations without large marketing teams. We wanted to build a low-cost, practical way for anyone to produce professional, ready-to-air audio ads. We also wanted something tangible and extensible that could serve as a foundation for new creative and technical features in the future.

What it does

Audiomate produces natural-sounding radio and Spotify ads using company information, brand tone, any previous advertising material, and user preferences. It generates a complete audio advertisement by combining a human-like voiceover, background music aligned with the tone of the message, and a polished script generated from a text prompt. In short, Audiomate can take a short company description and output a professional, production-ready audio ad.

How we built it

  • Speech Synthesis: ElevenLabs API (Multilingual v2 model) for text-to-speech generation
  • Frontend: TypeScript and CSS
  • Backend: Python for logic and endpoint management
  • Audio Mixing: Librosa, NumPy, and SciPy to combine voice and music tracks
  • Script Generation: Gemini for producing natural, engaging ad scripts
  • Voice Selection Logic: Custom scoring algorithm that selects the optimal ElevenLabs voice based on user preferences such as tone and gender
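
As a rough illustration of the audio mixing step listed above, the sketch below overlays a voice track on a quieter music bed using NumPy alone. It assumes both tracks are already loaded as mono float arrays at the same sample rate (for example via Librosa). The function name `mix_voice_over_music` and the `music_gain` and `peak` parameters are hypothetical choices for this example, not Audiomate's actual code.

```python
import numpy as np


def mix_voice_over_music(voice, music, music_gain=0.25, peak=0.95):
    """Overlay a mono voice track on a mono music bed.

    Assumes both inputs are 1-D float arrays at the same sample rate.
    All names and default values here are illustrative assumptions.
    """
    voice = np.asarray(voice, dtype=np.float32)
    music = np.asarray(music, dtype=np.float32)
    if voice.size == 0:
        raise ValueError("voice track is empty")
    if music.size == 0:
        # No music supplied: fall back to a silent bed.
        music = np.zeros(1, dtype=np.float32)

    # Loop the music if it is shorter than the voice, then trim it
    # so the bed is exactly as long as the voiceover.
    repeats = -(-voice.size // music.size)  # ceiling division
    bed = np.tile(music, repeats)[: voice.size]

    # Duck the music under the voice with a fixed gain.
    mixed = voice + music_gain * bed

    # Scale the mix down only if it would exceed the target peak.
    max_abs = float(np.max(np.abs(mixed)))
    if max_abs > peak:
        mixed = mixed * (peak / max_abs)
    return mixed
```

A fixed gain is the simplest possible approach; a production mix would more likely add fades and adjust the music level around speech.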

Challenges we ran into

One of our main challenges was using the ElevenLabs API for the first time and experimenting with different models and settings to achieve realistic audio samples. Integrating the frontend, backend, and core logic into a seamless workflow also required careful collaboration. After researching different LLMs, we chose Google Gemini for script generation because of its large context window, which let us feed in more brand information and produce more holistic scripts. We also spent a lot of time refining the LLM prompts to capture brand tone and produce engaging scripts, and implementing the voice selection logic that determines the best voice persona from user preferences such as tone and gender.
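
To make the voice selection idea concrete, here is a minimal sketch of a preference-based scoring function. It is an assumption about how such logic could look, not the project's actual algorithm: the `Voice` dataclass, the weights, and the voice IDs in the usage note are all hypothetical, and a real catalog would come from the ElevenLabs voice metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice record; fields are illustrative assumptions."""
    voice_id: str
    gender: str
    tones: frozenset


def score_voice(voice, preferred_gender, preferred_tones,
                gender_weight=2.0, tone_weight=1.0):
    """Score one voice: a bonus for a gender match plus one point per shared tone."""
    score = 0.0
    if preferred_gender and voice.gender == preferred_gender:
        score += gender_weight
    score += tone_weight * len(voice.tones & set(preferred_tones))
    return score


def select_voice(voices, preferred_gender, preferred_tones):
    """Return the highest-scoring voice; ties go to the earliest entry."""
    if not voices:
        raise ValueError("voice catalog is empty")
    return max(
        voices,
        key=lambda v: score_voice(v, preferred_gender, preferred_tones),
    )
```

With a made-up catalog such as `Voice("v1", "female", frozenset({"warm", "calm"}))` and `Voice("v2", "female", frozenset({"energetic"}))`, calling `select_voice(catalog, "female", ["energetic"])` would pick `v2`, since it matches both the gender and the tone preference.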

Accomplishments & Takeaways

With Audiomate, we generated lifelike audio ads comparable in quality to many current Spotify advertisements. We're most proud of this because the system works only from raw branding information and previous ad examples. This project was our first experience working with the ElevenLabs API, and it gave us hands-on exposure to audio engineering and mixing using libraries like Librosa. We also strengthened our skills in integrating the backend and frontend, setting up endpoints, and creating a smooth, end-to-end workflow. On the AI side, we gained experience with LLMs, including prompt design, model selection, and fine-tuning to generate scripts that match brand tone and messaging. Overall, the project was a great opportunity to explore new open-source tools and create a tangible, creative AI product.

What's next for Audiomate

We're excited by Audiomate's creative potential beyond advertising. Future directions include generating interpretive cover images for Spotify ads, creating short background videos to accompany the audio, and incorporating samples of real human voices so companies can feature specific people speaking in their advertisements. These additions would make Audiomate a more versatile platform for producing quality multimedia content.
