Inspiration
Our tool pays tribute to a classic marketing campaign and shows how generative AI can produce tailored advertising material. It was inspired by nostalgia for the old "I'm a Mac and I'm a PC" ads and by the potential of modern generative AI. Our goal was to make this technology available to companies of all kinds so they can create original ad scripts from just a few basic inputs.
What it does
Our program automates the creation of audio advertisements from details users provide about their company and product: company name, product nature, organization size, and unique selling points. We use Gemini 1.5 Pro to analyze the content and sentiment of traditional advertisements, then combine that analysis with the user's inputs to generate a customized ad script. The script is then converted into an audio ad through OpenAI's text-to-speech API, producing promotional content tailored to the business.
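As a rough illustration of how the user inputs and the classic-ad analysis might be combined into one prompt, here is a minimal sketch. The `AdBrief` class, the `build_prompt` function, the field names, and the prompt wording are all hypothetical and are not taken from our actual code.

```python
from dataclasses import dataclass, field


@dataclass
class AdBrief:
    """User-provided details. Field names are illustrative assumptions."""
    company_name: str
    product_nature: str
    organization_size: str
    selling_points: list[str] = field(default_factory=list)


def build_prompt(brief: AdBrief, style_notes: str) -> str:
    """Combine the brief with style notes drawn from the classic-ad analysis."""
    # Render selling points as a bullet list, with a placeholder if empty.
    points = "\n".join(f"- {p}" for p in brief.selling_points) or "- (none provided)"
    return (
        "Write a short two-voice audio ad script.\n"
        f"Style notes from classic ads:\n{style_notes}\n\n"
        f"Company: {brief.company_name}\n"
        f"Product: {brief.product_nature}\n"
        f"Organization size: {brief.organization_size}\n"
        f"Unique selling points:\n{points}\n"
    )
```

The resulting string would then be sent to the language model as a single request.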
How we built it
We built the user interface with tkinter to give users a simple form for entering their data, and wrote the backend logic in Python. Google's Gemini 1.5 Pro model generates the ad content, and OpenAI's API converts the text scripts into speech. Together, these pieces turn user inputs into finished audio advertisements.
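The two-stage flow described above (script generation, then speech synthesis) could be wired together roughly as follows. This is a sketch under assumptions, not our exact code: the function names, the `GOOGLE_API_KEY` environment variable, and the `tts-1` model and `alloy` voice are illustrative choices. The third-party imports are deferred so the pipeline can be exercised with stand-in functions.

```python
from pathlib import Path
from typing import Callable


def gemini_script(prompt: str) -> str:
    """Ask Gemini 1.5 Pro for a script (needs google-generativeai and an API key)."""
    import os
    import google.generativeai as genai

    # Assumption: the key is stored in the GOOGLE_API_KEY environment variable.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content(prompt).text


def openai_speech(script: str, out_path: Path) -> Path:
    """Convert a script to speech (needs the openai package and OPENAI_API_KEY)."""
    from openai import OpenAI

    client = OpenAI()
    # Assumption: model and voice are placeholder choices.
    response = client.audio.speech.create(model="tts-1", voice="alloy", input=script)
    response.write_to_file(out_path)
    return out_path


def make_ad(
    prompt: str,
    out_path: Path,
    write_script: Callable[[str], str] = gemini_script,
    speak: Callable[[str, Path], Path] = openai_speech,
) -> Path:
    """Run both stages: prompt -> script -> audio file."""
    script = write_script(prompt)
    return speak(script, out_path)
```

Passing the two stages in as parameters keeps the pipeline testable without network access, and a tkinter button callback could call `make_ad` with the text collected from the form.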
Challenges we ran into
Our project faced several challenges. Integrating multiple APIs within a single framework was difficult, and API rate limits occasionally restricted how many ads we could generate. Getting the Gemini model to understand and reinterpret the essence of classic ads was also complex, since the generated ads had to be both unique and contextually accurate. Finally, our initial plan to build a web application ran into technical setbacks, particularly with UI elements such as persistent text display, which led us to focus on polishing our Python-based application instead.
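One common way to cope with rate limits is to retry a failed call with exponential backoff and jitter. The sketch below shows that general technique and is not necessarily what we shipped. `call_with_backoff` and its parameters are hypothetical names, and in real use `retry_on` should be narrowed to the rate-limit exception of the client library in question.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

A request would be wrapped as `call_with_backoff(lambda: some_api_call(prompt))`, where `some_api_call` stands for whichever client function is being rate limited.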
Accomplishments that we're proud of
We successfully integrated generative AI with audio processing to convert simple text inputs into polished advertisements. Overcoming technical challenges related to API integration and model tuning has significantly enhanced our system's efficiency and reliability, which we consider a substantial achievement.
What we learned
This project deepened our understanding of advanced AI technologies, especially natural language processing and audio synthesis. We found ways to work around API limitations without degrading functionality or the user experience. The project also highlighted the power of the Gemini model, particularly its large context window, which could accommodate long audio content if needed.
What's next for Change me later
Moving forward, we aim to expand our ad generator with real-time ad editing and multilingual support to serve a more diverse user base. We plan to improve our models to handle a broader variety of ad styles and to explore extending our services to video ads. Speeding up audio generation with preloaded analyses and reviving our initial plan for a fully functional web app are also key objectives.
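The preloaded-analyses idea could take the form of a small on-disk cache, so that the analysis of a given classic ad is computed once and reused for later requests. This is only a sketch of one possible design: `cached_analysis`, the JSON file layout, and the `analyze` callable (standing in for the model call) are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_analysis(
    source_text: str,
    analyze: Callable[[str], str],
    cache_dir: Path,
) -> str:
    """Return the analysis for source_text, computing it only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on a hash of the input so identical ads share one entry.
    key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))["analysis"]
    analysis = analyze(source_text)
    cache_file.write_text(json.dumps({"analysis": analysis}), encoding="utf-8")
    return analysis
```

With a cache like this, the slow analysis step would run ahead of time for each reference ad, leaving only script generation and speech synthesis to run per request.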