Inspiration

We’ve all faced the "2 AM problem": staring at dense PDFs or documentation, too tired to read but needing to understand. We asked ourselves: why can't we ask our computer to explain this like a friend rather than a robot? We wanted to bridge the gap between raw data and auditory learning, making education accessible and hands-free. That desire to turn "information overload" into "listenable stories" was the spark for Echolearn.

What it does

Echolearn is an AI-powered "Explainer Engine" that transforms complex inputs (PDFs, URLs, or topics) into simple, engaging audio stories. Users upload a file or paste a link, and our system generates a conversational explanation using Google Gemini. That text is then synthesized into lifelike human speech via ElevenLabs, letting users listen to their documents instead of struggling through them.

How we built it

We adopted a serverless architecture using Google Cloud Functions (Node.js) to securely orchestrate the AI pipeline. We used the Vertex AI SDK with the gemini-1.5-flash model for high-speed reasoning and ElevenLabs for realistic text-to-speech generation. On the frontend, we integrated PDF.js for local text extraction to ensure user privacy and speed. We optimized our build for the "Efficiency of Learning" metric.

Challenges we ran into

Our biggest hurdle was the "CORS nightmare": calling the AI APIs directly from the browser triggered immediate security blocks. We solved this by pivoting to a backend-for-frontend architecture, using Cloud Functions as secure middleware to handle API keys and headers. We also battled audio latency, which we mitigated by switching to the faster Gemini Flash model so the "thinking" time didn't bore the user.

Accomplishments that we're proud of

We are incredibly proud of building a fully functional, secure client-server architecture in just 24 hours. Successfully integrating two powerful AI models (Vertex AI and ElevenLabs) into a seamless pipeline without a single server crash was a huge technical win. We also kept the UI clean and accessible, ensuring that the complex technology behind the scenes feels invisible to the user.

What we learned

We learned that prompt engineering is the new UI: simply tweaking the system prompt changed our app from a boring summarizer into an engaging storyteller. We also gained deep experience with Google Cloud Functions, realizing how much faster serverless development is for hackathons compared to managing full servers. Finally, we discovered the importance of robust error handling when dealing with unstructured inputs like scraped website text.

What's next for Echolearn

We plan to implement real-time audio streaming so users can start listening instantly rather than waiting for the full generation to complete. We also want to add multi-language support, allowing users to upload English PDFs and hear explanations in Spanish, French, or Swahili. Finally, we aim to build a true "Conversation Mode" where users can interrupt the audio to ask clarifying questions in real time.
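The backend-for-frontend pattern we describe under "How we built it" and "Challenges we ran into" can be sketched in Node.js roughly as follows. This is a minimal illustration, not our actual source: the function and helper names (`explainHandler`, `generateStory`, `synthesizeSpeech`) are hypothetical, and the Gemini and ElevenLabs calls are left as comments because they require credentials at runtime.

```javascript
// Sketch of a Cloud Function acting as secure middleware between the
// browser and the AI APIs. Names and call shapes are illustrative.

function setCorsHeaders(res) {
  // Letting the browser call this function (instead of the AI APIs
  // directly) is what resolves the cross-origin security blocks.
  res.set('Access-Control-Allow-Origin', '*');
  res.set('Access-Control-Allow-Methods', 'POST, OPTIONS');
  res.set('Access-Control-Allow-Headers', 'Content-Type');
}

async function explainHandler(req, res) {
  setCorsHeaders(res);

  // Browsers send a preflight OPTIONS request before the real POST.
  if (req.method === 'OPTIONS') {
    return res.status(204).send('');
  }
  if (req.method !== 'POST' || !req.body || !req.body.text) {
    return res.status(400).json({ error: 'POST JSON with a "text" field' });
  }

  // 1) Reasoning step (hypothetical call shape; gemini-1.5-flash keeps
  //    the "thinking" time short):
  // const story = await generateStory('gemini-1.5-flash', req.body.text);
  // 2) Text-to-speech step; the API key never reaches the browser:
  // const audio = await synthesizeSpeech(process.env.ELEVENLABS_API_KEY, story);
  const story = `Story for: ${req.body.text.slice(0, 40)}`; // placeholder

  return res.status(200).json({ story });
}

module.exports = { setCorsHeaders, explainHandler };
```

Keeping the keys in the function's environment is the whole point of the pivot: the frontend only ever talks to this endpoint.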
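The "prompt engineering is the new UI" lesson from "What we learned" can be illustrated with a small sketch. The personas and wording below are hypothetical, not our production prompt; they only show how swapping a system prompt turns a summarizer into a storyteller.

```javascript
// Illustrative prompt builder; the exact wording in Echolearn differs.
function buildExplainerPrompt(sourceText, style = 'storyteller') {
  const personas = {
    // A plain summarizer persona yields dry, skimmable output.
    summarizer: 'You are a summarizer. Produce a concise summary.',
    // The storyteller persona is what makes the audio engaging.
    storyteller:
      'You are a friendly teacher explaining to a tired friend at 2 AM. ' +
      'Use simple words, short sentences, and one everyday analogy. ' +
      'Write text meant to be read aloud, not skimmed.',
  };
  return {
    system: personas[style] || personas.storyteller,
    user: `Explain the following material:\n\n${sourceText}`,
  };
}

module.exports = { buildExplainerPrompt };
```

The same source text goes into the `user` message either way; only the `system` message changes the character of the output.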
